- Weka: Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
- LibSVM: LIBSVM is integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM). It supports multi-class classification.
- UJMP: The Universal Java Matrix Package (UJMP) is an open source Java library that provides sparse and dense matrix classes, as well as a large number of linear algebra calculations such as matrix multiplication and matrix inversion. Operations such as mean, correlation, standard deviation, replacement of missing values, and the calculation of mutual information are also supported.
- JDMP: The Java Data Mining Package (JDMP) is an open source Java library for data analysis and machine learning. It facilitates access to data sources and machine learning algorithms (e.g. clustering, regression, classification, graphical models, optimization) and provides visualization modules. It includes a matrix library for storing and processing any kind of data, with the ability to handle very large matrices even when they do not fit into memory.
Journals & Books
- Christopher M. Bishop (2006), Pattern Recognition and Machine Learning, Springer.
- Richard O. Duda et al. (2001), Pattern Classification (2nd ed.), Wiley.
- Tom Mitchell (1997), Machine Learning, McGraw Hill.
- Ethem Alpaydin (2004), Introduction to Machine Learning, MIT Press.
Semi-Supervised Kernel Learning Methods
Zhu, X., Kandola, J., Ghahramani, Z., Lafferty, J., (2005) Nonparametric transforms of graph kernels for semi-supervised learning, Advances in Neural Information Processing Systems. [original link – pdf]
Kapoor, A., Qi, Y., Ahn, H., Picard, R., (2006) Hyperparameter and kernel learning for graph based semi-supervised classification, Advances in Neural Information Processing Systems. [original link – pdf]
Neighborhood Graph Construction for Manifold Learning
Cheng, B., Yang, J., Yan, S., Fu, Y., Huang, T. S., (2010) Learning with l1-Graph for Image Analysis, IEEE Transactions on Image Processing. [original link – pdf]
Dhillon, P. S., Talukdar, P. P., Crammer, K., (2010) Inference Driven Metric Learning for Graph Construction, 4th North East Student Colloquium on Artificial Intelligence. [original link – pdf]
Jebara, T., Wang, J., Chang, S. F., (2009) Graph Construction and b-Matching for Semi-Supervised Learning, International Conference on Machine Learning. [original link – pdf]
Shin, H., Hill, N. J., Rätsch, G., (2006) Graph based Semi-Supervised Learning with Sharper Edges, European Conference on Machine Learning. [original link – pdf]
Hein, M., Maier, M., (2006) Manifold Denoising, Advances in Neural Information Processing Systems. [original link – pdf]
Wang, J., Zhang, Z., Zha, H., (2005) Adaptive Manifold Learning, Advances in Neural Information Processing Systems. [original link – pdf]
Semi-Supervised Learning with Manifold Assumption
Belkin, M., Niyogi, P., Sindhwani, V., (2006) Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples, The Journal of Machine Learning Research. [original link – pdf]
Zhou, D., Bousquet, O., Lal, T. N., Weston, J., Schölkopf, B., (2004) Learning with local and global consistency, Advances in Neural Information Processing Systems.
Zhu, X., Ghahramani, Z., Lafferty, J., (2003) Semi-supervised learning using Gaussian fields and harmonic functions, In Proceedings of the 20th International Conference on Machine Learning. [original link – pdf]
- Facial Expression Database of Sharif (FEDB): The DML Facial Expression Database with posed and evoked expressions and emotions is a video database containing face videos showing a number of subjects performing the six basic emotions defined by Ekman and Friesen. The database has been developed in an attempt to assist researchers who investigate the effects of posed and evoked facial expressions in face images.
- UCI Machine Learning Repository: The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets.
- Caltech-101: Caltech 101 is a dataset of digital images created in September 2003, compiled by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato at the California Institute of Technology. It is intended to facilitate computer vision research and is most applicable to techniques for recognition, classification, and categorization.
- Caltech-256: Caltech 256 is another image dataset created at the California Institute of Technology in 2007, a successor to Caltech 101. It is intended to address some of the weaknesses inherent to Caltech 101. Overall, it is a more difficult dataset than Caltech 101.
- GenBank: GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2008 Jan;36(Database issue):D25-30). There are approximately 106,533,156,756 bases in 108,431,692 sequence records in the traditional GenBank divisions and 148,165,117,763 bases in 48,443,067 sequence records in the WGS division as of August 2009.
- PlaceLab Datasets: The PlaceLab was first introduced in a CHI 2005 paper. This dataset is freely available for academic researchers to use in their own work, provided that its website is cited whenever the dataset is used, as well as this academic overview article.