  • Weka: Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
  • LibSVM: LIBSVM is integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM). It supports multi-class classification.
  • UJMP: The Universal Java Matrix Package (UJMP) is an open source Java library that provides sparse and dense matrix classes, as well as a large number of linear algebra operations such as matrix multiplication and matrix inversion. It also supports operations such as mean, correlation, standard deviation, replacement of missing values, and the calculation of mutual information.
  • JDMP: The Java Data Mining Package (JDMP) is an open source Java library for data analysis and machine learning. It facilitates access to data sources and machine learning algorithms (e.g. clustering, regression, classification, graphical models, optimization) and provides visualization modules. It includes a matrix library for storing and processing any kind of data, and it can handle very large matrices even when they do not fit into memory.

Journals & Books


Interesting Papers

Semi-Supervised Kernel Learning Methods

Zhu, X., Kandola, J., Ghahramani, Z., Lafferty, J., (2005) Nonparametric transforms of graph kernels for semi-supervised learning. Advances in Neural Information Processing Systems.

Lu, Z., Jain, P., Dhillon, I. S., (2009) Geometry-aware Metric Learning. International Conference on Machine Learning.

Kapoor, A., Qi, Y., Ahn, H., Picard, R., (2006) Hyperparameter and kernel learning for graph based semi-supervised classification. Advances in Neural Information Processing Systems.

Neighborhood Graph Construction for Manifold Learning

Cheng, B., Yang, J., Yan, S., Fu, Y., Huang, T. S., (2010) Learning with l1-Graph for Image Analysis. IEEE Transactions on Image Processing.

Dhillon, P. S., Talukdar, P. P., Crammer, K., (2010) Inference Driven Metric Learning for Graph Construction. 4th North East Student Colloquium on Artificial Intelligence.

Jebara, T., Wang, J., Chang, S. F., (2009) Graph Construction and b-Matching for Semi-Supervised Learning. International Conference on Machine Learning.

Shin, H., Hill, N. J., Rätsch, G., (2006) Graph based Semi-Supervised Learning with Sharper Edges. European Conference on Machine Learning.

Hein, M., Maier, M., (2006) Manifold Denoising. Advances in Neural Information Processing Systems.

Wang, J., Zhang, Z., Zha, H., (2005) Adaptive Manifold Learning. Advances in Neural Information Processing Systems.
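Several of the papers above compare against, or try to improve on, the plain k-nearest-neighbor graph with Gaussian edge weights. As a point of reference, here is a minimal pure-Python sketch of that baseline. The function name `knn_graph`, the "either endpoint selects the other" symmetrization rule and the bandwidth `sigma` are illustrative assumptions, not details taken from any of the listed papers.

```python
import math


def knn_graph(points, k, sigma=1.0):
    """Build a symmetric k-nearest-neighbor graph with Gaussian (heat-kernel) weights.

    points: list of equal-length coordinate tuples.
    Returns a dense n x n weight matrix (list of lists). W[i][j] > 0 iff i is
    among j's k nearest neighbors or j is among i's (illustrative "or" rule).
    """
    n = len(points)
    # Pairwise squared Euclidean distances.
    d2 = [[sum((a - b) ** 2 for a, b in zip(points[i], points[j])) for j in range(n)]
          for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other points by distance to i and keep the k closest.
        neighbors = sorted((j for j in range(n) if j != i), key=lambda j: d2[i][j])[:k]
        for j in neighbors:
            w = math.exp(-d2[i][j] / (2 * sigma ** 2))
            # Symmetrize: the edge exists if either endpoint selects the other.
            W[i][j] = W[j][i] = w
    return W


# Usage: two well-separated triangles of points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
W = knn_graph(pts, k=2)
# No edge crosses between the two clusters.
print(any(W[i][j] > 0 for i in range(3) for j in range(3, 6)))  # -> False
```

The choice of `k` (or of an epsilon-ball radius instead) and of `sigma` strongly affects the resulting graph, which is the sensitivity that learned or sparsified constructions such as b-matching and the l1-graph aim to reduce.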

Semi-Supervised Learning with Manifold Assumption

Belkin, M., Niyogi, P., Sindhwani, V., (2006) Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples. The Journal of Machine Learning Research.

Zhou, D., Bousquet, O., Lal, T. N., Weston, J., Schölkopf, B., (2004) Learning with local and global consistency. Advances in Neural Information Processing Systems.

Zhu, X., Ghahramani, Z., Lafferty, J., (2003) Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning.
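The harmonic-function method of Zhu, Ghahramani and Lafferty (2003) assigns each unlabeled node the weighted average of its neighbors' values while keeping labeled nodes fixed. The paper gives a closed-form solution via the graph Laplacian. The sketch below instead approximates it for the binary case by repeated in-place (Gauss-Seidel-style) averaging. The function name `harmonic_labels`, the 0.5 initialization and the stopping rule are illustrative assumptions.

```python
def harmonic_labels(W, labels, iters=1000, tol=1e-9):
    """Approximate the harmonic solution on a weighted graph (binary labels).

    W: symmetric n x n non-negative weight matrix (list of lists).
    labels: dict {node index: 0.0 or 1.0} for the labeled nodes.
    Unlabeled nodes are repeatedly replaced by the weighted average of their
    neighbors; labeled nodes stay clamped to their given values.
    """
    n = len(W)
    # Illustrative assumption: unlabeled nodes start at 0.5.
    f = [labels.get(i, 0.5) for i in range(n)]
    for _ in range(iters):
        delta = 0.0
        for i in range(n):
            if i in labels:
                continue  # clamped labeled node
            deg = sum(W[i])
            if deg == 0:
                continue  # isolated node: keep its initial value
            new = sum(W[i][j] * f[j] for j in range(n)) / deg
            delta = max(delta, abs(new - f[i]))
            f[i] = new  # in-place update uses the freshest neighbor values
        if delta < tol:
            break
    return f


# Usage: a 4-node chain 0-1-2-3 with unit weights, endpoints labeled 0 and 1.
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
f = harmonic_labels(chain, {0: 0.0, 3: 1.0})
print([round(v, 4) for v in f])  # -> [0.0, 0.3333, 0.6667, 1.0]
```

On a path, a harmonic function must interpolate linearly between the clamped endpoints, which is why the interior nodes settle at 1/3 and 2/3. Thresholding `f` at 0.5 then gives the predicted class of each unlabeled node.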


  • Facial Expression Database of Sharif (FEDB): The DML Facial Expression Database, with posed and evoked expressions and emotions, is a video database of face recordings in which a number of subjects display the six basic emotions defined by Ekman and Friesen. It was developed to assist researchers who study the effects of posed and evoked facial expressions in face images.
  • UCI Machine Learning Repository: The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets.
  • Caltech-101: Caltech 101 is a dataset of digital images created in September 2003 and compiled by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato at the California Institute of Technology. It is intended to facilitate computer vision research and is most applicable to recognition, classification, and categorization techniques.
  • Caltech-256: Caltech 256 is a successor to Caltech 101, created at the California Institute of Technology in 2007. It is intended to address some of the weaknesses inherent in Caltech 101 and is, overall, a more difficult dataset.
  • GenBank: GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2008 Jan;36(Database issue):D25-30). There are approximately 106,533,156,756 bases in 108,431,692 sequence records in the traditional GenBank divisions and 148,165,117,763 bases in 48,443,067 sequence records in the WGS division as of August 2009.
  • PlaceLab Datasets: The PlaceLab was first introduced in a CHI 2005 paper. The dataset is freely available for academic researchers to use in their own work, provided that both its website and this academic overview article are cited whenever the dataset is used.

Sharif University of Technology