Classifying A Stream Of Infinite Concepts

A Bayesian Non-Parametric Approach

Classifying streams of data, such as financial transactions or emails, is an essential element of applications such as online advertising and spam or fraud detection. The data stream is often large or even unbounded, and in many instances it is also non-stationary. An adaptive approach is therefore required that can manage concept drift in an online fashion. This paper presents a probabilistic non-parametric generative model for stream classification that handles concept drift efficiently and adjusts its complexity over time. Unlike recent methods, the proposed model handles concept drift by adapting the data-concept association, without imposing an unnecessary i.i.d. assumption on the data within a batch. This allows the model to classify data efficiently using fewer and simpler base classifiers. Moreover, an online algorithm is proposed for performing inference in the proposed non-conjugate, time-dependent, non-parametric model. Extensive experimental results on several stream datasets demonstrate the effectiveness of the proposed model. The graphical representation of the proposed generative model is depicted below.
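As a rough illustration of what it means for a non-parametric model to adjust its complexity over time, the following Python sketch simulates a plain Chinese restaurant process (CRP) prior over data-concept assignments. It is not the model proposed in the paper and not the released MATLAB code: it ignores the time dependence, the class labels, and the non-conjugate likelihood, and the names `crp_assignment_probs`, `simulate_stream`, and the concentration parameter `alpha` are assumptions introduced here for illustration only.

```python
import random


def crp_assignment_probs(counts, alpha):
    """CRP prior over concepts for the next data point.

    counts[k] is the number of points already assigned to concept k.
    The last entry of the result is the probability of opening a new concept.
    """
    total = sum(counts) + alpha
    return [c / total for c in counts] + [alpha / total]


def simulate_stream(n_points, alpha, seed=0):
    """Assign n_points sequentially to concepts using only the CRP prior.

    Assumption: no likelihood term is used, so this shows how the number
    of concepts can grow with the stream, not how data would be classified.
    """
    rng = random.Random(seed)
    counts = []        # points per concept seen so far
    assignments = []   # concept index chosen for each point
    for _ in range(n_points):
        probs = crp_assignment_probs(counts, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)   # a new concept is created
        else:
            counts[k] += 1     # an existing concept is reused
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    assignments, counts = simulate_stream(n_points=1000, alpha=2.0)
    print("number of concepts:", len(counts))
    print("points per concept:", counts)
```

In this sketch the number of concepts is not fixed in advance: each new point joins an existing concept with probability proportional to its size, or opens a new one with probability proportional to `alpha`. A full model would combine such a prior with per-concept classifiers and a mechanism for concept drift.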

Related Publication:

  • Hosseini A., Rabiee H. R., Hafez H., and Soltani-Farani A., “Classifying A Stream Of Infinite Concepts: A Bayesian Non-Parametric Approach”, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Nancy, France, 2014.

Project code:

  • This code is written for MATLAB and contains the routines used in the above publication. Please contact the first author if you encounter any problems.

People involved:

  • Abbas Hosseini, Hassan Hafez, Ali Soltani-Farani, Hamid R. Rabiee

Sharif University of Technology