Online Feature Selection of Big Data Sets
Online Machine Learning (OML) seeks to design algorithms that learn from data arriving in a streaming fashion, similar to how humans and animals learn in the wild. Online Streaming Feature Selection (OSFS) is a subfield of OML that assumes all data samples are available at run time while the features of every sample arrive as a stream. In this project, we propose Geometric Online Adaptation, a new algorithm for OSFS problems that uses a recently proposed graph-based geometric dependency measure. We also consider a new setting, Online Streaming Feature Selection with Streaming Samples (OSFS-SS), in which both the features and the samples are streamed simultaneously, with an updated class label space. This setting, which arises in many real-world problems, allows high-performance models to be trained with low computational and storage requirements. In our lab we are developing novel algorithms that apply to both the OSFS and OSFS-SS settings, with the goal of achieving higher classification performance while maintaining smaller feature subsets of streaming big data. These methods are used in several areas, such as visualizing high-level representations learned by deep neural networks, computer vision, classification of online videos (for example, Amazon Prime content or astronomical videos), and medical research such as data-driven cancer identification and online doctor-patient interactions.
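To make the OSFS setting concrete, the following is a minimal sketch of selecting features as they arrive, one at a time, while all samples are already available. It is not the Geometric Online Adaptation algorithm: absolute Pearson correlation is assumed here as a simple stand-in for the graph-based geometric dependency measure, and the function names (abs_corr, select_streaming_features) and the two thresholds are hypothetical choices made for illustration only.

```python
import math


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a stand-in dependency score."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return abs(cov) / math.sqrt(vx * vy)


def select_streaming_features(feature_stream, labels, relevance=0.3, redundancy=0.9):
    """Process features one at a time; keep a feature only if it is relevant
    to the labels and not redundant with a feature that was already kept."""
    selected = {}  # feature name -> column of values over all samples
    for name, column in feature_stream:
        if abs_corr(column, labels) < relevance:
            continue  # weakly dependent on the label: discard
        if any(abs_corr(column, kept) > redundancy for kept in selected.values()):
            continue  # nearly duplicates a kept feature: discard
        selected[name] = column
    return list(selected)


# Toy data: 200 samples with binary labels (all samples known up front).
labels = [i % 2 for i in range(200)]
unrelated = [(i // 2) % 2 for i in range(200)]            # uncorrelated with labels
signal = [y + 0.1 * u for y, u in zip(labels, unrelated)]  # strongly label-dependent
scaled_copy = [2 * s + 1 for s in signal]                  # redundant with signal

# Features arrive in this order, as a stream of (name, column) pairs.
stream = [("signal", signal), ("unrelated", unrelated), ("scaled_copy", scaled_copy)]
kept = select_streaming_features(stream, labels)
print(kept)  # ['signal']
```

In this toy stream the irrelevant feature is dropped on arrival and the rescaled copy is rejected as redundant, so the selected subset stays small. In the OSFS-SS setting the samples would arrive incrementally as well, so the scores could not be computed over a fixed, fully available sample set as they are here.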