IJCNN-2017 Tutorial

May 14, 2017, Anchorage, Alaska, USA

Information Theoretic Learning

in Pattern Classification


IJCNN17_HuBG_4Slides.pdf (4 slides per page)

Bao-Gang Hu (胡包鋼)

National Laboratory of Pattern Recognition

Institute of Automation,

Chinese Academy of Sciences

Beijing, China


In this tutorial, I will start with my personal view of the basic problems in the study of machine learning. The problems can be considered as four modules connected in hierarchical and feedback structures: “What to learn?”, “How to learn?”, “What to evaluate?” and “What to adjust?”. The first issue, also called “learning target selection”, has not received sufficient recognition within our community compared with the existing investigations of “feature selection”. The tutorial will present “information theoretic learning” (termed ITL by Principe et al., 2000, 2010) in relation to these issues. The objective of the tutorial is to demonstrate that ITL not only provides a fundamental understanding of learning target selection, but also leads to new classification tools in machine learning.

The tutorial will focus on pattern classification on the basis of ITL. I will introduce a novel theory of abstaining learning for both Bayesian classifiers and mutual information classifiers. Abstaining, or a reject option in classification, is one of the most important behaviors in real-life human decision making, and it may significantly reduce the total cost or risk in applications. Based on the theory, I will introduce cost-free learning from real-world data sets in comparison with cost-sensitive learning. The significance of cost-free learning is demonstrated against the background of class-imbalance problems when costs are unknown for both errors and rejects. The connections between empirical measures and information measures are presented. The fundamental relations are the upper and lower bounds for both Bayesian error and non-Bayesian error with respect to conditional entropy in binary classifications. I will also demonstrate twenty-four information measures for the evaluation of abstaining binary classifications. The tutorial will show that information theory is itself advanced from the viewpoints of machine learning and pattern classification.
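As a rough illustration of how empirical measures and information measures can be connected, the sketch below estimates entropy-based quantities from a confusion matrix whose last column may hold rejected samples. The function names, the matrix layout, the example counts, and the choice to normalize mutual information by H(T) are assumptions made for this sketch, not definitions taken from the tutorial. For the binary case without rejects it also checks the classical Fano inequality, H(T|Y) ≤ H_b(e), which is only one well-known relation between conditional entropy and error and is not necessarily the bound derived in the tutorial.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def info_measures(counts):
    """Information measures from a confusion matrix (assumed layout).

    counts[i, j] is the number of samples of true class i given decision j;
    an extra last column can hold rejected samples.
    """
    joint = np.asarray(counts, dtype=float)
    joint = joint / joint.sum()           # empirical joint distribution p(t, y)
    h_t = entropy(joint.sum(axis=1))      # H(T): entropy of the true labels
    h_y = entropy(joint.sum(axis=0))      # H(Y): entropy of the decisions
    h_ty = entropy(joint)                 # H(T, Y): joint entropy
    mi = h_t + h_y - h_ty                 # I(T; Y): mutual information
    return {"H(T)": h_t, "H(T|Y)": h_ty - h_y, "I(T;Y)": mi, "NI": mi / h_t}


def binary_entropy(e):
    """Binary entropy H_b(e) in bits."""
    return entropy([e, 1.0 - e])


# Hypothetical counts, invented for illustration only.
no_reject = [[48, 2], [10, 40]]           # rows: true class, columns: decision
with_reject = [[40, 2, 8], [3, 40, 7]]    # third column: rejected samples

m = info_measures(no_reject)
error_rate = (2 + 10) / 100               # off-diagonal mass of no_reject
# Fano's inequality for two classes: H(T|Y) <= H_b(e).
fano_holds = m["H(T|Y)"] <= binary_entropy(error_rate) + 1e-12

r = info_measures(with_reject)
print("no reject:   H(T|Y) = %.4f, H_b(e) = %.4f" % (m["H(T|Y)"], binary_entropy(error_rate)))
print("with reject: I(T;Y) = %.4f, NI = %.4f" % (r["I(T;Y)"], r["NI"]))
```

Treating the reject decision as one more column lets errors and rejects be scored by a single information measure without specifying separate costs for them, which is the spirit of the cost-free evaluation described above.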

The tutorial concludes with further discussion of the emergence of abstaining learning and cost-free learning in the context of “big-data” classifications.

Presenter/organizer biography

Dr. Bao-Gang Hu is currently a full Professor with NLPR (National Laboratory of Pattern Recognition), Institute of Automation, Chinese Academy of Sciences, Beijing, China. He received his M.S. degree from the University of Science and Technology, Beijing, China in 1983, and his Ph.D. degree from McMaster University, Canada in 1993. From 2000 to 2005, he was the Chinese Director of LIAMA (the Chinese-French Joint Laboratory supported by CAS and INRIA). His current research interests are pattern recognition and computer modeling.

Relevant publications by the presenter

  1. Hu, B.-G. and Wang, Y., “Evaluation criteria based on mutual information for classifications including rejected class,” Acta Automatica Sinica, vol. 34, pp. 1396-1403, 2008.

  2. Hu, B.-G., He, R., and Yuan, X.-T., “Information-theoretic measures for objective evaluation of classifications,” Acta Automatica Sinica, vol. 38, pp. 1160-1173, 2012.

  3. Hu, B.-G., “What are the Differences between Bayesian Classifiers and Mutual-Information Classifiers?”, IEEE Transactions on Neural Networks and Learning Systems, vol. 25, pp. 249-264, 2014.

  4. Zhang, X., and Hu, B.-G., “A new strategy of cost-free learning in the class imbalance problem,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, pp. 2872-2885, 2014.

  5. He, R., Hu, B.-G., Yuan, X.-T., and Wang, L., Robust Recognition via Information Theoretic Learning, Springer, 2014.

  6. Hu, B.-G., “Information theory and its relation to machine learning,” January 18, 2015. http://arxiv.org/abs/1501.04309

  7. Hu, B.-G., and Xing, H.-J., “An optimization approach of deriving bounds between entropy and error from joint distribution: Case study for binary classifications,” Entropy, vol. 18, pp. 1-19, 2016.