The Role of Machine Learning in Big Data with Probabilistic Classification Models

Prof Rao Kotagiri
University of Melbourne, Australia

Abstract
Most Big Data is unstructured. The first step of Big Data analytics is to extract structure and semantics from the unstructured data.
In this talk I will describe how machine learning can extract structure from unstructured data, and how to build quality classifiers using probabilistic models. The talk starts with the basic classification problem and introduces the simple Naive Bayes (NB) model. I then introduce the Hidden Markov Model (HMM) as an extension of the basic Naive Bayes model. HMM and NB are generative methods and are generally difficult to employ in problems with many features. I then introduce the Maximum Entropy (ME) classifier, which gives a theoretical foundation for introducing graphical models. ME is a discriminative method and hence, unlike generative methods, can be widely used for problems with many feature values. The talk concludes with the presentation of a very important graphical model: Conditional Random Fields (CRFs). CRFs are a very powerful classification technique and work well in practice. These techniques are now widely used in text processing, genomic data processing and medical image processing. The talk relies on simple probability theory. I will present some experimental results using these methods.
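
As a rough illustration of the starting point of the talk, the following is a minimal sketch of a Naive Bayes text classifier with add-one smoothing. It is not taken from the talk itself: the function names `train_nb` and `predict_nb`, the smoothing parameter `alpha`, and the toy documents and labels are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, alpha=1.0):
    """Collect the counts a Naive Bayes model needs.

    docs  -- list of (tokens, label) pairs (hypothetical toy format)
    alpha -- additive smoothing constant (an assumption; 1.0 is add-one)
    """
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # per-label word frequencies
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "label_counts": label_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,
    }


def predict_nb(model, tokens):
    """Return the label maximising log P(label) + sum of log P(word | label)."""
    n_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, count in model["label_counts"].items():
        total_words = sum(model["word_counts"][label].values())
        score = math.log(count / n_docs)  # log prior
        for tok in tokens:
            if tok not in model["vocab"]:
                continue  # ignore words never seen in training
            freq = model["word_counts"][label][tok]
            # Smoothed estimate of P(word | label); the "naive" step is
            # treating words as independent given the label.
            score += math.log((freq + alpha) / (total_words + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented purely for illustration.
TOY_DOCS = [
    ("gene protein sequence".split(), "genomics"),
    ("protein expression gene".split(), "genomics"),
    ("tumour scan image".split(), "imaging"),
    ("image pixel scan".split(), "imaging"),
]

if __name__ == "__main__":
    model = train_nb(TOY_DOCS)
    print(predict_nb(model, "gene sequence".split()))
    print(predict_nb(model, "scan pixel".split()))
```

The sketch models each word as independent given the class, which is the generative assumption the abstract contrasts with discriminative methods such as ME and CRFs.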

Bio
Professor Ramamohanarao (Rao) Kotagiri received his PhD from Monash University. He was awarded the Alexander von Humboldt Fellowship in 1983. He has been at the University of Melbourne since 1980 and was appointed as a professor in computer science in 1989. Rao has held several senior positions, including Head of Computer Science and Software Engineering, Head of the School of Electrical Engineering and Computer Science at the University of Melbourne, and Research Director for the Cooperative Research Centre for Intelligent Decision Systems. He has served on the Editorial Boards of the Computer Journal, Journal of Statistical Analysis and Data Mining, IEEE TKDE and VLDB (Very Large Data Bases). At present he is on the Editorial Boards for Universal Computer Science, Transactions on Data Privacy and IEEE TCC. He was the program co-chair for the VLDB, PAKDD, DASFAA and DOOD conferences. He is a steering committee member of IEEE ICDM, PAKDD and DASFAA. He received the PAKDD Distinguished Contribution Award for Data Mining. Rao is a Fellow of the Institution of Engineers Australia, a Fellow of the Australian Academy of Technological Sciences and Engineering and a Fellow of the Australian Academy of Science. He was awarded the Distinguished Contribution Award in 2009 by the Computing Research and Education Association of Australasia.