2013 Big Data School

Venue: Level 7, Building 10, University of Technology, Sydney
Jones Street, Ultimo, NSW 2007 Australia
Organizers: Jinyan Li, Longbing Cao, Xinhua Zhu, and Colin Wise (Advanced Analytics Institute, University of Technology, Sydney)
Introduction: The Big Data School aims to provide an educational platform for postgraduate students, young researchers, and industry data engineers, data technicians and data scientists in the areas of data mining, machine learning and analytics. It is a two-day intensive course, hosted in Sydney on 10-11 April 2013, immediately prior to the opening of the main conference on the Gold Coast. The program of this Big Data School covers various teaching modules (2-3 hours each) on April 10 and 11. A related discussion forum, the Big Data Summit, follows on April 12.
 Faculty:  FEIT
 School/Unit:  School of Software – Advanced Analytics Institute (AAI)
 Contact Email:   Colin.Wise@uts.edu.au


  • Professor Longbing Cao UTS Advanced Analytics Institute (AAI)

  • Limsoon Wong National University of Singapore

  • Ross Farrelly Teradata Australia & New Zealand

  • Kai Ming Ting Monash University

  • Richard Xu University of Technology, Sydney (UTS)

  • Geoff Webb Monash University

  • Venky Krishnan Adobe

  • Jaideep Srivastava University of Minnesota

  • Professor George Karypis University of Minnesota

  AGENDA DAY ONE: Wednesday 10th April
 9:00am – 9:15am Opening address by Professor Longbing Cao, Director, UTS Advanced Analytics Institute (AAI) 
 9:15am – 10:15am
Large Scale Biomedical and Healthcare Data Mining and Applications Part I
by Professor Limsoon Wong
Limsoon Wong is a provost's chair professor of computer science and a professor of pathology at the National University of Singapore. He currently works mostly on knowledge discovery technologies and their application to biomedicine.
Before that, he did significant research in database query language theory and finite model theory, as well as significant development work in broad-scale data integration systems. Limsoon has written about 150 research papers, a few of which are among the best cited in their respective fields.
He serves or has served on the editorial boards of Information Systems, Journal of Bioinformatics and Computational Biology, Bioinformatics, IEEE/ACM Transactions on Computational Biology and Bioinformatics, Drug Discovery Today, and Journal of Biomedical Semantics. He co-founded and is chairman of Molecular Connections, a provider of data curation services employing over 700 curators, analysts, and engineers.
 10:15am – 10:30am Morning tea break
 10:30am – 11:45am
Large Scale Biomedical and Healthcare Data Mining and Applications Part II 
by Professor Limsoon Wong
 11:45am – 12:30pm
Success Stories in Data Mining and Machine Learning
by Ross Farrelly, Chief Data Scientist, Teradata Australia & New Zealand
Ross Farrelly is the Chief Data Scientist for Teradata Aster, the market leader in big data analytics and ultra-fast analysis of unstructured, semi-structured and multi-structured data.
At Teradata Aster, Ross is responsible for data mining, analytics and advanced modeling projects using the Teradata Aster platform. Previously Ross ran Datamilk, an independent bespoke consultancy specialising in data mining and advanced predictive analytics. Ross is a Six Sigma Black Belt and has many years of experience in a variety of statistical roles, including Business Development Management at Minitab and SAS Analyst at New Frontier Publishing. Ross has a Master of Applied Statistics and a first class honors degree in Pure Mathematics. He has a keen interest in a number of data mining techniques, especially social network analysis and random forests.
This lecture will cover case studies from five big data projects conducted in South East Asia during 2012. For each, it will outline the business problem, the machine learning technique used to address it, and the implementation of the solution. A high-level view of the software, hardware and skill set used during the projects will also be given. The lecture will draw on case studies from the banking and finance, telecommunications and media sectors.
In his free time Ross regularly competes on Kaggle, an online forum where data scientists match their skills against their global peers, including experts in statistics, mathematics, and machine learning. Ross is also an active member of the Sydney Users of R Forum (SURF).
 12:30pm – 1:30pm
Lunch and Industrial Demo & Exhibition
 1:30pm – 3:10pm
Mining Big Data: the state-of-the-art and beyond 
by Kai Ming Ting
Kai Ming Ting is an Associate Professor in the Faculty of Information Technology at Monash University, and currently serves as the Associate Dean Research Training in the Faculty of Information Technology. He previously held academic positions at Waikato University and Deakin University, and visiting positions at Osaka University, Japan, Nanjing University, China, and Chinese University of Hong Kong. His research projects have been supported by grants from the Australian Research Council, the US Air Force Office of Scientific Research (AFOSR/AOARD), the Australian Institute of Sport, and Toyota InfoTechnology Center (Japan). Awards received include the Runner-up Best Paper Award at 2008 IEEE ICDM, and the Best Paper Award at 2006 PAKDD. He received his PhD from Sydney University.
He is the creator of a new paradigm in data mining called mass estimation. Density estimation is the current paradigm on which most existing data mining algorithms are based. The unique feature of mass estimation is that it has constant time and space complexities, ideal for solving problems with big data.
His research interests are in the areas of mass estimation and mass-based approaches, ensemble approaches and data stream mining. He is an associate editor for the Journal of Data Mining and Knowledge Discovery. He co-chaired the Pacific-Asia Conference on Knowledge Discovery and Data Mining 2008. He has served as a member of program committees for a number of international conferences including ACM SIGKDD, IEEE ICDM and ICML.
 3:10pm – 3:30pm
Afternoon tea break
 3:30pm – 5:10pm
Monte Carlo Sampling
by Dr Richard Xu
Dr Richard Xu is a senior lecturer at the School of Computing and Communications at the University of Technology, Sydney (UTS). His research fields include machine learning, image processing and computer vision. He is particularly interested in models related to non-parametric Bayes.
Monte Carlo sampling (or simply sampling) is an essential inference tool for many machine learning models. In this lecture, we will gently go over some of the common sampling methods with the aid of MATLAB simulations. I will also show how they are applied in more complicated settings.
End of Day One
  AGENDA DAY TWO: Thursday 11th April
 9:00am – 10:20am
Fundamental and Advanced Machine Learning Methods and Big Data Applications Part I
by Professor Geoff Webb
Geoff Webb is a Professor of Information Technology Research in the Faculty of Information Technology at Monash University, where he heads the Centre for Research in Intelligent Systems. Prior to Monash he held appointments at Griffith University and then Deakin University, where he received a personal chair. His primary research areas are machine learning, data mining, and user modelling. He is known for his contribution to the debate about the application of Occam's razor in machine learning and for the development of numerous methods, algorithms and techniques for machine learning, data mining and user modelling.
His commercial data mining software, Magnum Opus, incorporates many techniques from his association discovery research. Many of his learning algorithms are included in the widely-used Weka machine learning workbench.
He is editor-in-chief of Data Mining and Knowledge Discovery, co-editor of the Springer Encyclopedia of Machine Learning, a member of the advisory board of Statistical Analysis and Data Mining and a member of the editorial boards of Machine Learning and ACM Transactions on Knowledge Discovery from Data. He was co-PC Chair of the 2010 IEEE International Conference on Data Mining and co-General Chair of the 2012 IEEE International Conference on Data Mining.
 10:20am – 10:50am    Morning tea break
 10:50am – 11:45am
Fundamental and Advanced Machine Learning Methods and Big Data Applications Part II   
by Professor Geoff Webb
 11:45am – 12:30pm
Success Stories and Experiences in Data Mining and Machine Learning
by Venky Krishnan, Adobe Account Executive – Digital Marketing, Education
Venky Krishnan is the Digital Marketing Sales Lead for Education in ANZ at Adobe Systems, based in Sydney. In this role he helps educational institutions see the value of turning big data into actionable insights. He encourages education marketers to engage students, faculty and alumni everywhere across mobile and social platforms.
Venky has worked for Microsoft, Deloitte and Sungard in sales, product development, and operations across a variety of industries over the last 20 years, and is passionate about education analytics.
 12:30pm – 1:30pm Lunch and Industrial Demo & Exhibition
 1:30pm – 3:10pm
Social Media Big Data Analytics Part I
by Professor Jaideep Srivastava
Jaideep Srivastava is Professor of Computer Science & Engineering at the University of Minnesota, where he directs a laboratory focusing on research in Web Mining, Social Media Analytics, and Health Analytics. He has authored over 300 papers, and supervised 30 PhD dissertations and 59 MS theses. He is currently co-leading a multi-institutional, multi-disciplinary project in the rapidly emerging area of social computing (http://vwobservatory.com/).
His research has been supported by government agencies, including NSF, NASA, ARDA, DARPA, IARPA, NIH, CDC, US Army, US Air Force, and MNDoT; and industries, including IBM, United Technologies, Eaton, Honeywell, Cargill, and Huawei Telecom. He has an active collaboration with Allina's Center for Healthcare Innovation (http://www.allina.com/ahs/aboutallina.nsf/page/health_care_innovation), where he is a Distinguished Fellow. Dr. Srivastava has significant experience in industry, in both consulting and executive roles. He has led a data mining team at Amazon.com (www.amazon.com), built a data analytics department at Yodlee (www.yodlee.com), and served as the Chief Technology Officer for Persistent Systems (www.persistentsys.com).
He has provided technology and strategy advice to Cargill, United Technologies, IBM, Honeywell, KPMG, 3M, TCS, and Eaton, has served as Advisor to the State Government of Minnesota and the State Government of Maharashtra, and is presently technology adviser to the UID project of the Government of India. He has held distinguished professorships at Heilongjiang University and Wuhan University, China. Dr. Srivastava has a BTech from the Indian Institute of Technology (IIT), Kanpur, India, and an MS and PhD from the University of California, Berkeley.
He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), and has been an IEEE Distinguished Visitor. He has given over 150 invited talks in over 30 countries, including more than a dozen keynote addresses at major international conferences. Dr. Srivastava is the Co-Founder and CTO of Ninja Metrics (www.ninjametrics.com), which brings his research in social analytics to the commercial world.
 3:10pm – 3:30pm
Afternoon tea break
 3:30pm – 4:10pm
Social Media Big Data Analytics Part II
by Professor Jaideep Srivastava
 4:10pm – 5:10pm
Top-N Recommender Systems: Revisiting Item Neighborhood Methods
by Professor George Karypis
George Karypis is a professor at the Department of Computer Science & Engineering at the University of Minnesota, Twin Cities. His research interests span the areas of data mining, bioinformatics, cheminformatics, high performance computing, information retrieval, collaborative filtering, and scientific computing. His research has resulted in the development of software libraries for serial and parallel graph partitioning (METIS and ParMETIS), hypergraph partitioning (hMETIS), parallel Cholesky factorization (PSPASES), collaborative filtering-based recommendation algorithms (SUGGEST), clustering high dimensional datasets (CLUTO), finding frequent patterns in diverse datasets (PAFI), and protein secondary structure prediction (YASSPP). He has coauthored over 200 papers on these topics and a book titled “Introduction to Parallel Computing” (Addison Wesley, 2003, 2nd edition). In addition, he serves on the program committees of many conferences and workshops on these topics, and on the editorial boards of the IEEE Transactions on Knowledge and Data Engineering, Social Network Analysis and Data Mining Journal, International Journal of Data Mining and Bioinformatics, the journal Current Proteomics, Advances in Bioinformatics, and Biomedicine and Biotechnology.
 5:10pm    End of Day Two, Close of Event