Confirmed keynote speeches include:
| John Sheridan
Australian Government Chief Technology Officer & Procurement Coordinator, Australian Government
|Title: It doesn’t have to be that big: Data-based decision making at the coalface|
John Sheridan, Australian Government CTO and Procurement Coordinator, will discuss the use of data to drive the effective and efficient use of taxpayers’ funds in the provision of whole-of-government ICT and non-ICT goods and services.
| Bio for John Sheridan:
John Sheridan joined the APS Senior Executive Service in 2002, after 22 years in the Australian Army and three in the Defence Science and Technology Organisation. In Defence, he was the lead IT architect and, later, responsible for the design and development of IT systems.
In the Australian Government Information Management Office, John negotiated the Microsoft Volume Sourcing Agreement, managed major ICT budget reductions, and led whole-of-government matters such as australia.gov.au, data centre strategy, coordinated procurement, industry engagement, and telecommunications services.
In February 2013, John became the first Australian Government Chief Technology Officer. In addition to his responsibilities for ICT, coordinated and general procurement policy and operations, John assists the business community as the Australian Government Procurement Coordinator.
| Rajeev Rastogi
Director of Machine Learning, Amazon
|Title: Machine Learning@Amazon|
In this talk, I will first provide an overview of the key Machine Learning (ML) applications we are developing at Amazon. I will then describe a matrix factorization model that we have developed for making product recommendations – the salient characteristics of the model are:
(1) It uses a Bayesian approach to handle data sparsity,
(2) It leverages user and item features to handle the cold start problem, and
(3) It introduces latent variables to handle multiple personas associated with a user account (e.g. family members).
Our experimental results on synthetic and real-life datasets show that leveraging user and item features and incorporating user personas enable our model to achieve lower RMSE and perplexity than baseline approaches.
| Bio for Rajeev Rastogi:
Rajeev Rastogi is the Director of Machine Learning at Amazon. Previously, he was the Vice President of Yahoo! Labs Bangalore, and the founding Director of the Bell Labs Research Center in Bangalore. Rajeev is active in the fields of databases, data mining, and networking, and has served on the program committees of several conferences in these areas. He currently serves on the editorial board of the CACM, and has previously been an Associate Editor for IEEE Transactions on Knowledge and Data Engineering. He has published over 125 papers and holds over 50 patents. Rajeev is an ACM Fellow and a Bell Labs Fellow. He received his B.Tech degree from IIT Bombay, and a PhD in Computer Science from the University of Texas at Austin.
| Paul Toohey
Global Portfolio Director, Advanced Analytics, HP
|Title: Analytics in Australia – The gaps and opportunities for business|
| Bio for Paul Toohey:
Paul Toohey is the global portfolio director for Advanced Analytics Services at HP. Toohey is responsible for the commercial release of new analytic solutions that allow clients to extract value from their own and externally available big data sets.
Toohey delivers services that enable the enterprise to derive business value and insights from Big Data, improving customer engagement, operational efficiency and speed of response to market opportunities.
Prior to this, Toohey was General Manager of Strategic Enterprise Services for the APJ region, where he oversaw the sales and delivery of HP’s cloud, security and analytics portfolio for enterprise clients.
Toohey has more than 20 years’ experience in successfully launching new businesses including online banks, cloud and innovative analytics offerings. Toohey has also held executive positions in sales and account management whilst based in China, Australia and the UK.
| Sang Kyun Cha
Director of SNU Big Data Institute, Seoul National University, South Korea
|Title: How Can In-Memory Database Paradigm Revolutionize Real-Time Data Mining and Learning?|
Data mining and learning by nature involve access to huge volumes of data from a variety of sources. With traditional enterprise architecture, much of this data resides on the disks or SSDs of disk-based database systems. Mining and learning processes either have to copy the needed data sets on the fly from the underlying database systems into their own address space, or have to maintain their own copies of those data sets. The first approach suffers from long run-time data transfer latency, while the second incurs the overhead of managing data synchronization.
With the availability of multi-terabyte DRAM in a single-blade server, the past five years have seen rapid industry adoption of a new generation of in-memory database technology. Since SAP released its SAP HANA in-memory big data platform in 2011, all major database vendors have come to support, or plan to support, a combination of in-memory row and column stores. An in-memory row store is ideal for fast ingestion of high-velocity data streams, whether transactional or non-transactional. An in-memory column store, with its memory footprint compressed by sorted dictionary encoding, is ideal for real-time analytics over huge volumes of relatively static data.
This new paradigm enables data-intensive mining and learning computation to run inside the data management process (i.e., the DBMS). With data copying minimized by using references to in-memory data during the computation, the performance of data mining and learning is improved by orders of magnitude. The paradigm also offers a new basis for implementing real-time mining and learning in tight coupling with data management. The open question is how to architect the next-generation in-memory database platform so that various data mining and learning algorithms can be implemented by end consumers of the platform, rather than by platform vendors, without compromising the platform's performance benefits and stability.
| Bio for Sang Kyun Cha:
Dr. Sang Kyun Cha is a professor and an entrepreneur who has worked on three generations of in-memory database systems since joining Seoul National University in 1992. In 2000, anticipating the coming in-memory enterprise database era, he founded Transact In Memory, Inc. and started developing his second-generation system, P*TIME (Parallel* Transact-In-Memory Engine). The company was quietly acquired by SAP in 2005 and transformed into SAP Labs Korea. By early 2006, P*TIME development was complete, with innovative in-memory OLTP features such as in-memory MVCC and optimistic latch-free index concurrency control, several years ahead of major DBMS vendors. To demonstrate its extreme OLTP scalability in tight integration with SAP’s applications, P*TIME also implemented a seamless two-tier API, resilient to application crashes, in addition to a three-tier API.
Together with SAP’s column store TREX, P*TIME became a cornerstone of SAP HANA, the first transactional distributed in-memory database platform, which became generally available in June 2011. Today, SAP and numerous other companies run ERP, CRM, and real-time analytics on HANA. At SAP’s request, Prof. Cha took co-responsibility for developing HANA with his German colleagues and saw its worldwide adoption.
In April 2014, Prof. Cha launched Seoul National University’s Big Data Institute to lead trans-disciplinary big data research involving almost all academic disciplines, including computer science, engineering, the natural and social sciences, and medicine. He serves on the board of trustees of Seoul National University, has been on the board of Korea Telecom since March 2012, and provides strategic advice to central and local governments on big data and software industry issues. Prof. Cha received his BS and MS from Seoul National University and his Ph.D. from Stanford University. He has been on the editorial board of the VLDB Journal since 2009 and was elected a member of the IEEE ICDE Steering Committee in 2015.