Trends & Controversies

Special Session: Trends & Controversies in Data Science


Longbing Cao
Hiroshi Motoda
George Karypis
Bart Goethals






Aims and scope

As an emerging area, data science faces great opportunities as well as challenges. Several questions are often debated: What is data science? Why data science? We already have information science, so why do we need data science? Do we need analytics science? Is analytics new? What is the difference between statistics and data analytics? What makes a data scientist?

We believe that a special session on Trends and Controversies in data science and advanced analytics could bring insights from different mindsets for the healthy development of the science and society. Accordingly, this T&C special session will host invited talks that outline different views on the present and future of data science. Invited speakers can contribute a paper (in the same format as the main conference submissions, but possibly shorter than 7 pages) to the special session. Papers will be handled by the program co-chairs and accepted into the main conference proceedings, likely after the authors address comments from the program co-chairs.

Topics of interest

We expect insightful talks offering forward thinking, big thinking, original research, critical reflection on and questioning of existing theories and tools, and/or innovative insights about data science, big data, and advanced analytics.

Nominations to this T&C special session are welcome; please contact

Invited Speech 1 by Prof Philip S Yu

Prof Philip S Yu
University of Illinois at Chicago,
Title: Assessing the Longevity of Online Videos: a New Insight of a Video’s Quality

Recommending valuable videos to viewers is crucial to an online video website and its related third parties. In particular, which features and methods should be used to assess the quality of online videos remains an open research topic. Unlike previous work, which evaluated a video only by its view count (a.k.a. popularity), this article proposes an additional scoring mechanism that captures a video's long-lasting value (a.k.a. longevity) to assist in judging its quality. Generally speaking, a longevous video tends to be watched frequently and is therefore considered more valuable. We introduce the concept of latent social impulses and explain why they are necessary when measuring a video's longevity. To derive latent social impulses, we view the video website as a digital signal filter and formulate the task as a least squares problem. The proposed longevity computation is based on the derived social impulses, and our experiments show directly that the computed longevity scores capture information overlooked by the popularity measure.
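The least squares idea in the abstract can be illustrated with a minimal sketch. It is not the speaker's method: the abstract does not specify the filter, so the sketch assumes that observed view counts are the convolution of latent impulses with a known, hypothetical impulse response, and it recovers the impulses with an ordinary least squares solve. The function name `estimate_impulses` and all numbers are illustrative assumptions.

```python
import numpy as np

def estimate_impulses(views, response):
    """Least squares estimate of latent impulses x such that views ~= H @ x.

    Assumption (not from the abstract): the website acts as a linear,
    time-invariant filter with a known impulse response.
    """
    n = len(views)
    # Lower-triangular convolution matrix: H[t, s] = response[t - s].
    H = np.zeros((n, n))
    for lag, weight in enumerate(response[:n]):
        H += weight * np.eye(n, k=-lag)
    x, *_ = np.linalg.lstsq(H, np.asarray(views, dtype=float), rcond=None)
    return x

# Hypothetical data: two bursts of external attention on days 0 and 3,
# each of which keeps generating views with decaying strength.
response = np.array([1.0, 0.5, 0.25])
true_impulses = np.array([10.0, 0.0, 0.0, 4.0, 0.0, 0.0])
views = np.convolve(true_impulses, response)[: len(true_impulses)]

recovered = estimate_impulses(views, response)
```

In this noise-free toy case the recovered impulses match the ones used to generate the view counts. With real data the filter would have to be estimated as well, and the fit would only be approximate.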

Bio for Prof Philip S Yu:

Philip S. Yu is a UIC Distinguished Professor in the Department of Computer Science at the University of Illinois at Chicago, where he also holds the Wexler Chair in Information Technology. His main research interests include data mining (especially graph/network mining), social networks, privacy-preserving data publishing, data streams, database systems, and Internet applications and technologies. He spent most of his career at IBM Thomas J. Watson Research Center, where he was manager of the Software Tools and Techniques group. Dr. Yu has published more than 650 papers in refereed journals and conferences. He holds or has applied for more than 300 US patents.

Dr. Yu is a Fellow of the ACM and the IEEE. He is the Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data. He is on the steering committees of the IEEE Conference on Data Mining and the ACM Conference on Information and Knowledge Management, and was a member of the IEEE Data Engineering steering committee.

Invited Speech 2 by Prof Geoff Webb

Prof Geoff Webb
Monash University,
Title: Scalable learning of Bayesian network classifiers

I present our work on highly-scalable out-of-core techniques for learning well-calibrated Bayesian network classifiers. Our techniques are based on a novel hybrid generative and discriminative learning paradigm. These algorithms

  • provide straightforward mechanisms for managing the bias-variance tradeoff,
  • have training time that is linear with respect to training set size,
  • require as few as one and at most four passes through the training data,
  • allow for incremental learning,
  • are embarrassingly parallelisable,
  • support anytime classification,
  • provide direct well-calibrated prediction of class probabilities,
  • can learn using arbitrary loss functions,
  • support direct handling of missing values, and
  • exhibit robustness to noise in the training data.

Despite their computational efficiency, the new algorithms deliver classification accuracy that is competitive with state-of-the-art in-core discriminative learning techniques.
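Some of the properties listed above (a single pass over the data, training time linear in the training set size, incremental updates, and direct handling of missing values) can be seen in a minimal sketch of naive Bayes, the simplest Bayesian network classifier. This is an illustration under that assumption only, not the hybrid generative and discriminative algorithms of the talk. The class name `IncrementalNaiveBayes` and the toy data are hypothetical.

```python
import math
from collections import defaultdict

class IncrementalNaiveBayes:
    """Count-based naive Bayes for categorical features (illustrative only)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.class_counts = defaultdict(int)    # label -> count
        self.feature_counts = defaultdict(int)  # (index, value, label) -> count
        self.feature_totals = defaultdict(int)  # (index, label) -> non-missing count
        self.values = defaultdict(set)          # index -> observed values

    def update(self, features, label):
        """Absorb one example; each example is touched exactly once."""
        self.class_counts[label] += 1
        for i, v in enumerate(features):
            if v is None:  # missing value: contributes no evidence
                continue
            self.feature_counts[(i, v, label)] += 1
            self.feature_totals[(i, label)] += 1
            self.values[i].add(v)

    def predict_proba(self, features):
        """Return normalised class probabilities for one example."""
        total = sum(self.class_counts.values())
        k = len(self.class_counts)
        log_scores = {}
        for label, c in self.class_counts.items():
            s = math.log((c + self.smoothing) / (total + self.smoothing * k))
            for i, v in enumerate(features):
                if v is None:
                    continue
                n_values = max(len(self.values.get(i, ())), 1)
                num = self.feature_counts.get((i, v, label), 0) + self.smoothing
                den = self.feature_totals.get((i, label), 0) + self.smoothing * n_values
                s += math.log(num / den)
            log_scores[label] = s
        # Normalise in log space for numerical stability.
        m = max(log_scores.values())
        exp = {label: math.exp(s - m) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: e / z for label, e in exp.items()}

# Hypothetical toy stream of (features, label) pairs; None marks a missing value.
stream = [
    (("sunny", "hot"), "no"),
    (("rainy", "mild"), "yes"),
    (("sunny", "mild"), "yes"),
    (("rainy", None), "yes"),
]
model = IncrementalNaiveBayes()
for features, label in stream:
    model.update(features, label)

probs = model.predict_proba(("rainy", "mild"))
```

Because the model is just a table of counts, counts gathered on separate data partitions could be summed, which is one simple way such learners parallelise.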

Bio for Prof Geoff Webb:

Geoff Webb is a Professor of Information Technology Research in the Faculty of Information Technology at Monash University, where he heads the Centre for Data Science. His primary research areas are machine learning, data mining, user modelling and computational structural biology. He is known for his contribution to the debate about the application of Occam’s razor in machine learning and for the development of numerous methods, algorithms and techniques for machine learning, data mining, user modelling and computational structural biology. His commercial data mining software, Magnum Opus, incorporates many techniques from his association discovery research. Many of his learning algorithms are included in the widely-used Weka machine learning workbench. He is editor-in-chief of Data Mining and Knowledge Discovery, co-editor of the Springer Encyclopedia of Machine Learning, a member of the advisory board of Statistical Analysis and Data Mining, a member of the editorial board of Machine Learning and was a foundation member of the editorial board of ACM Transactions on Knowledge Discovery from Data. He was co-PC Chair of the 2010 IEEE International Conference on Data Mining and co-General Chair of the 2012 IEEE International Conference on Data Mining. He has received the 2013 IEEE ICDM Service Award and a 2014 Australian Research Council Discovery Outstanding Researcher Award.

Invited Speech 3 by Prof Gabriella Pasi

Prof Gabriella Pasi
University of Milano Bicocca,
Title: Generating, Communicating, Accessing and Analyzing Data in a Context Aware Perspective

The Internet of Things and the Web of Things have made context awareness a central issue in defining complex autonomic systems that rely on various layers, including devices, communications and applications. The widespread use of mobile devices that allow users to generate and access distributed data makes it extremely important to organize the forwarding and gathering of such data in a user-centered way, which may support a targeted process of data analytics. Data science should take strong account of the individual and her/his needs in order to look at data from a personal perspective. This contribution presents an integrated view of context awareness, along with the technological and scientific issues that it raises.

Bio for Prof Gabriella Pasi:

Gabriella Pasi is a professor at the University of Milano Bicocca, Department of Informatics, Systems and Communication. Since 1989, her research has aimed at proposing new models and systems for flexible and personalized management of and access to information. Flexibility here means the ability to deal with both precise and imperfect information, i.e. information affected by imprecision/vagueness and/or uncertainty. To this end, she has applied soft computing techniques based on fuzzy set theory, possibility theory, multi-criteria decision making and associative neural networks as means to design flexible information systems.

Invited Speech 4 by Prof Osmar Zaiane

Prof Osmar Zaiane
University of Alberta,
Title: Why the worlds of MOOCs and Big Data do not collide?

MOOCs are a popular new way of providing online courses at a very large scale. This worldwide phenomenon is yielding a very large amount of data and opportunities for analysis. Big Data, on the other hand, is the integration of large volumes of data arriving from disparate sources at a rapid pace, and it can produce added data value and insight. A marriage between the two seems inevitable and natural, and data science could be used profitably to improve the experience on MOOCs. Yet even the techniques proposed in educational data mining are not adopted in current MOOCs to enhance the involvement of learners and the evaluation of learning. What can be done to bring about the union of data science with MOOCs?