Data Science Journey

The term “data science” most likely first appeared in the literature in the preface to [Naur 1974], which defined data science as “the science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.” Before that, the related term “datalogy” was introduced in 1968 as “the science of data and of data processes” [Naur 1968].

The evolution from data analysis [Huber 2011] to data science began in the statistics and mathematics community in 1962, with the claim that “data analysis is intrinsically an empirical science” [Tukey 1962]. On this basis, David Donoho argued that data science has existed for 50 years and questioned whether, and how, data science really differs from statistics [Donoho 2015]. Typical early work promoting data processing includes information processing [(Ed.) 1968] and exploratory data analysis [Tukey 1977], which suggested placing more emphasis on using data to generate hypotheses to test. This work contributed to what was later called “data-driven discovery” [KDD89 1989]. In 2001, [Cleveland 2001] proposed an action plan to expand the technical areas of statistics toward data science.

One major role of statistics is descriptive analytics (called descriptive statistics in the statistics community) [Stewart and McMillan 1987], which quantitatively summarizes or describes the characteristics and measurements of a data sample or data set. Today, it provides the default analytical tasks and tools in typical data analysis projects and systems.
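As a rough illustration of what "quantitatively summarizing a data sample" can look like in practice, the sketch below computes a few common descriptive statistics with Python's standard `statistics` module. The `describe` helper and the sample values are hypothetical, made up for this example, and the choice of statistics is only one reasonable default rather than a fixed definition of descriptive analytics.

```python
import statistics


def describe(sample):
    """Return a few common descriptive statistics for a numeric sample.

    Hypothetical helper for illustration; not taken from the cited sources.
    """
    ordered = sorted(sample)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        # Sample standard deviation (divides by n - 1).
        "stdev": statistics.stdev(ordered),
    }


# Hypothetical measurements, invented for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)

for name, value in summary.items():
    print(f"{name}: {value}")
```

The summary reduces the eight raw values to measures of size, range, central tendency, and spread, which is typically the first step before any exploratory or predictive analysis.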

Our understanding of the role of data analysis then went beyond data exploration and processing, to “convert data into information and knowledge” [IASC 1977]. Twelve years later, this view fostered the origin of the now popular ACM SIGKDD conference community: the first workshop on Knowledge Discovery in Databases (KDD for short), held with IJCAI-89 [KDD89 1989], in which “data-driven discovery” was one of the three workshop themes. Since then, key terms such as “data mining”, “knowledge discovery” [Fayyad et al. 1996], and “data analytics” have been increasingly recognized not only in IT but also in other areas and disciplines. Data mining and knowledge discovery is the process of discovering hidden and interesting knowledge from data. Today, in addition to KDD, ICML, and NIPS, many regional and international conferences and workshops on data analysis and learning have been created, forming what is probably the fastest growing and most popular community in computer science.

Fig. 1. Data science journey (w.r.t. typical events) [Cao 2016]

**References**:

[Cao 2016] Longbing Cao. Data Science: A Comprehensive Overview. Submitted to ACM Computing Surveys for review.

Peter Naur. 1968. ‘Datalogy’, the science of data and data processes. (1968), 1383–1387.

Peter Naur. 1974. Concise Survey of Computer Methods. Studentlitteratur, Lund, Sweden.

Peter J. Huber. 2011. Data Analysis: What Can Be Learned From the Past 50 Years. John Wiley & Sons.

John W. Tukey. 1962. The future of data analysis. Ann. Math. Statist. 33, 1 (1962), 1–67.

John W. Tukey. 1977. Exploratory Data Analysis. Pearson.

David Donoho. 2015. 50 Years of Data Science. (2015). http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf

SIGKDD. 1989. IJCAI-89 Workshop on Knowledge Discovery in Databases. (1989). http://www.kdnuggets.com/meetings/kdd89/index.html

William S. Cleveland. 2001. Data science: An action plan for expanding the technical areas of the field of statistics. International Statistical Review 69, 1 (2001), 21–26. DOI:http://dx.doi.org/10.1111/j.1751-5823.2001.tb00477.x

Thomas R. Stewart and Claude McMillan, Jr. 1987. Descriptive and prescriptive models for judgment and decision making: Implications for knowledge engineering. In Expert Judgment and Expert Systems, Jeryl L. Mumpower, Ortwin Renn, Lawrence D. Phillips, and V. R. R. Uppuluri (Eds.). Springer-Verlag, London, 305–320.

IASC. 1977. International Association for Statistical Computing. (1977). http://www.iasc-isi.org/

Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. 1996. From data mining to knowledge discovery in databases. AI Magazine 17, 3 (1996), 37–54.