Data science overview

The twenty-first century has ushered in a new age of data science and analytics. Data-driven scientific discovery is regarded as the fourth paradigm of science. Data science is a core driver of next-generation science, technologies and applications, and is driving new research, innovation, professions, economies and education across disciplines and domains. There are many associated scientific challenges, ranging from data capture, creation, storage, search and sharing to modeling, analysis and visualization. Among the complex aspects to be addressed are the integration of heterogeneous, interdependent complex data resources for real-time decision making, streaming data, collaboration, and ultimately value co-creation. Data science encompasses data analytics, machine learning, statistics, optimization and big data management, and has become essential for gleaning understanding from large data sets and converting data into actionable intelligence, whether the data is held by enterprises, society or government, or is available on the Web.

Data science and big data analytics involve, but are not limited to, the following major aspects and problems: (1) data intelligence, (2) data uncertainty and fuzzy systems, (3) neural networks and deep learning, (4) system infrastructure and architecture, (5) networking and interoperation, (6) data modeling, analytics, mining and learning, (7) simulation and evolutionary computation, (8) privacy and security, (9) enterprises, services, applications, solutions and systems, and (10) trust, value, impact and utility.

Exploring the above issues in data science and analytics requires synergy between many related research areas, including data preparation and preprocessing, distributed systems and information processing, distributed agent systems, parallel computing, cloud computing, data management, fuzzy systems, neural networks, evolutionary computation, system architecture, enterprise infrastructure, network and communication, interoperation, data modeling, data analytics, data mining, machine learning, service computing, simulation, evaluation, business process management, industry transformation, project management, enterprise information systems, privacy processing, information security, trust and reputation, business intelligence, business value, business impact modeling, and utility of data and services.



See the following references for more information about data science and analytics:

Longbing Cao. Data Science and Analytics: A New Era. International Journal of Data Science and Analytics, Volume 1, Issue 1, 1-2, 2016.

Longbing Cao. Data Science: A Comprehensive Overview. Submitted to ACM Computing Surveys for review.

Longbing Cao. Data Science: Profession and Education. To be released when it is finalized.

Longbing Cao. Data Science: Nature and Pitfalls. IEEE Intelligent Systems, Volume 31, Issue 5, 66-75, 2016.

Longbing Cao and Usama Fayyad. Data Science: Challenges and Directions. Communications of the ACM, 2016.

J.W. Tukey, “The Future of Data Analysis,” Annals of Mathematical Statistics, vol. 33, no. 1, 1962, pp. 1–67.

K. Schwab, The Global Competitiveness Report 2011–2012, report, World Economic Forum, 2011.

J.M. Chambers, “Greater or Lesser Statistics: A Choice for Future Research,” Statistics and Computing, vol. 3, no. 4, 1993, pp. 182–184.

C. Rudin, Discovery with Data: Leveraging Statistics with Computer Science to Transform Science and Society, Am. Statistical Assoc., 2014.

H.V. Jagadish, “Big Data and Science: Myths and Reality,” Big Data Research, vol. 2, no. 2, 2015, pp. 49–52.

D. Donoho, “50 Years of Data Science,” 2015; http://courses. docs/50YearsDataScience.pdf.

P.J. Diggle, “Statistics: A Data Science for the 21st Century,” J. Royal Statistical Society: Series A, vol. 178, no. 4, 2015, pp. 793–813.