Data A–Z dictionary



The big data community often uses multiple “V”s to describe the characteristics, challenges, and opportunities of big data. These include volume (size), velocity (speed), variety (diversity), veracity (quality and trust), value (insight), visualization, and variability (formality). These terms alone, however, cannot fully describe big data or the field of data science. It is therefore valuable to build a Data A–Z dictionary that captures the intrinsic, comprehensive, and diverse aspects, characteristics, challenges, domains, tasks, processes, purposes, applications, and outcomes of data. Figure 1 lists a sample sequence of data science keywords. Such a Data A–Z ontology probably covers most of the topics of interest to major data science communities. Constructing a Data A–Z dictionary can substantially deepen and broaden the understanding of intrinsic data characteristics, complexities, challenges, prospects, and opportunities.


Figure 1. Sample sequence of data science keywords from the Data A–Z dictionary.


Note: Excerpted from L. Cao, “Data Science: Nature and Pitfalls,” IEEE Intelligent Systems, vol. 31, no. 5, pp. 66–75, 2016.



Some key concepts in data science

Several key terms are closely related and easily confused: data analysis, data analytics, advanced analytics, big data, data science, deep analytics, descriptive analytics, predictive analytics, and prescriptive analytics. Table I lists and explains these terms, which are used widely throughout this review.


Table I. Some key terms in data science. [1]


| Key term | Description |
| --- | --- |
| Advanced analytics | Refers to theories, technologies, tools and processes that enable an in-depth understanding and discovery of actionable insights in big data, which cannot be achieved by traditional data analysis and processing theories, technologies, tools and processes. |
| Big data | Refers to data that are too large and/or complex to be effectively and/or efficiently handled by traditional data-related theories, technologies and tools. |
| Data analysis | Refers to the processing of data by traditional (e.g., classic statistical, mathematical or logical) theories, technologies and tools for obtaining useful information and for practical purposes. |
| Data analytics | Refers to the theories, technologies, tools and processes that enable an in-depth understanding and discovery of actionable insight into data. Data analytics consists of descriptive analytics, predictive analytics, and prescriptive analytics. |
| Data science | Is the science of data. |
| Data scientist | Refers to those people whose roles very much center on data. |
| Descriptive analytics | Refers to the type of data analytics that typically uses statistics to describe the data used to gain information, or for other useful purposes. |
| Predictive analytics | Refers to the type of data analytics that makes predictions about unknown future events and discloses the reasons behind them, typically by advanced analytics. |
| Prescriptive analytics | Refers to the type of data analytics that optimizes indications and recommends actions for smart decision-making. |
| Explicit analytics | Focuses on descriptive analytics, typically by reporting, descriptive analysis, alerting and forecasting. |
| Implicit analytics | Focuses on deep analytics, typically by predictive modeling, optimization, prescriptive analytics, and actionable knowledge delivery. |
| Deep analytics | Refers to data analytics that can acquire an in-depth understanding of why and how things have happened, are happening or will happen, which cannot be addressed by descriptive analytics. |
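
The distinction between descriptive, predictive, and prescriptive analytics in Table I can be made concrete with a small sketch. The Python example below is illustrative only: the sales figures, the function names (`describe`, `fit_line`, `predict`, `prescribe`), and the cost model are all assumptions introduced here, not part of the excerpted source, and a straight-line trend is only one very simple stand-in for predictive modeling.

```python
from statistics import mean, stdev

# Hypothetical monthly sales figures (illustrative data, not from the source).
sales = [100.0, 110.0, 120.0, 130.0, 140.0]
months = list(range(len(sales)))  # 0, 1, 2, 3, 4


def describe(values):
    """Descriptive analytics: summarize what has already happened."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def predict(xs, ys, x_new):
    """Predictive analytics: estimate a value that has not been observed yet."""
    slope, intercept = fit_line(xs, ys)
    return slope * x_new + intercept


def prescribe(forecast, stock_options):
    """Prescriptive analytics: recommend the action with the lowest cost.

    Assumed cost model (hypothetical): each unit of unmet demand costs 3,
    each unsold unit costs 1.
    """
    def cost(stock):
        shortage = max(forecast - stock, 0)
        surplus = max(stock - forecast, 0)
        return 3 * shortage + 1 * surplus

    return min(stock_options, key=cost)


summary = describe(sales)                    # what happened
forecast = predict(months, sales, x_new=5)   # what is likely to happen next
recommended_stock = prescribe(forecast, stock_options=[120, 150, 180])  # what to do
```

Here `summary` only reports on past data, `forecast` extrapolates the trend to the next month, and `recommended_stock` turns that forecast into a suggested action, which mirrors the progression from descriptive to predictive to prescriptive analytics described in the table.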