An increasing number of academic and research institutions are working on defining
the certification and accreditation of next-generation data scientists. This is reflected
in general and domain-specific data science curricula for Master's and PhD qualifications.
The responsibilities expected of such data scientists include the following:
- Learn the business problem domain; talk to business experts and decision-makers to understand the organization's business objectives, requirements and preferences, and the issues and constraints it faces; understand the organizational maturity; identify, specify and define the problems, boundaries, environment and challenges; generate business understanding reports;
- Identify and specify social and ethical issues such as privacy and security; develop ethical reasoning plans to address social and ethical issues;
- Understand data characteristics and complexities; identify the problems and constraints of the data; develop a data understanding report; specify and scope analytical goals and milestones by developing respective project plans to set up an agenda and create governance and management plans;
- Set up engineering and analytical processes corresponding to analytical goals for turning business and data into information, turning information into insight, and turning insight into business decision-making actions by developing technical plans for the discovery, upgrade and deployment of relevant data intelligence;
- Transform business problems into analytical tasks, and conduct advanced analytics by developing corresponding techniques, models, methods, algorithms, tools and systems; design and evaluate data science experiments; capture better-practice experience; perform descriptive, predictive and prescriptive analytics; conduct survey research; and support visualization and presentation;
- Based on the understanding of data characteristics and complexities, extract, analyze, construct, mine and select discriminative features; continually optimize existing variables and devise new ones for the best possible problem representation and modeling; when necessary, conduct data quality enhancement [Hazen et al. 2014];
- Combine analytical, statistical, algorithmic, engineering and technical skills to mine relevant data by involving contextual information; invent novel and effective models, and constantly improve modeling techniques to optimize and boost model performance and seek to achieve best practice;
- Maintain, manage and refine projects and milestones, and their processes, deliverables, evaluation, risk and reporting, to support active lifecycle management;
- Develop corresponding services, solutions and products or modules to feed into a system package on top of user-specified programming languages, frameworks and infrastructure, or open source tools and frameworks;
- Maintain the privacy, security and veracity of data and deliverables;
- Engage in frequent client interaction during the whole lifecycle; tell clear and concise stories and draw simple conclusions from complex data or algorithms; provide clients with situational analyses and deep insights into areas requiring improvement; translate these insights into business-improving actions in the final deployment;
- Write coherent reports and make presentations to specialists and non-specialists; present executive summaries with precise and evidence-based recommendations and risk management strategies, especially for decision-makers and business owners.
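As a small illustration of the feature selection responsibility in the list above, the following Python sketch ranks candidate features by how well they separate two classes. It uses the Fisher score (squared gap between class means divided by the summed class variances), which is only one common choice and is not a method prescribed by the source. The function names, feature names and toy data are all hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher score of one numeric feature against a binary (0/1) label."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2          # separation of class means
    spread = pvariance(pos) + pvariance(neg)    # within-class variability
    if spread == 0:
        # Constant within each class: perfectly separating or useless.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_features(table, labels):
    """Return feature names ordered from most to least discriminative."""
    scores = {name: fisher_score(col, labels) for name, col in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two candidate features and a binary outcome.
table = {
    "monthly_spend": [10.0, 12.0, 11.0, 30.0, 32.0, 31.0],
    "account_age": [5.0, 1.0, 9.0, 4.0, 8.0, 2.0],
}
labels = [0, 0, 0, 1, 1, 1]

ranking = rank_features(table, labels)
print(ranking)
```

In this toy data `monthly_spend` separates the two classes cleanly while `account_age` does not, so the former ranks first. In practice such a score would be one input among several, alongside domain knowledge and checks on data quality.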
Note: Excerpted from Longbing Cao, “Data Science: A Comprehensive Overview”.
1. Benjamin T. Hazen, Christopher A. Boone, Jeremy D. Ezell, and L. Allison Jones-Farmer. 2014. Data
quality for data science, predictive analytics, and big data in supply chain management: An introduction
to the problem and suggestions for research and applications. International Journal of Production
Economics 154 (2014), 72–80.