Data science tools

— Cloud infrastructure:
Examples include Apache Hadoop, Spark, Cloudera, Amazon Web Services, Unix shell/awk/gawk, 1010data, Hortonworks, Pivotal, and MapR. Most traditional IT vendors have migrated their services and platforms to support the cloud.

— Data/application integration:
Tools include Ab Initio, Informatica, IBM InfoSphere DataStage, Oracle Data Integrator, SAP Data Integrator, Apatar, CloverETL, Information Builders, Jitterbit, Adeptia Integration Suite, Syncsort DMExpress, Pentaho Data Integration, and Talend [Review 2016].

— Master data management:
Typical software and platforms include IBM InfoSphere Master Data Management Server, Informatica MDM, Microsoft Master Data Services, Oracle Master Data Management Suite, SAP NetWeaver Master Data Management tool, Teradata Warehousing, TIBCO MDM, Talend MDM, and Black Watch Data.

— Data preparation and processing:
In Today [Today 2016], 29 data preparation tools and platforms were listed, such as Platfora, Paxata, Teradata Loom, IBM SPSS, Informatica Rev, Omniscope, Alpine Chorus, KNIME, Wrangler Enterprise, and Wrangler.

— Analytics:
In addition to well-recognized commercial tools, including SAS Enterprise Miner, IBM SPSS Modeler and SPSS Statistics, MATLAB, and RapidMiner [RapidMiner 2016], many new tools have been created, such as DataRobot [DataRobot 2016], BigML [BigML 2016], and MLBase [Lab 2016], as well as APIs including the Google Cloud Prediction API [Google 2016b].

— Visualization:
Many free and commercial software packages for visualization are listed in KDnuggets [KDnuggets 2015], such as Interactive Data Language, IRIS Explorer, Miner3D, NETMAP, Panopticon, ScienceGL, Quadrigram, and VisuMap.

— Programming:
In addition to the main languages R, SAS, SQL, Python, and Java, many others are used for analytics, including Scala, JavaScript, .NET, Node.js, Objective-C, PHP, Ruby, and Go [Davis 2016].

— High performance processing:
In Wikipedia [Wikipedia 2016a], about 40 computer cluster software packages are listed and compared in terms of their technical performance, such as Stacki, Kubernetes, Moab Cluster Suite, and Platform Cluster Manager.

Submitted to ACM Computing Surveys for Review, Vol. 1, No. 1, Article 1, Publication date: January 2016. Data Science: A Comprehensive Overview

— Business intelligence reporting:
There are many reporting tools available [Capterra 2016b; Wikipedia 2016c], typical of which are Excel, IBM Cognos, MicroStrategy, SAS Business Intelligence, and SAP Crystal Reports.

— Project management:
In Capterra [Capterra 2016a], more than 500 software products and tools were listed for project management, including Microsoft Project, Atlassian, Podio, Wrike, Basecamp, and Teamwork.

— Social network analysis:
In Desale [Desale 2015], 30 tools were listed for SNA and visualization, such as Centrifuge, Commetrix, Cuttlefish, Cytoscape, EgoNet, InFlow, JUNG, Keynetiq, NetMiner, Network Workbench, NodeXL, and SocNetV (Social Networks Visualizer).

— Other tools:
A growing number of tools have been developed, or are under development, for domain-specific and problem-specific data science, such as Alteryx and Tableau for tablets; SuggestGrid and Mortar Recommendation Engine for recommender systems [Github 2016b]; OptumHealth, Verisk Analytics, MedeAnalytics, McKesson, and Truven Health Analytics [Technavio 2016] for healthcare analytics; and BLAST, EMBOSS, Staden, THREADER, PHD, and RasMol for bioinformatics.