Author
HCC team
Date
Apr 1, 2021
Category
Engagement

Three common myths about data science

The term data science has its origins in the late 1990s but has only grown in popularity over the last five years. Only from 2015 did we start searching for it seriously on Google.

Today, the term "data science" permeates even the smallest organisations in one form or another. Yet, simultaneously, searches for "statistics" have trended downwards - perhaps reflecting how data science overlaps with, or even partly replaces, that once classic and elite field.

Still, the idea of data science is surrounded by myths and misconceptions, fed by the hype around it. When we try to imagine the vast machinery forecasting the weather, our traffic conditions, election outcomes, or even the most spookily relevant Facebook ads, it's understandable that our imaginations run away with its seemingly infinite possibilities.

We decided to touch on three common misconceptions around data science and what it means to organisations.

Myth: Data science is an autonomous process that can be let loose to find the problems.

Actual: Kelleher and Tierney (2018) rightly explain that human oversight is necessary throughout most of a data science project. All data science projects have to begin by investigating a relevant problem; only humans can understand and contextualise what the organisation needs. Preparing the data, selecting the most appropriate machine learning algorithm, and interpreting the insights all have to be framed by a person for a human audience.

Myth: Every data science project needs big data.

Actual: This is probably one of the most limiting misconceptions for organisations considering a data science project. Generally, the more data you have, the more specific you can get, but having the right data matters more. Kelleher and Tierney explain that non-obvious insights can be found for student churn in institutions with fewer than 10,000 students or unions with only several thousand members. We have looked at data in organisations with just over a thousand members and gained some valuable and actionable insights. Our brains can usually understand and imagine, at most, how two or three variables influence each other simultaneously. A small data science project can find relationships between tens or hundreds of different aspects or variables.
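To make the point about small datasets concrete, here is a minimal sketch in Python. It is not a method taken from Kelleher and Tierney or from our own projects: the data is entirely made up (1,000 synthetic "members" with 20 variables), the function names are hypothetical, and a plain Pearson correlation stands in for whatever technique a real project would use. One relationship is deliberately planted between two columns, and the code scores every pair of variables to see whether it surfaces.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def make_synthetic_members(n_members=1000, n_variables=20, seed=0):
    """Build made-up data: every column is random noise except one planted link."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_members)]
               for _ in range(n_variables)]
    # Planted relationship (purely illustrative): column 17 partly follows column 3.
    columns[17] = [0.6 * a + 0.8 * rng.gauss(0, 1) for a in columns[3]]
    return columns


def strongest_pairs(columns, top=3):
    """Score every pair of variables and return the most strongly correlated ones."""
    scored = [(abs(pearson(columns[i], columns[j])), i, j)
              for i, j in itertools.combinations(range(len(columns)), 2)]
    return sorted(scored, reverse=True)[:top]


columns = make_synthetic_members()
top_pairs = strongest_pairs(columns)
for strength, i, j in top_pairs:
    print(f"variables {i} and {j}: |r| = {strength:.2f}")
```

With 20 variables there are already 190 pairs to compare, far more than anyone could hold in their head, yet a thousand rows are enough for the planted link to stand clear of the noise. Correlation only captures simple linear relationships, so treat this as an illustration of scale rather than a recipe.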

Myth: Data science will tell me what I already know.

Actual: Good data science aims to uncover the non-obvious patterns in data that are actually useful. If it's something that a human expert can easily find or model in their mind, it's not worth the effort of using data science to discover it. The point of using data science is to improve organisational decision-making by identifying non-obvious patterns. A good data scientist can then interpret these patterns into something that will give your organisation a competitive edge. Linoff and Berry (2011) summarise this as: "Data mining lets computers do what they do best - dig through lots of data. This, in turn, lets people do what people do best, which is to set up the problem and understand the results."


Honold, A. (2021). How much data is enough? https://towardsdatascience.com/how-much-data-is-enough-366d5b11ca3c

Kelleher, J., & Tierney, B. (2018). Data science. Cambridge, Massachusetts: The MIT Press.

Linoff, G., & Berry, M. (2011). Data mining techniques: For marketing, sales, and customer relationship management. Indianapolis, IN: Wiley.
