What The Fuck Is Data Science?

If you are a contemporary technologist, then no doubt you have heard the term ‘Data Science’ (DS). Its close relative ‘Machine Learning’ is never far from any conversation that utters those magical words. ‘Big Data’, ‘Data Analytics’ and ‘Predictive Analytics’ get used as near-synonyms, while the older term ‘Data Mining’ has gone the way of Windows 95.

Insert cheesy infomercial voice: ‘With the power of machine learning, you can take three cameras and synthesize them into a single higher resolution shot.’ Never mind that machine learning can just as well combine shots from five cameras on something that is not an iPhone, e.g. a Nokia.

In other words, Data Science & Machine Learning come with a fair bit of self-interested marketing attached, so that they are associated with all things novel and technologically innovative. This makes the definition ever more difficult to pin down.

Sycophantically, Data Science is often attached to industry goals. It tells management what consumers are looking for, what they thought existed in a product catalog, or what products they may use under certain conditions. It’s the application of statistical methods to large quantities of machine-generated or curated data.

Data Science Tasks

A Data Scientist can take data from some domain and make predictions about likely behavior in that domain, using the kind of programming techniques covered in a UC Davis Neuroscience course. Still not clear? I personally don’t think we’ll get a more sophisticated definition than ‘one part large data, one part domain expertise & two scoops of computer science with a side of statistics’.
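To make ‘take data from some domain and make predictions’ a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the numbers are made up (weekly ad spend vs. units sold), the helper names `fit_line` and `mean_abs_error` are hypothetical, and ordinary least squares is just one simple statistical method among many. It fits a line to a handful of points and compares the result against the laziest possible guess, the historical average.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_abs_error(preds, actual):
    """Average absolute gap between predictions and observed values."""
    return mean(abs(p - a) for p, a in zip(preds, actual))


# Made-up data for illustration: weekly ad spend (x) vs. units sold (y).
train_x = [1, 2, 3, 4, 5]
train_y = [12, 15, 21, 24, 30]
test_x = [6, 7]
test_y = [33, 38]

slope, intercept = fit_line(train_x, train_y)

# Predictions from the fitted line vs. always guessing the training average.
model_preds = [slope * x + intercept for x in test_x]
baseline_preds = [mean(train_y)] * len(test_x)

model_error = mean_abs_error(model_preds, test_y)
baseline_error = mean_abs_error(baseline_preds, test_y)

print(f"fitted line error: {model_error:.2f}")
print(f"average-guess error: {baseline_error:.2f}")
```

On this toy data the fitted line lands much closer to the held-out points than the average does. Real work swaps in far more data, domain knowledge about which variables matter, and a method suited to the problem.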

Universities around the world are rushing not only to complement subject matter expertise with a stochastic methods component (think UC Davis offering Computational Linguistics every year since the late 2000s), but also to expand the data science curriculum into more generalist undergraduate courses. Currently, there is enough material to distill into several Data Science courses for an undergraduate degree at UNAM (National Autonomous University of Mexico), much of it drawn from faculty research.

All of this is to say that Data Science will continue to evolve and generate interest because of its uniquely tangible results: predictions that are better than nothing in domains that the IT industry cares about.

For many of the core technical points, we find that the best resource is the Stanford University course on Data Mining.

— Ricardo Lezama