The Best Data Science Courses on EdX

Learn smarter with these EdX courses – Photo by Tim Mossholder on Unsplash

Learn data science with these amazing courses from EdX, from some of the world’s best educational institutions.

There has never been an easier time to access information, and consequently to learn almost anything. This is generally more true the more modern a field is, the more online its community is, and the more it has to do with programming. Data science sits at the intersection of all three: it is a very modern field, closely related to computer science, with a very large (and growing) online community.

From videos, guides and tutorials to open-source documentation and books (see our guide to the best data science books), there is a plethora of useful resources out there for aspiring data scientists.

MOOCs (Massive Open Online Courses) are no exception. EdX is one of the leaders in this space, having been founded by MIT & Harvard University in 2012 and designed to host online university-level courses. It is said to currently have over 18 million students!

We generally think very highly of EdX here at Data Courses, and the same goes for its data science courses. In this article, we have selected some of what we think are the best data science courses from EdX for you to take a look at. We understand that everybody’s background and needs are different, and that is probably more true in data science than in any other field. So, we’ve separated our top picks into topics.

Introductory

Introduction to Data Science (IBM)

(https://www.edx.org/course/intro-to-data-science)

If you are new to the field altogether, take a look at this course. Data science is a field that integrates learnings from multiple fields, such as statistics, probability, computer science, linguistics and machine learning. So if you are new to these as well as the application side of data science, it can feel overwhelming.

This IBM course is a very easy, ‘soft’ introduction to data science. It is designed to initiate students into the field by introducing the concepts and origins of data science, as well as ‘real life’ applications. The course does not involve any maths or science, and so it makes for a great high-level introduction.

The course covers all the topics below:

  • Definition of data science and what data scientists do
  • Tools and algorithms used on a daily basis within the field
  • Skills needed to be a successful data scientist
  • The role of data science within a business
  • How to form a strong data science team

It’s currently offered for $39.

Python Basics for Data Science (IBM)

(https://www.edx.org/course/python-basics-for-data-science)

Knowledge of Python is a hard prerequisite for data scientists in many fields. Personally, I prefer Python for its general applicability. Check this course out if you are not yet familiar with the very basics of Python. The course is currently offered for $39, which makes it a steal.

You’ll cover basic Python concepts that will inch you closer to what you need to know as a data scientist or Python-powered data analyst, such as:

  • The application of Python to Data Science
  • How to define variables in Python
  • Sets and conditional statements in Python
  • The purpose of having functions in Python
  • How to operate on files to read and write data in Python
  • How to use pandas
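To give a feel for the level of these topics, here is a minimal sketch touching on variables, sets, a conditional statement, a function and file reading/writing. It is not taken from the course; the names (`mean`, `scores`, `scores.txt`) are made up for illustration, and pandas is left out so that the snippet runs with the standard library alone.

```python
import tempfile
from pathlib import Path


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Variables of a few basic types (illustrative values only)
course_name = "Python Basics for Data Science"
scores = [72, 88, 95, 88]

# A set keeps only the unique values
unique_scores = set(scores)

# A conditional statement that branches on a computed value
average = mean(scores)
if average >= 80:
    verdict = "pass"
else:
    verdict = "retry"

# Write the scores to a file, then read them back in
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "scores.txt"  # hypothetical file name
    path.write_text("\n".join(str(s) for s in scores))
    loaded = [int(line) for line in path.read_text().splitlines()]

print(course_name, average, verdict, sorted(unique_scores), loaded)
```

If you can read this comfortably, you may be able to skip straight to the later parts of the course.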

Introduction to Computational Thinking and Data Science (MITx)

(https://www.edx.org/course/introduction-to-computational-thinking-and-data-4)

If you are after a slightly more challenging, integrative introductory course, take a look at this MIT course. This will teach you to think like a programmer, and more importantly – like a problem solver, using programming. The lecturers also cover topics in statistics/probability, and regression. Think of it as an introductory course, but at an intermediate level.

Statistics & Probability

Statistics & probability is not just one part of data science, but a major part of it. It is difficult, almost impossible, to be a data science practitioner at a high level without a good grasp of statistics. Depending on your background, check out one or more of these courses.

Data Science: Probability (HarvardX)

(https://www.edx.org/course/data-science-probability)

For those who are somewhat new to probability and statistics, I would start here. It is an introductory course, and accordingly it presents relatively basic probability concepts that will serve as foundations for moving on. It does involve some maths, but that really can’t be avoided in learning probability, and the lectures do a reasonable job of explaining the intuition.

While most of the other courses in this list use Python as a base for learning data science, this course uses R. However, all the material is transferable to any statistical programming language.

The core takeaways from the course are:

  • Concepts in probability theory
  • Introduction to random variables and independence
  • How to perform a Monte Carlo simulation
  • The meaning of expected values and standard errors and how to compute them in R
  • The importance of the Central Limit Theorem
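As a taste of what a Monte Carlo simulation looks like, here is a minimal sketch that estimates the expected value of the sum of two dice, together with the standard error of that estimate. The course itself works in R; this version is written in Python purely as an illustration, and the function name `simulate_dice_sums`, the seed and the number of trials are arbitrary choices, not course material.

```python
import random
import statistics


def simulate_dice_sums(n_trials, seed=0):
    """Roll two fair six-sided dice n_trials times and return the sums."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_trials)]


sums = simulate_dice_sums(10_000)

# Monte Carlo estimate of the expected value (the theoretical value is 7)
estimate = statistics.fmean(sums)

# Standard error of the estimate: sample standard deviation / sqrt(n)
std_error = statistics.stdev(sums) / len(sums) ** 0.5

print(f"estimate = {estimate:.3f}, standard error = {std_error:.4f}")
```

The Central Limit Theorem is what justifies reading the estimate as roughly normally distributed around the true value, with a spread given by the standard error.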

The course is currently offered for $49. It can also be audited for free, although the free option does not include a certificate.

Introduction to Probability (HarvardX)

(https://www.edx.org/course/introduction-to-probability)

Ironically, the course titled ‘Introduction to Probability’, offered by the same institution, is more of an intermediate course than the above ‘Data Science: Probability’ course. 

I think of this course as an expanded version of the above course, for those interested in probability theory. As more or less a pure probability course, I would recommend it as a reference to go back to when you wish to brush up on certain topics. The next course is perhaps more interesting for data scientists.

The core concepts and takeaways from the class are below:

  • How to think about uncertainty and randomness
  • How to make good predictions
  • The story approach to understanding random variables
  • Common probability distributions used in statistics and data science
  • Methods for finding the expected value of a random quantity
  • How to use conditional probability to approach complicated problems

The course is currently offered for $99.

Probability and Statistics in Data Science using Python (UCSanDiegoX)

(https://www.edx.org/course/probability-and-statistics-in-data-science-using-p)

This course is probably the most challenging of the three courses that we list on this topic, but potentially also the most rewarding, or interesting. As a part of UC San Diego’s course on Data Science, this course also integrates programming exercises, which I appreciate. It also covers more statistical and modelling concepts such as PCA and regression, which are commonly used in data science and should be understood well. 

Concepts covered in the course include random variables, dependence, correlation, regression, PCA (Principal Component Analysis), entropy and MDL (minimum description length).

This is one of the more expensive courses on our list at $350. However, it is part of the MicroMasters program offered by UC San Diego.
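Two of these concepts, entropy and regression, are small enough to sketch in a few lines of plain Python. This is an illustration only and not the course's own material; the helper names `shannon_entropy` and `least_squares_line` and the toy data are assumptions made for the example.

```python
import math


def shannon_entropy(probabilities):
    """Entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def least_squares_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# A fair coin is maximally uncertain; a biased coin carries less entropy
fair_coin_bits = shannon_entropy([0.5, 0.5])
biased_coin_bits = shannon_entropy([0.9, 0.1])

# Toy data lying exactly on the line y = 2x + 1
slope, intercept = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])

print(fair_coin_bits, biased_coin_bits, slope, intercept)
```

In practice you would reach for a library for both, but writing them out once makes the definitions easier to remember.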

Data Preparation

When data science is mentioned, the first image that is conjured is that of programmers developing incredible machine learning algorithms that can play Go at expert level or write their own Harry Potter fan fiction.

Before any of that can be done, the input data must be cleaned, filtered, filled, interpolated, and otherwise generally wrangled. These two courses are about all of that unglamorous, yet essential work. They’re relatively short and definitely worth checking out as supplementary material, either as a whole or in parts.

Data Science Research Methods: Python Edition (Microsoft)

(https://www.edx.org/course/data-science-research-methods-python-edition-2)

This is a conceptual course on designing and carrying out data science research and applying it to specific problems. While high-level and abstract, it covers useful topics including:

  • Data analysis and inference
  • Data science research design
  • Experimental data analysis and modeling

All of these are soft skills and methodologies that every data scientist should know and understand.

The course is offered at $99.

Data Science: Wrangling (HarvardX)

(https://www.edx.org/course/data-science-wrangling)

As the name suggests, this course deals primarily with wrangling the crucial data that will form the inputs to your data science projects. It’s often said that 90% of a data scientist’s or analyst’s time is spent as a “data janitor”.

The core takeaways from the course are below:

  • Importing data into R from different file formats
  • Web scraping
  • How to tidy data using the tidyverse to better facilitate analysis
  • String processing with regular expressions (regex)
  • Wrangling data using dplyr
  • How to work with dates and times as file formats
  • Text mining
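To illustrate the flavour of this work, here is a minimal sketch of string processing with a regular expression plus date handling. The course teaches these ideas in R with the tidyverse; this version is in Python purely as an illustration, and the messy records, the pattern and the `parse_record` helper are all invented for the example.

```python
import re
from datetime import date

# Hypothetical messy input: inconsistent spacing, separators and casing
raw_records = [
    "Alice, 2019-03-01, 170cm",
    "bob ,2019/03/15, 182 cm",
    "  Carol,2019-04-02,165CM ",
]

# name, date (YYYY-MM-DD or YYYY/MM/DD), height in centimetres
pattern = re.compile(
    r"^\s*(\w+)\s*,\s*(\d{4})[-/](\d{2})[-/](\d{2})\s*,\s*(\d+)\s*cm\s*$",
    re.IGNORECASE,
)


def parse_record(line):
    """Turn one messy line into a tidy dict, or None if it does not match."""
    match = pattern.match(line)
    if match is None:
        return None
    name, year, month, day, height = match.groups()
    return {
        "name": name.capitalize(),
        "date": date(int(year), int(month), int(day)),
        "height_cm": int(height),
    }


# Keep only the lines that could be parsed
tidy = [rec for rec in map(parse_record, raw_records) if rec is not None]

for rec in tidy:
    print(rec["name"], rec["date"].isoformat(), rec["height_cm"])
```

Returning `None` for lines that do not match, rather than raising an error, makes it easy to count and inspect the rejects separately.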

The course is offered at $49.

Machine Learning

In many ways, machine learning is the heart and soul of data science. Whilst the other topics are no doubt indispensable, the advent and proliferation of machine learning has been the driving factor in the growth of data science. Here are courses that will help you cut your teeth in this area.

Principles of Machine Learning: Python Edition (Microsoft)

(https://www.edx.org/course/principles-of-machine-learning-python-edition-2)

It’s hard to cover machine learning concepts, remain accessible (i.e. without too much maths) and also provide programming exercises for the students. This Microsoft course does a pretty reasonable job of meeting that intersection. While there are more comprehensive courses out there, this course is a good stepping stone on the way.

Machine Learning Fundamentals (UCSanDiegoX)

(https://www.edx.org/course/machine-learning-fundamentals-5)

This is a comprehensive course, and slightly more accessible than the next course in my view. The focus of this course appears to be on intuitive understandings of different approaches and techniques, rather than the maths. It is still quite comprehensive, and you gain a good depth and breadth of knowledge from it.

Machine Learning with Python: from Linear Models to Deep Learning (MITx)

(https://www.edx.org/course/machine-learning-with-python-from-linear-models-to)

This is the real deal. If you’re serious about machine learning and want to learn it soup to nuts, this is as good a course as any that you’ll find on EdX. The fact that EdX suggests it to be a 10-14 hours/wk commitment for 14 weeks probably says enough. While the course content is challenging, the instructors do their best to ensure that the student is aware of the challenges and requirements, with well thought-out homework and programming exercises.

Summary

I hope the list of EdX’s best data science courses has been useful. There are so many great courses on EdX, so it’s hard to choose just a few. But I have no doubt that there is something here for everybody. Let us know if you disagree with any of the above or have any suggestions!