Data Science and Big Data Analytics: Making Data-Driven Decisions

Solve Complex Issues With Your Data

Start Date: Oct 23, 2017
Duration: 8 Weeks
Price: $849

Course Description

Every day, your organization generates new data on your customers, your processes, and your industry. But could you be using this data more effectively? Discover how to turn big data into even bigger results in this eight-week online course and earn an MIT Certificate on Data Science as well as 1.8 Continuing Education Units (CEUs) upon completion.

This course was developed by over ten MIT faculty members at the Institute for Data, Systems and Society (IDSS). It is specially designed for data scientists, business analysts, engineers, and technical managers looking to learn the latest theories and strategies to harness data.

Not sure this course is for you? Download this sample case study on building a movie recommendation system taken from the course so you can get a sneak peek of what’s included in the course.

Turn Your Knowledge into Action

Through digital lectures and hands-on case studies based on examples from real-world business scenarios, you’ll acquire the theory, strategies, and tools you need to:

  • Apply data science techniques to your organization’s data management challenges.
  • Identify and avoid common pitfalls in big data analytics.
  • Deploy machine learning algorithms to mine your data.
  • Interpret analytical models to make better business decisions.
  • Understand the challenges associated with scaling big data algorithms.
  • Convert datasets to models through predictive analytics.

Real-World Case Studies & Hands-on Projects

Ever wondered how top companies perfect their recommendation systems? Or how auto manufacturers develop their GPS technology? In Data Science and Big Data Analytics: Making Data-Driven Decisions, you’ll be able to examine over 20 case studies and apply your knowledge by:

  • Tracking the 2D and 3D position of objects with a Kalman filter.
  • Building your own movie, music, and product recommendation systems, just like Netflix or Pandora.
  • Automatically clustering news stories with a spectral clustering algorithm.
  • Predicting wages with a linear regression model.
  • Exploring one- or two-layer perceptrons to assess their decision boundaries.
  • Using network-theoretic ideas to identify new candidate genes that might cause autism.

Want to purchase this course for a group?

You can purchase enrollment codes for this course to distribute to your team.

Instructors

Devavrat Shah, Co-Director

Director, Statistics and Data Science Center (IDSS); Professor, Laboratory for Information and Decision Systems (LIDS), Computer Science and Artificial Intelligence Laboratory (CSAIL) and Operations Research Center (ORC)

Philippe Rigollet, Co-Director

Associate Professor, Mathematics Department and Statistics and Data Science Center (IDSS)

Guy Bresler

Assistant Professor, Electrical Engineering and Computer Science, LIDS and IDSS

Tamara Broderick

Assistant Professor, Institute for Data, Systems, and Society (IDSS), Electrical Engineering and Computer Science (EECS) Department

Victor Chernozhukov

Professor, Department of Economics; Statistics and Data Science Center (IDSS)

David Gamarnik

Professor, Sloan School of Management

Stefanie Jegelka

Assistant Professor, Institute for Data, Systems, and Society (IDSS), Electrical Engineering and Computer Science (EECS) Department

Jonathan Kelner

Associate Professor, Department of Mathematics, and member of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL)

Ankur Moitra

Assistant Professor, Department of Mathematics, and member of the Computer Science and Artificial Intelligence Laboratory (CSAIL)

Caroline Uhler

Assistant Professor, Institute for Data, Systems, and Society (IDSS), Electrical Engineering and Computer Science (EECS) Department

Kalyan Veeramachaneni

Principal Research Scientist, MIT Laboratory for Information and Decision Systems (LIDS)

COURSE OVERVIEW

Over the course of eight weeks, you will take your data analytics skills to the next level as you learn the theory and practice behind recommendation engines, regressions, network and graphical modeling, anomaly detection, hypothesis testing, machine learning, and predictive and big data analytics. You will acquire the theories, strategies, and tools you need to answer questions such as:

  • What is clustering and when should I use it?
  • What is the best way to design experiments and conduct hypothesis testing using my data?
  • How should I do model selection and avoid over-fitting?
  • What are the latest trends in machine learning?
  • How do graphical models and network models differ?

Time Commitment
MIT xPRO courses are designed to fit the schedules of busy professionals. The course requires a time commitment of three to four hours a week, made up of videos, assigned reading, and assignments.

Each video module is pre-recorded, enabling you to watch it anytime. While you may complete most of the program as quickly as you wish, most participants find it beneficial to adhere to the weekly schedule and participate in online discussion forums along the way. Deadlines include:

  • November 30, 2017: Submit the Recommendation System Case Study (Module 4)
  • December 3, 2017: Peer-Review the Recommendation System Case Study (Module 4)
  • December 14, 2017: Submit Case Study (Final Module)
  • December 17, 2017: Peer-Review Case Study (Final Module)
  • December 17, 2017: Submit all end-of-topic graded assessments

Browser/Technical Requirements
Access to our courses requires an Internet connection, as videos are only available via online streaming, and cannot be downloaded for offline viewing. Please take note of your company's restrictions for viewing content and/or firewall settings.

EARN A CERTIFICATE OF COMPLETION AND CEUS

Participants who successfully complete the course and all assessments will receive a Certificate in Data Science from MIT xPRO. This course does not carry MIT credits or grades; however, an 80% pass rate is required to receive the certificate. Course requirements include:

  • Submission and peer-review of two case studies
  • Passing grades on eight assessments

WHO SHOULD PARTICIPATE

This course is designed for data scientists and data analysts, as well as professionals who wish to turn large volumes of data into actionable insights. Because of the broad nature of the information, the course is well suited for both early-career professionals and senior managers. Since this is not an introductory course, the faculty strongly recommends that participants have substantial background knowledge of statistical techniques and data calculations or quantitative methods of data research.

Participants may include:

  • Technical managers
  • Business intelligence analysts
  • Management consultants
  • IT practitioners
  • Business managers
  • Data science managers
  • Data science enthusiasts

COURSE REVIEWS

“Leveraging this knowledge will allow me to position myself as a hybrid analyst-data scientist, which greatly increases my value to the company.” - Ryan Michael Dickinson

“I really enjoyed the interactions/animations in the videos. These really helped with visualizing the concepts… I feel more equipped to understand what type of insights can be gleaned from a particular set of data, and can better communicate these asks to our data science team.” - Reza Dawood

“The course content was really amazing and gave me exact direction to head towards the Big Data topic.” - Prasad Sankpal

“It's very critical to keep acquiring new knowledge in today's ever changing landscape of both world order and opportunities available to professionals.” - Joanna Zarach

“The quality and pace of the videos and material is top-notch. I really like having different instructors for different modules and having two instructors interacting together makes the material more vivid and entertaining.” - Miguel Hurtado

“Armed with the knowledge I have gained from this course, I can introduce my team to certain methods that can be applied to our day-to-day work.” - Anonymous Learner

COURSE OUTLINE

Course materials blend the following pedagogical strategies to best achieve the learning objectives of the course and individual modules:

  • Instructivism: Teacher-centered learning where the instructors present relevant content (tutorial videos enhanced with animation and graphics). Students will test their knowledge through graded tests.
  • Constructivism: A learning-by-doing approach. We encourage learners to construct their own understanding by working through the mandatory and optional case studies and by practicing.
  • Social constructivism: Learning through social interactions and communication. You will be able to hold discussions with your peers in the discussion groups, and to give and receive peer reviews through two compulsory case studies.
  • Connectivism: Connecting with others and extending your knowledge through communication. You will be able to expand and share your knowledge with others through the discussion group and the course groups on Facebook and LinkedIn.

Module 1: Making sense of unstructured data

  • Clustering
  • Spectral Clustering, Components and Embeddings
  • Case Studies

Module 2: Regression and Prediction

  • Classical Linear & nonlinear regression & extension
  • Modern Regression with High-Dimensional Data
  • The use of modern Regression for causal inference
  • Case Studies

Module 3: Classification, Hypothesis Testing and Anomaly Detection

  • Hypothesis Testing and Classification
  • Deep Learning
  • Case Studies

Module 4: Recommendation Systems

  • Recommendations and ranking
  • Collaborative filtering
  • Personalized recommendations
  • Case Studies
  • Wrap-up: Parting remarks and challenges

Module 5: Networks and Graphical Models

  • Introduction
  • Networks
  • Graphical Models
  • Case Studies

Module 6: Predictive Modeling for Temporal Data

  • Introduction
  • Prediction engineering
  • Feature engineering
  • Modeling and evaluating predictive models
  • Deploying predictive models

CASE STUDY OUTLINES

In this course, you won’t just discover new strategies, tools, and insights; you’ll put them to the test. Every course module features a selection of case studies and hands-on projects that help you apply your newfound knowledge to realistic business challenges.

Time Commitment: For participants that wish to engage with the optional case study activities, please allow an extra 4+ hours a week. These Optional Case Study tutorials will require some prior knowledge and experience with the programming language you choose to use for reproducing case study results. Generally, participants with 6 months of experience using “R” or “Python” should be successful in going through these exercises. Please note that the case study activities are not required and do not count towards your "grade" or earning a certificate of completion.

Module 1: Making sense of unstructured data

Case Study 1: Genetic Codes

  • Case Study Activity Description: Use K-means to figure out that DNA is composed of three-letter words. We’ll help by demonstrating how to apply data visualization to genomic sequence analysis.
  • Data Sets & format: DNA text string
  • Tools used: Matlab
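
As a rough illustration of the kind of preprocessing this case study implies, the sketch below cuts a text string into fragments and counts how often each k-letter word occurs in every fragment. It is only a sketch under stated assumptions: the course itself uses Matlab, the function names (`vocabulary`, `kmer_frequencies`, `fragment_features`) are hypothetical, and the input string is a made-up toy, not real genomic data.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def vocabulary(k):
    """All 4**k possible k-letter words over the DNA alphabet, in a fixed order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]

def kmer_frequencies(fragment, k):
    """Relative frequency of every k-letter word in one fragment.

    The fragment is cut into non-overlapping words of length k; a trailing
    remainder shorter than k is ignored.
    """
    words = [fragment[i:i + k] for i in range(0, len(fragment) - k + 1, k)]
    counts = Counter(words)
    total = len(words)
    return [counts[w] / total if total else 0.0 for w in vocabulary(k)]

def fragment_features(sequence, fragment_length, k):
    """Split a sequence into fixed-length fragments and featurize each one."""
    starts = range(0, len(sequence) - fragment_length + 1, fragment_length)
    return [kmer_frequencies(sequence[s:s + fragment_length], k) for s in starts]

# Made-up toy string (NOT real genomic data), only to show the shapes involved.
toy_sequence = "ATGGCC" * 3 + "TTTAAA" * 3
features = fragment_features(toy_sequence, fragment_length=18, k=3)
print(len(features), len(features[0]))
```

Each fragment becomes a vector of length 4**k, and those vectors are what a clustering method such as K-means would group. Repeating the featurization for several values of k and comparing how cleanly the fragments cluster is one plausible way to explore the word length.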

Case Study 2: LDA Analysis

  • Case Study Activity Description: Find themes in project descriptions using LDA. We’ll help by giving you tips on how to do your own analysis on MIT EECS faculty data using stochastic variational inference on LDA.
  • Data Sets & format: Scrape your own
  • Tools used: Python

Case Study 3: PCA: Identifying Faces

  • Case Study Activity Description: Implement your own image classification algorithm that helps classify photos of people’s faces. We’ll help by giving you tips on how to use PCA, along with examples and pseudo-code for the programming environment.
  • Data Sets & format: Instructors’ photos provided (14). Any other images will work, as long as they obey the restrictions noted in the Self Help document.
  • Tools used: Matlab

Case Study 4: Spectral Clustering: Grouping News Stories

  • Case Study Activity Description: Build your own clustering for online news stories, similar to how Google News organizes stories via auto-generated topics. We’ll help by giving you tips on Spectral Clustering, along with examples and pseudo-code for the programming environment.
  • Data Sets & format: Instructions for downloading news stories off the web.
  • Tools used: Python

Module 2: Regression and Prediction

Case Study 1: Predicting Wages 1

  • Case Study Activity Description: Predict wages and assess predictive performance using various characteristics of workers. We’ll help by describing the wage prediction model.
  • Data Sets & format: CPS 2012 Data, Rdata format
  • Tools used: R

Case Study 2: Gender Wage Gap

  • Case Study Activity Description: Estimate the difference in predicted wages between men and women with the same job characteristics. We’ll help by describing the estimation technique and presenting the results.
  • Data Sets & format: CPS 2012 Data, Rdata format
  • Tools used: R

Case Study 3: Do Poor Countries Grow Faster than Rich Countries?

  • Case Study Activity Description: Use a large dimensional dataset to answer the question: Do poor countries grow faster than rich countries? We’ll help by describing the estimation technique, giving you the tools, and presenting the results.
  • Data Sets & format: Barro-Lee Growth Data. Rdata format.
  • Tools used: R

Case Study 4: Predicting Wages 2

  • Case Study Activity Description: Predict wages using several machine learning methods and splitting data. We’ll help by describing the estimation technique and presenting the results.
  • Data Sets & format: 2015 CPS data, Rdata format.
  • Tools used: R

Case Study 5: The Effect of Gun Ownership on Homicide Rates

  • Case Study Activity Description: Use machine learning methods to estimate the effect of gun ownership on the homicide rate. We’ll help by describing the estimation technique and presenting the results.
  • Data Sets & format: U.S. Census Bureau Dataset. Csv format.
  • Tools used: R

Module 3.1: Classification and Hypothesis Testing

Case Study 1: Logistic Regression: The Challenger Disaster

  • Case Study Activity Description: Learn how to apply Logistic Regression in a practical real-world setting. We’ll help by giving you tips, examples, and pseudo-code for the programming environments.
  • Data Sets & format: Made available as a csv file along with the case study.
  • Tools used: User choice: Python (using the statsmodels library) or R (using the built-in glm function).

Module 3.2: Deep Learning

Case Study 2: Decision boundary of a deep neural network

  • Case Study Activity Description: Play with one- or two-layer perceptrons to assess their decision boundaries. We’ll help by explaining the multiple dimensions of perceptrons.
  • Data Sets & format: Synthetic 2D data points.
  • Tools used: Python (coding is not required for students)
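
Although coding is not required for this case study, a minimal sketch can make the idea of a decision boundary concrete. The example below trains a single-layer perceptron with the classic perceptron update rule on synthetic 2D points; the data generation, the function names, and the hyperparameters are all assumptions made for illustration and are not taken from the course material.

```python
import random

def train_perceptron(points, labels, epochs=100, lr=1.0):
    """Classic perceptron rule; labels are +1 or -1. Returns weights w and bias b."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            # A point is misclassified (or on the boundary) when y * score <= 0.
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b

def predict(w, b, point):
    """Return +1 or -1 depending on which side of the boundary the point lies."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1

# Synthetic, linearly separable 2D data: two square clusters (an assumption).
random.seed(0)
points, labels = [], []
for _ in range(50):
    points.append((random.uniform(1.0, 3.0), random.uniform(1.0, 3.0)))
    labels.append(1)
    points.append((random.uniform(-3.0, -1.0), random.uniform(-3.0, -1.0)))
    labels.append(-1)

w, b = train_perceptron(points, labels)
print("decision boundary: %.2f*x1 + %.2f*x2 + %.2f = 0" % (w[0], w[1], b))
```

For a single-layer perceptron the decision boundary is always the straight line `w[0]*x1 + w[1]*x2 + b = 0`. Adding a hidden layer lets the boundary bend, which is the contrast the case study invites you to explore.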

Module 4: Recommendation Systems

Case Study 1: Recommending Movies

  • Case Study Activity Description: Build your own recommendation system for movies like the one used by Netflix. We’ll help by giving you tips, examples, and pseudo-code for the programming environments.
  • Data Sets & format: MovieLens dataset - public set
  • Tools used: User Choice: Python or R. For Recommenders: RecommenderLab and Graphlab-Create
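
To give a feel for what a recommender does before reaching for a dedicated library, here is a minimal sketch of user-based collaborative filtering in plain Python. It is not the course's method and does not use RecommenderLab or Graphlab-Create: the ratings are made-up toy values rather than MovieLens data, the function names are hypothetical, and treating unrated movies as 0 in the similarity is a simplifying assumption.

```python
from math import sqrt

# Made-up toy ratings (user -> {movie: rating on a 1-5 scale}); NOT MovieLens data.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 5},
    "cat": {"A": 1, "C": 5, "D": 1},
    "dan": {"A": 4, "B": 5},
}

def cosine_similarity(u, v):
    """Cosine similarity between two users; unrated movies count as 0."""
    dot = sum(r * v[m] for m, r in u.items() if m in v)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_rating(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie`.

    Returns None when no other user has rated the movie.
    """
    weighted_sum, weight_total = 0.0, 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[movie]
        weight_total += sim
    return weighted_sum / weight_total if weight_total else None

print(round(predict_rating(ratings, "dan", "D"), 2))
```

In this toy example the prediction for dan leans toward bob's rating of movie D, because dan's existing ratings resemble bob's far more than cat's. Real systems typically add refinements such as mean-centering ratings and limiting the average to the nearest neighbors.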

Case Study 2: Recommend New Songs to Users Based on Their Listening Habits

  • Case Study Activity Description: Build your own recommendation system for songs like the one used by Spotify. We’ll help by giving you tips, examples, and pseudo-code for the programming environments.
  • Data Sets & format: Million Song dataset
  • Tools used: User Choice: Python or R. For Recommenders: RecommenderLab and Graphlab-Create

Case Study 3: Make New Product Recommendations

  • Case Study Activity Description: Build your own recommendation system for products on an e-commerce website like the one used by Amazon.com. We’ll help by giving you tips, examples, and pseudo-code for the programming environments.
  • Data Sets & format: Amazon Reviews data
  • Tools used: User Choice: Python or R. For Recommenders: RecommenderLab and Graphlab-Create

Module 5: Networks and Graphical Models

Case Study 1: Navigation / GPS
1.1: Kalman Filtering: Tracking the 2D Position of an Object Moving with Constant Velocity

  • Case Study Activity Description: Generate data, build the model for the motion dynamics, and perform the Kalman Filtering algorithm. We’ll help by giving you tips, examples, and pseudo-code for the programming environment.
  • Data Sets & format: Generating your own data. Model explanation and other parameter details provided in a separate write-up.
  • Tools used: Python, using libraries such as numpy and matplotlib
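
The three steps named above (generate data, model the motion dynamics, run the filter) can be sketched as follows. This is a simplified illustration, not the course's write-up: it uses only the standard library instead of numpy so that it is self-contained, it assumes the noise on the x and y axes is independent so that the 2D problem splits into two position-velocity filters, and the noise variances `q` and `r`, the initial covariance, and the function names are all assumed values chosen for the example.

```python
import random

def kalman_step(state, cov, z, dt=1.0, q=1e-4, r=1.0):
    """One predict/update cycle for a single axis.

    state = [position, velocity]; cov = 2x2 covariance as nested lists;
    z = noisy position measurement; q = process noise variance added to the
    diagonal (a simplification); r = measurement noise variance.
    """
    p, v = state
    (p00, p01), (p10, p11) = cov

    # Predict: x' = F x and P' = F P F^T + Q, with F = [[1, dt], [0, 1]].
    p_pred = p + dt * v
    v_pred = v
    a00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
    a01 = p01 + dt * p11
    a10 = p10 + dt * p11
    a11 = p11 + q

    # Update: only the position is observed, so H = [1, 0].
    innovation = z - p_pred
    s = a00 + r                      # innovation variance
    k0, k1 = a00 / s, a10 / s        # Kalman gain
    new_state = [p_pred + k0 * innovation, v_pred + k1 * innovation]
    new_cov = [[(1 - k0) * a00, (1 - k0) * a01],
               [a10 - k1 * a00, a11 - k1 * a01]]
    return new_state, new_cov

def track_2d(measurements, dt=1.0, q=1e-4, r=1.0):
    """Filter a list of (x, y) position measurements; returns (x, vx, y, vy) tuples."""
    sx, sy = [0.0, 0.0], [0.0, 0.0]
    cx = [[100.0, 0.0], [0.0, 100.0]]  # large initial uncertainty (assumed)
    cy = [[100.0, 0.0], [0.0, 100.0]]
    estimates = []
    for zx, zy in measurements:
        sx, cx = kalman_step(sx, cx, zx, dt, q, r)
        sy, cy = kalman_step(sy, cy, zy, dt, q, r)
        estimates.append((sx[0], sx[1], sy[0], sy[1]))
    return estimates

# Generate synthetic data: constant velocity plus Gaussian measurement noise.
random.seed(1)
true_vx, true_vy = 2.0, -1.0
noisy = [(true_vx * t + random.gauss(0, 1.0), true_vy * t + random.gauss(0, 1.0))
         for t in range(1, 51)]
estimates = track_2d(noisy)
print(estimates[-1])
```

The filter never observes velocity directly; it infers it from how the position measurements change over time, which is why the velocity estimate settles after a few steps. A numpy version would express the same predict and update steps as matrix products on a single four-dimensional state.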

1.2: Kalman Filtering: Tracking the 3D Position of an Object falling due to gravity.

  • Case Study Activity Description: Generate data, build the model for the motion dynamics, perform the Kalman Filtering algorithm. We’ll help by giving you tips, examples, and pseudo-code for the programming environment.
  • Data Sets & format: Generating your own data. Model explanation and other parameter details provided in a separate write-up.
  • Tools used: Python, using libraries such as numpy and matplotlib

Case Study 2: Identifying New Genes That Cause Autism

  • Case Study Activity Description: Use network-theoretic ideas to identify new candidate genes that might cause autism. We’ll help by giving you tips, examples, and pseudo-code for the programming environment.
  • Data Sets & format: Made available as csv files.
  • Tools used: R

Module 6: Case Studies

  • New York City taxi data: Predicting the duration of a trip
  • UK retail dataset: Retail data, various problems

FREQUENTLY ASKED QUESTIONS

Who can register for this course?
U.S. sanctions do not permit us to offer this course to learners in or ordinarily residing in Iran, Cuba, Sudan, and the Crimean region of Ukraine.

What do I need to do to register for the course?
Go to mixpro.mit.edu, click on the course you would like to register for, and click "Enroll Now". You may be prompted to first register for an MIT xPRO account if you do not have one already. Complete this process, then continue with the enrollment process.

How do I register a group of participants?

For a group of 5 or more individuals, you can pay via invoice. To be invoiced, please email mitxpro@mit.edu with the number of individuals in your group, and instructions to register will be provided. Please note that our payment terms are net zero, and all invoices must be paid prior to the course start date. Failure to remit payment before the course begins will result in removal from the course. No extensions or exceptions will be granted.

What is the registration deadline?
Individual registrations must be completed by October 27, 2017.

How should I pay?
Individual registrants must complete registration and pay online with a valid credit card at the time of registration. MIT xPRO accepts globally recognized major credit or debit cards that have a Visa, MasterCard, Discover, American Express or Diner's Club logo. Invoices will not be generated for individuals or for groups of fewer than 5 people. However, all participants will receive a payment receipt. Payment must be received in full; payment plans are not available.

When will I get access to the course site?
Instructions for accessing the course site will be sent to all paid registrants via email prior to the course launch date. In order to receive these instructions, please add mitxpro@mit.edu to your “trusted senders” list. If you have not received these instructions by the course start date, visit your account dashboard to log in and start the course on the advertised course start date.

I need to cancel my registration. Are there any fees?
Cancellation requests must be submitted to MITxPRO@mit.edu. Cancellation requests received after October 27, 2017 will not be eligible for a refund. To submit your request, please include your full name and order number in your email request. Refunds will be credited to the credit card used when you registered and may take up to two billing cycles to process.

Can I transfer/defer my registration for another session or course?
Admission and fees paid cannot be deferred to a subsequent session; however, you may cancel your registration and reapply at a later date.

Can someone else attend in my place?
We cannot accommodate any substitution requests at this time. Please review the time commitment section and course schedule.

COURSE QUESTIONS

How do I know if this course is right for me?
Carefully review the course description page, which includes a description of course content, objectives, and target audience, and any required prerequisites.

Are there prerequisites or advance reading materials?
The course is open to any interested participant. No advance reading is required. The ability to write code and prior programming experience are not requirements. Since this is not an introductory course, the faculty strongly recommends that participants have substantial background knowledge of statistical techniques and data calculations or quantitative methods of data research.

Who will be participating in this course?
Professionals with diverse personal, business, and academic backgrounds from the U.S. and around the world will participate. They include scientists, engineers, technicians, managers, consultants, and others, and they come from industry, government, military, non-profit, and academia.

How long is the course?
The course is held over eight weeks with one holiday week. Lectures are pre-recorded and you can follow along when you find it convenient, as long as you finish all required assignments by December 17, 2017. You may complete all assignments before the due date; however, you may find it more beneficial to adhere to a weekly schedule so you can stay up-to-date with the discussion forums.

What is the time commitment of this course?
MIT xPRO courses are designed to fit the schedules of busy professionals. Most participants will spend about 3 - 4 hours a week on course-related activities.

For participants who wish to engage with the optional case study activities, please allow an extra 4+ hours a week. These optional case study tutorials will require some prior knowledge and experience with the programming language you choose to use for reproducing case study results. Generally, participants with 6 months of experience using “R” or “Python” should be successful in going through these exercises. Please note that most of the case study activities are not required and do not count towards your "grade" or earning a certificate of completion. However, there are two compulsory peer-reviewed case study activities that may take up to 90 minutes each to complete.

How long will the course material be available online?
The materials will be available to registered and paid participants until March 18, 2018. No extensions may be granted.

What reference materials will be available at the end of the course?
Participants will have 90-day access to the archived course (includes videos, discussion boards, content, and Wiki).

What materials will participants keep at the end of the course?
Participants will take away program materials, and resources presented in the course Wiki, including downloadable case study activities for you to work on in your spare time during or after the course.

Will I receive a Certificate?
Participants who successfully complete the course and all assessments will receive a Certificate in Data Science from MIT xPRO. This course does not carry MIT credits or grades; however, an 80% pass rate is required to receive the certificate.

Will I receive MIT credits?
This course does not carry MIT credits. MIT xPRO offers non-credit/non-degree professional programs for a global audience. Participants may not imply or state in any manner, written or oral, that MIT or MIT xPRO is granting academic credit for enrollment in this professional course. Letter grades are not awarded for this course.

Will I earn Continuing Education Units (CEUs)?
Course participants who successfully complete all course requirements are eligible to receive 1.8 Continuing Education Units (CEUs) from MIT xPRO. CEUs may not be applied toward any MIT undergraduate or graduate level course.

After I complete this course, will I be an MIT alum?
Participants who successfully complete this course are considered MIT xPRO Alumni. Only those who complete an undergraduate or graduate degree are considered MIT alumni.

Are video captions available?
Each video for this course has been transcribed and the text can be found on the right side of the video when the captions function is turned on. Synchronized transcripts allow students to follow along with the video and navigate to a specific section of the video by clicking the transcript text. Students can use transcripts of media-based learning materials for study and review. In addition, we include a complete course transcript in a single PDF file that allows for easy reference.

Browser/Technical Requirements
Access to our courses requires an Internet connection, as videos are only available via online streaming and cannot be downloaded for offline viewing. Please take note of your company's restrictions for viewing content and/or firewall settings. Our courseware works best with current versions of Google Chrome, Firefox, or Safari, or with Internet Explorer version 10 and above. For the best possible experience, we recommend switching to an up-to-date version of Chrome. If you do not have Chrome installed, you can get it for free here: http://www.google.com/chrome/browser/
We are unable to fully support access with mobile devices at this time. While many components of your courses will function on a mobile device, some may not.

I have never taken a course on the edX platform before. What can I do to prepare?
Prior to the first day of class, participants can take a demonstration course on edx.org that was built specifically to help students become more familiar with taking a course on the edX platform.
