Introduction to Big Data with Apache Spark

Start date: TBD Duration: 5 weeks

Platform: edX

Category: Computer Science

University or institution: UC BerkeleyX (University of California, Berkeley)

Instructor: Anthony D. Joseph




Course reviews: 1 review



*Note - This is an Archived course*

Organizations use their data for decision support and to build data-intensive products and services, such as recommendation, prediction, and diagnostic systems. The collection of skills required by organizations to support these functions has been grouped under the term Data Science. This course will attempt to articulate the expected output of Data Scientists and then teach students how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments include Log Mining, Textual Entity Recognition, and Collaborative Filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.
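As a rough illustration of the log-mining style of exercise described above, the sketch below counts HTTP status codes by splitting the data into partitions, counting within each partition, and merging the partial counts. It is plain standard-library Python standing in for the pattern, not PySpark code and not taken from the course materials; the sample log lines and the helper names (`status_code`, `partition`, `count_partition`, `count_status_codes`) are invented for this example.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample of web-server log lines (invented for illustration).
LOG_LINES = [
    '127.0.0.1 - - [01/Aug/1995:00:00:01] "GET /index.html HTTP/1.0" 200 1839',
    '127.0.0.1 - - [01/Aug/1995:00:00:07] "GET /missing.html HTTP/1.0" 404 0',
    '10.0.0.2 - - [01/Aug/1995:00:00:09] "GET /images/logo.gif HTTP/1.0" 200 786',
    '10.0.0.3 - - [01/Aug/1995:00:00:12] "GET /old.html HTTP/1.0" 404 0',
    '10.0.0.3 - - [01/Aug/1995:00:00:15] "GET /index.html HTTP/1.0" 200 1839',
]


def status_code(line):
    """Return the HTTP status code: the second-to-last whitespace-separated field."""
    return line.split()[-2]


def partition(items, n):
    """Split items into n chunks, standing in for the partitions of a distributed data set."""
    return [items[i::n] for i in range(n)]


def count_partition(lines):
    """'Map' step: count status codes within a single partition."""
    return Counter(status_code(line) for line in lines)


def count_status_codes(lines, n_partitions=2):
    """'Reduce' step: merge the per-partition counts into one total."""
    partials = map(count_partition, partition(lines, n_partitions))
    return reduce(lambda a, b: a + b, partials, Counter())


totals = count_status_codes(LOG_LINES)
print(dict(totals))
```

Because the per-partition counts are merged by addition, the result does not depend on how many partitions are used, which is the property that lets this kind of work be spread across many machines.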

This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (part of Apache Spark), but previous experience with Spark or distributed computing is NOT required. Students should take the Python mini-quiz before the course, and the Python mini-course if they need to learn Python or refresh their Python knowledge.

This is a past/archived course. At this time, you can only explore this course in a self-paced fashion. Certain features of this course may not be active, but many people enjoy watching the videos and working with the materials. Make sure to check for reruns of this course.


  • Learn how to use Apache Spark to perform data analysis
  • Use parallel programming to explore data sets
  • Apply Log Mining, Textual Entity Recognition, and Collaborative Filtering to real-world data questions
  • Prepare for the Spark Certified Developer exam



Monkey_D_Law 2015-07-10 16:11 1 upvote; 0 downvotes







Learn how to apply data science techniques using parallel programming in Apache Spark to explore big (and small) data.


Spark Big Data