Introduction to Big Data with Apache Spark

Start date: 04/22/2022 Duration: 5 weeks

Platform: EdxArchive

Category: Computer Science

University or institution: UC BerkeleyX (University of California, Berkeley)

Instructor: Anthony D. Joseph

Course homepage: https://www.edx.org/archive/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x

Course reviews: 1 review

Course details

*Note - This is an Archived course*

Organizations use their data for decision support and to build data-intensive products and services, such as recommendation, prediction, and diagnostic systems. The collection of skills required by organizations to support these functions has been grouped under the term Data Science. This course will attempt to articulate the expected output of Data Scientists and then teach students how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments include Log Mining, Textual Entity Recognition, and Collaborative Filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.

This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (part of Apache Spark), but previous experience with Spark or distributed computing is NOT required. Students should take this Python mini-quiz before the course and take this Python mini-course if they need to learn Python or refresh their Python knowledge.


This is a past/archived course. At this time, you can only explore this course in a self-paced fashion. Certain features of this course may not be active, but many people enjoy watching the videos and working with the materials. Make sure to check for reruns of this course.
 

Course syllabus

  • Learn how to use Apache Spark to perform data analysis
  • Learn how to use parallel programming to explore data sets
  • Apply Log Mining, Textual Entity Recognition, and Collaborative Filtering to real-world data questions
  • Prepare for the Spark Certified Developer exam

Course reviews (1)

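Monkey_D_Law 2015-07-10 16:11 1 upvote; 0 downvotes

As a complete Spark beginner, I found the course very interesting, with plenty of examples. For the parts you do not understand, you may need to consult a book.

The assignment .ipynb files are made with great care, combining text, figures, and code implementations. They simply outclass the Spark reference books on the market.

There are relatively few videos. The first two weeks of videos were excellent and kept a beginner like me hooked; the last two weeks were disappointing, mostly introductory material, and I fast-forwarded through them. The assignments are quite hard but very rewarding. They demand a fair amount of Python, especially lab3, which really tests your Python skills; the later assignments mostly come down to looking up the API. In the last few days I caught up at a pace of one lab per day, which was quite an experience...

PS: While browsing the forum, I saw the instructor say that students at their university spend 2 to 4 hours on average per assignment, which made me feel rather ashamed at the time. It also shows that students abroad really do train at a high intensity.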

Course summary

Learn how to apply data science techniques using parallel programming in Apache Spark to explore big (and small) data.

Course tags

Spark, Big Data

69 people follow this course

Related courses