Big Data Analysis with Spark

Start date: 04/22/2022 Duration: 4 weeks

Platform: EdxArchive

Category: Other

University or institution: UC BerkeleyX (University of California, Berkeley)

Instructors: Jon Bates, Anthony D. Joseph

Course homepage: https://www.edx.org/archive/big-data-analysis-spark-uc-berkeleyx-cs110x


Course details

Organizations use their data to support and influence decisions and build data-intensive products and services, such as recommendation, prediction, and diagnostic systems. The collection of skills required by organizations to support these functions has been grouped under the term ‘data science’.

This statistics and data analysis course will attempt to articulate the expected output of data scientists and then teach students how to use PySpark (part of Spark) to meet these expectations. The course assignments include log mining, textual entity recognition, and collaborative filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.

This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), and previous experience with Spark, equivalent to Introduction to Spark, is required.

Course syllabus

How to use Apache Spark to perform data analysis
How to use parallel programming to explore data sets
How to apply log mining, textual entity recognition, and collaborative filtering techniques to real-world data questions

Course summary

Learn how to apply data science techniques using parallel programming in Spark to explore big data.