Distributed Machine Learning with Spark

Start date: 04/22/2022 Duration: 4 weeks

Platform: EdxArchive

Category: Other

University or institution: UC BerkeleyX (University of California, Berkeley)

Instructors: Jon Bates, Ameet Talwalkar

Course homepage: https://www.edx.org/archive/distributed-machine-learning-spark-uc-berkeleyx-cs120x

Course details

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding-edge applications like self-driving cars and personalized medicine. In the age of ‘big data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This statistics and data analysis course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Spark, a cluster computing system well-suited for large-scale machine learning tasks, and its packages spark.ml and spark.mllib. You will implement distributed algorithms for fundamental statistical models (linear regression, logistic regression, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.

Course outline

The underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines
Exploratory data analysis, feature extraction, supervised learning, and model evaluation
Application of these principles using Spark
How to implement distributed algorithms for fundamental statistical models
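
As a rough illustration of what a distributed algorithm for a model such as linear regression can look like, the sketch below runs gradient descent in a map/reduce style: each partition computes a partial gradient sum, and the partial sums are combined before every update. It uses plain Python rather than Spark, and the function names, learning rate, step count, and toy data are assumptions made for this example, not material from the course.

```python
from functools import reduce


def partition_gradient(partition, w, b):
    """Map step: sum the squared-error gradient terms over one partition."""
    gw, gb, n = 0.0, 0.0, 0
    for x, y in partition:
        err = (w * x + b) - y
        gw += err * x
        gb += err
        n += 1
    return gw, gb, n


def combine(left, right):
    """Reduce step: add two partial results element-wise."""
    return left[0] + right[0], left[1] + right[1], left[2] + right[2]


def fit_linear_regression(partitions, lr=0.05, steps=2000):
    """Gradient descent for y = w*x + b, aggregating gradients across partitions."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        partials = [partition_gradient(p, w, b) for p in partitions]  # map
        gw, gb, n = reduce(combine, partials)                         # reduce
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


# Toy data drawn from y = 2x + 1, split into three hypothetical partitions.
data = [(float(x), 2.0 * x + 1.0) for x in range(6)]
partitions = [data[0:2], data[2:4], data[4:6]]
w, b = fit_linear_regression(partitions)
```

In a cluster setting the list comprehension would be replaced by a parallel map over data partitions, with only the small partial sums sent back to the driver; the model parameters `w` and `b` are the only state shared between iterations.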

Course summary

Learn the underlying principles required to develop scalable machine learning pipelines and gain hands-on experience using Spark.