Scalable Machine Learning

Start date: 04/22/2022 Duration: 5 weeks

Platform: EdxArchive

Category: Computer Science

University or institution: UC BerkeleyX (University of California, Berkeley)

Instructor: Ameet Talwalkar

Course homepage: https://www.edx.org/archive/scalable-machine-learning-uc-berkeleyx-cs190-1x

Course reviews: 1 review

Course details

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding-edge applications like self-driving cars and personalized medicine. In the age of ‘Big Data,’ with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Apache Spark, a cluster computing system well-suited for large-scale machine learning tasks. You will implement scalable algorithms for fundamental statistical models (linear regression, logistic regression, matrix factorization, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.

This self-assessment document provides a short quiz, as well as online resources that review the relevant background material.

Course syllabus

The underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines
Exploratory data analysis, feature extraction, supervised learning, and model evaluation
Application of these principles using Apache Spark
How to implement scalable algorithms for fundamental statistical models
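As a rough illustration of what "scalable" can mean for a model such as linear regression, here is a minimal sketch that is not taken from the course materials. It assumes the data is split into partitions, computes a squared-error gradient on each partition independently, and sums the partial results, which is the same map-then-reduce shape a cluster framework would distribute. The function names (`partition_gradient`, `combine`, `fit_linear`), the toy data, and the step size are all hypothetical, and plain Python lists stand in for a distributed dataset.

```python
from functools import reduce


def partition_gradient(partition, w, b):
    """Sum the squared-error gradient terms over one partition of (x, y) pairs."""
    gw = gb = 0.0
    for x, y in partition:
        err = (w * x + b) - y
        gw += err * x
        gb += err
    return gw, gb, len(partition)


def combine(left, right):
    """Merge two partial results; associative, so merge order does not matter."""
    return left[0] + right[0], left[1] + right[1], left[2] + right[2]


def fit_linear(partitions, steps=2000, lr=0.1):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w = b = 0.0
    for _ in range(steps):
        # "map": one partial gradient per partition; "reduce": sum them.
        partials = (partition_gradient(p, w, b) for p in partitions)
        gw, gb, n = reduce(combine, partials)
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


# Toy data generated from y = 2x + 1, split into two partitions.
partitions = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
w, b = fit_linear(partitions)
print(round(w, 4), round(b, 4))
```

Because each partition only returns three numbers, the amount of data exchanged per step does not grow with the number of rows, which is the property that lets this pattern scale.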

Course reviews (1)

Monkey_D_Law 2015-08-19 09:47 0 upvotes; 0 downvotes

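As the follow-up to Introduction to Big Data with Apache Spark, this course held few surprises. One week's content was entirely repeated; the later weeks covered some machine learning material, and covered it fairly well. I feel the two courses could easily have been merged into one.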

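That said, there were plenty of highlights. For example, it introduced numpy, which I had not used much before, so I took the opportunity to study it properly. LR, Logistic LR, and PCA were all taught well and systematically. The assignments were still quite hard, and very well designed.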

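After finishing both courses I did learn a lot, yet at the same time it feels hollow, as if I gained nothing at all. Perhaps while doing the assignments I was so focused on solving small Python problems that I lost sight of the big picture. Working through the assignment code in full and then applying it to something real should help.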

Course summary

Learn the underlying principles required to develop scalable machine learning pipelines and gain hands-on experience using Apache Spark.

Course tags

Machine Learning, Spark, Large-Scale Machine Learning