Mining Massive Datasets

Start date: 04/22/2022 Duration: 7 weeks

Platform: CourseraArchive

Category: Computer Science

University or institution: Stanford University

Instructors: Jeff Ullman, Anand Rajaraman, Jure Leskovec


Course reviews: 1 review



We introduce the student to modern distributed file systems and MapReduce, including what distinguishes good MapReduce algorithms from good algorithms in general. The rest of the course is devoted to algorithms for extracting models and information from large datasets. Students will learn how Google's PageRank algorithm models the importance of Web pages, along with some of the many extensions that have been used for a variety of purposes. We'll cover locality-sensitive hashing, a bit of magic that allows you to find similar items in a set of items so large you cannot possibly compare each pair. When data is stored as a very large, sparse matrix, dimensionality reduction is often a good way to model the data, but standard approaches do not scale well; we'll talk about efficient approaches. Many other large-scale algorithms are covered as well, as outlined in the course syllabus.
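As a rough illustration of the PageRank idea mentioned in the description, here is a minimal Python sketch of power iteration on a tiny link graph. The function name `pagerank`, the damping factor of 0.85, the iteration count, and the toy graph are all assumptions chosen for illustration; they are not taken from the course materials, and a web-scale version would be distributed rather than run in memory like this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each page to the pages it links to.

    Illustrative sketch only: names and default values are assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page first receives an equal "teleport" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dead end: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked to by A, B, and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

In this toy graph, page C collects links from three pages and ends up with the highest score, while page D, which nothing links to, keeps only its teleport share.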


Week 1:

Week 2
Locality-Sensitive Hashing
Nearest Neighbors
Decision Trees

Week 3
Frequent Itemsets
Analysis of large graphs

Week 4
Recommender systems
Data streams

Week 5
Distance measures
Dimensionality reduction

Week 6
Support-Vector machines
More about MapReduce

Week 7
More about PageRank
More about Locality-Sensitive Hashing
On-line algorithms



skyline打酱油 2015-03-20 14:50 0 votes in favor; 0 votes against



This class teaches algorithms for extracting models and other information from very large amounts of data. The emphasis is on techniques that are efficient and that scale well.


Big data, MapReduce, Support vector machines, SVM, PageRank, Recommender systems, Decision trees