Machine Learning: Clustering & Retrieval

Start date: 04/22/2022 Duration: Unknown

Platform: CourseraArchive

Course category: Computer Science

University or institution: CourseraNew






Case Studies: Finding Similar Documents

A reader is interested in a specific news article and you want to find similar articles to recommend. What is the right notion of similarity? Moreover, what if there are millions of other documents? Each time you want to retrieve a new document, do you need to search through all other documents? How do you group similar documents together? How do you discover new, emerging topics that the documents cover?

In this third case study, finding similar documents, you will examine similarity-based algorithms for retrieval. In this course, you will also examine structured representations for describing the documents in the corpus, including clustering and mixed membership models, such as latent Dirichlet allocation (LDA). You will implement expectation maximization (EM) to learn the document clusterings, and see how to scale the methods using MapReduce.

Learning Outcomes: By the end of this course, you will be able to:
-Create a document retrieval system using k-nearest neighbors.
-Identify various similarity metrics for text data.
-Reduce computations in k-nearest neighbor search by using KD-trees.
-Produce approximate nearest neighbors using locality sensitive hashing.
-Compare and contrast supervised and unsupervised learning tasks.
-Cluster documents by topic using k-means.
-Describe how to parallelize k-means using MapReduce.
-Examine probabilistic clustering approaches using mixture models.
-Fit a mixture of Gaussians model using expectation maximization (EM).
-Perform mixed membership modeling using latent Dirichlet allocation (LDA).
-Describe the steps of a Gibbs sampler and how to use its output to draw inferences.
-Compare and contrast initialization techniques for non-convex optimization objectives.
-Implement these techniques in Python.
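
To make the first learning outcome concrete, here is a minimal sketch of brute-force nearest-neighbor document retrieval using TF-IDF weights and cosine similarity. It is an illustration only, not the course's own implementation: the function names (tf_idf_vectors, cosine_similarity, nearest_neighbors), the whitespace tokenization, the smoothed IDF formula, and the four toy documents are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: word -> weight) per document.

    Assumption: words are obtained by lowercasing and splitting on whitespace.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption): words found in every document get weight 0.
        vectors.append({
            word: count * math.log((1 + n_docs) / (1 + doc_freq[word]))
            for word, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_index, vectors, k=1):
    """Brute-force k-NN: score every other document and return the top-k indices."""
    scores = [
        (cosine_similarity(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]


# Toy corpus (hypothetical): two sports articles and two finance articles.
docs = [
    "the team won the football match",
    "the football team lost the final match",
    "the central bank raised interest rates",
    "interest rates and inflation worry the bank",
]
vectors = tf_idf_vectors(docs)
print(nearest_neighbors(0, vectors, k=1))
```

On this toy corpus the query article at index 0 shares weighted words only with the article at index 1, so that is the neighbor returned. The search compares the query against every other document, which is exactly the cost that KD-trees and locality sensitive hashing, listed in the outcomes above, aim to reduce.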



Clustering and retrieval are among the highest-impact machine learning tools available. Retrieval is used in almost every application and device we interact with, for example to show a set of products related to the one a shopper is currently considering, or a list of people you might want to connect with on a social media platform. Clustering can be used to aid retrieval, but it is also a more broadly useful tool for automatically discovering structure in data, such as uncovering groups of similar patients.

This introduction to the course provides you with an overview of the topics we will cover and the background knowledge and resources we assume you have.





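Machine learning, clustering, retrieval, similarity retrieval, machine learning course, machine learning specialization, University of Washington, University of Washington machine learning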