Text Retrieval and Search Engines

Start date: 08/08/2020 Duration: Unknown

Platform: Coursera

Category: Computer Science

University or institution: CourseraNew


Course homepage: https://www.coursera.org/learn/text-retrieval





Recent years have seen dramatic growth in natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually generated directly by humans rather than by a computer system or sensors. They are therefore especially valuable for discovering knowledge about people's opinions and preferences, in addition to the many other kinds of knowledge that we encode in text. This course will cover search engine technologies, which play an important role in any data mining application involving text data for two reasons. First, although the raw data for a particular problem may be large, often only a relatively small subset of the data is relevant, and a search engine is an essential tool for quickly finding that small relevant subset in a large text collection. Second, search engines help analysts interpret patterns discovered in the data by letting them examine the relevant original text to make sense of those patterns. You will learn the basic concepts, principles, and major techniques of text retrieval, the underlying science of search engines.



In this week's lessons, you will learn how the vector space model works in detail, the major heuristics used in designing a retrieval function for ranking documents with respect to a query, and how to implement an information retrieval system (i.e., a search engine), including how to build an inverted index and how to score documents quickly for a query.
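The last two topics listed above are building an inverted index and scoring documents quickly for a query. Both can be illustrated with a minimal Python sketch under stated assumptions. The sketch assumes naive lowercase whitespace tokenization. For ranking, it uses a pivoted-length-normalized TF-IDF formula, chosen as one common vector-space retrieval function that combines the TF transformation, IDF, and document length normalization heuristics. The function names `build_inverted_index` and `score`, the exact formula, and the default `b=0.2` are illustrative assumptions, not the course's reference implementation.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to a postings list of (doc_id, term_frequency) pairs."""
    index = defaultdict(list)
    doc_lengths = {}
    for doc_id, text in enumerate(docs):
        counts = Counter(text.lower().split())  # naive whitespace tokenizer (assumption)
        doc_lengths[doc_id] = sum(counts.values())
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    return index, doc_lengths


def score(query, index, doc_lengths, b=0.2):
    """Rank documents term-at-a-time with a pivoted-normalized TF-IDF formula.

    Only documents that appear in some query term's postings list are scored.
    """
    num_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / num_docs
    scores = defaultdict(float)  # one score accumulator per candidate document
    for term, qtf in Counter(query.lower().split()).items():
        postings = index.get(term)  # .get avoids inserting unseen terms
        if not postings:
            continue
        idf = math.log((num_docs + 1) / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings:
            tf_weight = math.log(1 + math.log(1 + tf))  # sublinear TF transformation
            norm = 1 - b + b * doc_lengths[doc_id] / avg_len  # length normalization
            scores[doc_id] += qtf * tf_weight / norm * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = [
    "news about presidential campaign",
    "news about organic food campaign",
    "news of presidential campaign presidential candidate",
]
index, doc_lengths = build_inverted_index(docs)
ranking = score("presidential campaign", index, doc_lengths)
print([doc_id for doc_id, _ in ranking])  # [2, 0, 1]
```

The scoring loop walks postings lists one term at a time. Documents that share no term with the query are never touched, which is the main reason an inverted index makes scoring fast compared with matching the query against every document vector.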





Information Retrieval, Text Retrieval, Search Engine Principles, Search Engines