Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud

Start date: 08/08/2020    Duration: Unknown

Platform: Coursera

Course category: Other

University or institution: CourseraNew







Welcome to the Cloud Computing Applications course, the second part of a two-course series designed to give you a comprehensive view of the world of Cloud Computing and Big Data! In this second course we continue Cloud Computing Applications by exploring how the Cloud opens up data analytics over huge volumes of data that are static or streamed at high velocity and that represent an enormous variety of information. Cloud applications and data analytics represent a disruptive change in the ways that society is informed by, and uses, information.

We start the first week by introducing some major systems for data analysis, including Spark, and the major frameworks and distributions of analytics applications, including Hortonworks, Cloudera, and MapR. By the middle of week one we introduce HDFS, the distributed and robust file system used in many applications such as Hadoop, and we finish week one by exploring the powerful MapReduce programming model and how cluster resource managers such as YARN and Mesos support a flexible and scalable environment for Big Data analytics.

In week two, our course introduces large-scale data storage and the difficulties of reaching consensus in enormous stores built from vast numbers of processors, memories, and disks. We discuss eventual consistency, ACID, and BASE, along with the consensus algorithms and systems used in data centers, including Paxos and ZooKeeper. Our course presents distributed key-value stores and in-memory databases such as Redis, used in data centers for performance. Next we present NoSQL databases. We visit HBase, the scalable, low-latency database that supports database operations in applications that use Hadoop. We then show how Spark SQL can run SQL queries over huge data sets. We finish week two with a presentation on distributed publish/subscribe systems, using Kafka, a distributed log-messaging system that is finding wide use in connecting Big Data and streaming applications together to form complex systems.
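The publish/subscribe idea behind Kafka can be sketched in a few lines of pure Python. The `MiniLog` class below is a hypothetical, single-process illustration of the core concept — topics as append-only logs, with each consumer tracking its own read offset — not the real Kafka API, which adds partitions, replication, and durable storage across brokers:

```python
from collections import defaultdict

class MiniLog:
    """Toy publish/subscribe broker: each topic is an append-only log,
    and each consumer remembers its own offset into that log."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic name -> list of messages
        self.offsets = defaultdict(int)   # (consumer, topic) -> next index to read

    def publish(self, topic, message):
        # Producers only ever append; the log is immutable history.
        self.topics[topic].append(message)

    def consume(self, consumer, topic):
        # Each consumer reads from its own offset, so many consumers
        # can replay the same stream independently.
        start = self.offsets[(consumer, topic)]
        messages = self.topics[topic][start:]
        self.offsets[(consumer, topic)] = len(self.topics[topic])
        return messages
```

Because offsets are per-consumer, a new subscriber can replay the full history of a topic — the property that makes log-based systems like Kafka useful for gluing batch and streaming applications together.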
Week three moves to fast data and real-time streaming, and introduces Storm, a technology used widely at companies such as Yahoo. We continue with Spark Streaming, the Lambda and Kappa architectures, and a presentation of the streaming ecosystem.

Week four focuses on graph processing, machine learning, and deep learning. We introduce the ideas of graph processing and present Pregel, Giraph, and Spark GraphX. Then we move to machine learning, with examples from Mahout and Spark; k-means clustering, Naive Bayes, and frequent pattern mining (FPM) are given as examples. Spark ML and MLlib continue the theme of programmability and application construction. The last topic we cover in week four introduces deep learning technologies, including Theano, TensorFlow, CNTK, MXNet, and Caffe on Spark.
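To give a flavor of the machine learning material, the k-means algorithm covered in week four can be sketched in pure Python. This is an illustrative, single-machine version of the alternating assign/update loop — the course itself works with the distributed Mahout and Spark MLlib implementations:

```python
import random

def kmeans(points, k, iterations=10, seed=0):
    """Minimal k-means sketch over 2-D points given as (x, y) tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize centers from the data
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: (p[0] - centers[c][0]) ** 2
                                      + (p[1] - centers[c][1]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centers[i] = (sum(p[0] for p in cluster) / len(cluster),
                              sum(p[1] for p in cluster) / len(cluster))
    return centers
```

In MLlib the same two steps run as distributed Spark jobs, with the assignment step parallelized across partitions of the input data.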



In Module 1, we introduce you to the world of Big Data applications. We start by introducing you to Apache Spark, a common framework used for many different tasks throughout the course. We then introduce some Big Data distribution packages, the HDFS file system, and finally the idea of batch-based Big Data processing using the MapReduce programming paradigm.
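The MapReduce paradigm mentioned above can be sketched in a few lines of pure Python. This is a single-process illustration of the map, shuffle, and reduce phases applied to word counting — the classic introductory example — not the distributed Hadoop implementation, where each phase runs across many machines:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's grouped values -- here, by summing counts.
    return {word: sum(counts) for word, counts in groups.items()}

def word_count(documents):
    pairs = chain.from_iterable(map_phase(d) for d in documents)
    return reduce_phase(shuffle(pairs))
```

Because the map and reduce functions are pure and per-key, the framework is free to run them in parallel over different splits of the data — the property that lets MapReduce scale batch jobs across a cluster.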


