Start date: Anytime Duration: Self-paced
Lesson 1: What is EDA?
Learn about exploratory data analysis (EDA) and its importance, and find out about the course structure and final project.
Lesson 2: Intro to R
EDA, which comes before formal hypothesis testing and modeling, often uses visual methods to analyze and summarize data sets, and R will be our tool for generating those visuals and conducting analyses. In this lesson, we will install RStudio and packages, learn the layout and basic commands of R, practice writing basic R scripts, and inspect data sets.
Lesson 3: Exploring One Variable
We perform EDA to understand the distribution of a variable and to check for anomalies and outliers. Learn how to quantify and visualize individual variables within a data set as we begin to make sense of the diamond data set. We will create histograms and boxplots, transform variables, and examine tradeoffs in visualizations.
Lesson 4: Exploring the Relationship of Two Variables
EDA allows us to identify the most important variables and relationships within a data set before building predictive models. In this lesson, we will learn techniques for exploring the relationship between any two variables in a data set.
Lesson 5: Exploring Multiple Variables
Data sets can be complex. In this lesson, we will learn powerful methods and visualizations for examining relationships among multiple variables. We’ll extend our knowledge of previous graphics as we continue to build intuition around the diamond data set.
Lesson 6: Exploring Data Sets
Learn about current tools for EDA, and investigate data alongside an expert. As a final project, you will create your own exploratory data analysis.
Exploratory Data Analysis (EDA) is an approach to data analysis for summarizing and visualizing the important characteristics of a data set. Promoted by John Tukey, EDA focuses on exploring data to understand its underlying structure and variables, develop intuition about the data set, consider how that data set came into existence, and decide how it can be investigated with more formal statistical methods.