Exploratory Data Analysis

Exploratory Data Analysis (EDA) is essential in any data science project. The primary objective is to explore and understand the dataset at hand.

In this process, data analysts use statistical summaries and visualizations to uncover patterns, identify trends, and extract insights from the data. EDA helps analysts understand the data in depth, including the distribution of each variable, the correlations between variables, and other relevant characteristics. It also exposes outliers, missing values, and other anomalies that could compromise the quality of the analysis.
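The checks described above can be sketched with plain pandas. This is a minimal illustration, not part of the original workflow: the `toy` DataFrame is hypothetical, `iqr_outlier_mask` and `quick_eda` are illustrative helper names, and the 1.5 * IQR fence is one common outlier rule among several.

```python
import pandas as pd


def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; NaN is never flagged."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)  # quantile() skips NaN
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def quick_eda(df: pd.DataFrame) -> dict:
    """Collect a few first-pass EDA checks (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    return {
        "missing": df.isna().sum(),                   # missing values per column
        "summary": numeric.describe(),                # count, mean, std, quartiles
        "corr": numeric.corr(),                       # pairwise Pearson correlation
        "outliers": numeric.apply(iqr_outlier_mask),  # boolean mask per cell
    }


# Hypothetical toy data: one missing income and two extreme values
toy = pd.DataFrame(
    {
        "rooms": [4, 5, 6, 5, 4, 5],
        "income": [3.0, 4.0, None, 4.0, 3.0, 15.0],
        "value": [200, 250, 300, 260, 210, 900],
    }
)

report = quick_eda(toy)
print(report["missing"])
print(report["corr"].round(2))
# Rows containing at least one flagged value
print(toy[report["outliers"].any(axis=1)])
```

A flagged value is a candidate for a closer look rather than an error; whether to drop, cap, or keep it depends on the domain.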

EDA lays the groundwork for building predictive models and making informed business decisions.

How To

```python
import pandas as pd

# Load the housing dataset and preview its first five rows
df = pd.read_csv("data/housing.csv")
df.head()
```
|   | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 | NEAR BAY |
| 1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 | NEAR BAY |
| 2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 | NEAR BAY |
| 3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 | NEAR BAY |
| 4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 | NEAR BAY |
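Before generating a full report, a quick per-column overview can answer the first questions about types, gaps, and cardinality. The sketch below rests on assumptions: it re-enters the five rows printed above as a hypothetical `sample` DataFrame so it runs without `data/housing.csv`, and `column_overview` is a hypothetical helper name. On the full dataset you would pass `df` instead.

```python
import pandas as pd

# Hypothetical stand-in for df: the five rows shown by df.head() above
columns = [
    "longitude", "latitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
    "median_house_value", "ocean_proximity",
]
rows = [
    (-122.23, 37.88, 41.0, 880.0, 129.0, 322.0, 126.0, 8.3252, 452600.0, "NEAR BAY"),
    (-122.22, 37.86, 21.0, 7099.0, 1106.0, 2401.0, 1138.0, 8.3014, 358500.0, "NEAR BAY"),
    (-122.24, 37.85, 52.0, 1467.0, 190.0, 496.0, 177.0, 7.2574, 352100.0, "NEAR BAY"),
    (-122.25, 37.85, 52.0, 1274.0, 235.0, 558.0, 219.0, 5.6431, 341300.0, "NEAR BAY"),
    (-122.25, 37.85, 52.0, 1627.0, 280.0, 565.0, 259.0, 3.8462, 342200.0, "NEAR BAY"),
]
sample = pd.DataFrame(rows, columns=columns)


def column_overview(frame: pd.DataFrame) -> pd.DataFrame:
    """Summarize each column: dtype, missing count and share, distinct values."""
    return pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),
            "missing": frame.isna().sum(),
            "missing_pct": (frame.isna().mean() * 100).round(1),
            "n_unique": frame.nunique(),  # NaN is not counted as a value
        }
    )


overview = column_overview(sample)
print(overview)

# Category frequencies for the only non-numeric column
print(sample["ocean_proximity"].value_counts())
```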
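```python
from ydata_profiling import ProfileReport

# Build a profiling report for the DataFrame loaded above;
# as the last expression in a notebook cell, `profile` renders inline
profile = ProfileReport(df)
profile
```

```python
# Show the same report as interactive Jupyter widgets
profile.to_widgets()

# explorative=True turns on the library's extra, more expensive analyses
profile = ProfileReport(df, explorative=True)
profile.to_widgets()

# Save the report as a standalone HTML file
profile.to_file("pandas_profile.html")
```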