Exploratory Data Analysis

Exploratory Data Analysis (EDA) is an essential early step in any data science project. Its primary objective is to explore and understand the dataset at hand.

In this process, data analysts use statistical summaries and visualizations to uncover patterns, identify trends, and extract insights from the data. EDA shows how each variable is distributed, how variables correlate with one another, and which other characteristics of the data matter for the analysis. It also surfaces outliers, missing values, and other anomalies that could degrade the quality of the analysis.

In short, EDA lays the groundwork for building predictive models and making informed business decisions.

How To

import pandas as pd
df = pd.read_csv("data/housing.csv")
df.head()
   longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value ocean_proximity
0    -122.23     37.88                41.0        880.0           129.0       322.0       126.0         8.3252            452600.0        NEAR BAY
1    -122.22     37.86                21.0       7099.0          1106.0      2401.0      1138.0         8.3014            358500.0        NEAR BAY
2    -122.24     37.85                52.0       1467.0           190.0       496.0       177.0         7.2574            352100.0        NEAR BAY
3    -122.25     37.85                52.0       1274.0           235.0       558.0       219.0         5.6431            341300.0        NEAR BAY
4    -122.25     37.85                52.0       1627.0           280.0       565.0       259.0         3.8462            342200.0        NEAR BAY
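Before generating an automated report, the checks named in the introduction (distribution, correlation, missing values, outliers) can be sketched by hand with plain pandas. The block below is a minimal illustration, not a full analysis: it re-types a few columns from the five rows shown above so that it runs without the CSV file, and it uses the 1.5 × IQR rule as one common (assumed) definition of an outlier.

```python
import pandas as pd

# Five rows copied from the df.head() output above (subset of columns),
# re-typed here so the sketch does not depend on data/housing.csv.
sample = pd.DataFrame({
    "housing_median_age": [41.0, 21.0, 52.0, 52.0, 52.0],
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0, 1627.0],
    "median_income": [8.3252, 8.3014, 7.2574, 5.6431, 3.8462],
    "median_house_value": [452600.0, 358500.0, 352100.0, 341300.0, 342200.0],
    "ocean_proximity": ["NEAR BAY"] * 5,
})

# Distribution: count, mean, std, min, quartiles, and max per numeric column.
summary = sample.describe()

# Missing values: number of NaN entries per column.
missing = sample.isna().sum()

# Correlation: pairwise Pearson correlation between the numeric columns.
corr = sample.select_dtypes("number").corr()

# Outliers: values more than 1.5 * IQR beyond the quartiles (rule of thumb).
rooms = sample["total_rooms"]
q1, q3 = rooms.quantile(0.25), rooms.quantile(0.75)
iqr = q3 - q1
outliers = rooms[(rooms < q1 - 1.5 * iqr) | (rooms > q3 + 1.5 * iqr)]

print(summary, missing, corr, outliers, sep="\n\n")
```

On the full dataset the same calls apply to `df` directly; five rows are far too few to draw conclusions from, so the numbers here only show the mechanics.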
# pandas-profiling was renamed to ydata-profiling (pip install ydata-profiling).
# The old pandas_profiling package fails to import under pydantic 2 because
# `BaseSettings` has been moved to the `pydantic-settings` package.
from ydata_profiling import ProfileReport
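The profiling package has been published under two names: `pandas_profiling` (older) and `ydata_profiling` (current). In environments where it is unclear which one is installed, or whether it imports cleanly, a defensive import may help. The helper below, `load_profile_report`, is a hypothetical name introduced for this sketch; it tries both module names and returns `None` instead of raising when neither can be imported.

```python
import importlib


def load_profile_report():
    """Return the ProfileReport class, or None if no profiling package imports.

    Hypothetical helper for illustration; not part of either library.
    """
    # Try the current package name first, then the legacy one.
    for module_name in ("ydata_profiling", "pandas_profiling"):
        try:
            module = importlib.import_module(module_name)
        except Exception:
            # Covers a missing package as well as import-time failures
            # caused by incompatible dependencies (for example pydantic 2).
            continue
        if hasattr(module, "ProfileReport"):
            return module.ProfileReport
    return None


ProfileReport = load_profile_report()
if ProfileReport is None:
    print("No working profiling package found; try: pip install ydata-profiling")
```

Catching a broad `Exception` is a deliberate choice here, because the failure can come from a dependency rather than from a missing package.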
# Build the profiling report for the whole DataFrame.
profile = ProfileReport(df)
# Display the report inline in the notebook.
profile
# Render the report as interactive notebook widgets.
profile.to_widgets()
# Rebuild the report using the library's explorative configuration.
profile = ProfileReport(df, explorative=True)
profile.to_widgets()
# Save the report as a standalone HTML file.
profile.to_file("pandas_profile.html")
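The report-writing step can also be wrapped in a small function that degrades gracefully when the profiling package is unavailable. The sketch below assumes the current `ydata_profiling` package; `write_profile`, the report title, and the five-row `sample` frame (copied from the `df.head()` output) are illustrative choices, and `minimal=True` is used to switch off the more expensive computations.

```python
import pandas as pd

# Rows copied from the df.head() output (subset of columns), so the sketch
# runs without data/housing.csv.
sample = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0, 1627.0],
    "median_income": [8.3252, 8.3014, 7.2574, 5.6431, 3.8462],
    "median_house_value": [452600.0, 358500.0, 352100.0, 341300.0, 342200.0],
    "ocean_proximity": ["NEAR BAY"] * 5,
})


def write_profile(frame, path, minimal=True):
    """Write an HTML profiling report; return False if the package is missing.

    Hypothetical helper for illustration; not part of ydata-profiling.
    """
    try:
        from ydata_profiling import ProfileReport
    except Exception:
        # Package not installed, or it failed to import in this environment.
        return False
    report = ProfileReport(frame, title="Housing profile", minimal=minimal)
    report.to_file(path)
    return True
```

Calling `write_profile(df, "pandas_profile.html")` would then mirror the `to_file` call above, returning `True` only when a report was actually written.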