Basics of data visualization#

We used data visualizations throughout this course. Here, we explore how to modify and enhance these figures.

Data visualization presents information and data using visual elements such as charts, graphs, and maps.

It helps us understand complex data sets quickly, which in turn supports better-informed decisions. This notebook covers the basics of data visualization: the commonly used types of visualizations, their benefits and limitations, and tips for modifying and enhancing figures so that they communicate our insights more clearly.

Whether you are a data analyst, a researcher, or simply interested in exploring data, this notebook provides the foundational knowledge needed to create effective data visualizations.

How To#

import pandas as pd

df = pd.read_csv("data/housing.csv")
df.head()
   longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
0    -122.23     37.88                41.0        880.0           129.0       322.0       126.0         8.3252            452600.0         NEAR BAY
1    -122.22     37.86                21.0       7099.0          1106.0      2401.0      1138.0         8.3014            358500.0         NEAR BAY
2    -122.24     37.85                52.0       1467.0           190.0       496.0       177.0         7.2574            352100.0         NEAR BAY
3    -122.25     37.85                52.0       1274.0           235.0       558.0       219.0         5.6431            341300.0         NEAR BAY
4    -122.25     37.85                52.0       1627.0           280.0       565.0       259.0         3.8462            342200.0         NEAR BAY
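import matplotlib.pyplot as plt

plt.figure(figsize=(10, 2))
# Plot the same points twice to compare two marker styles:
# 'ro' draws red circles, 'kx' draws black crosses.
plt.plot(df.population, df.median_house_value, 'ro', label="House")
plt.plot(df.population, df.median_house_value, 'kx', label="House 2")
plt.title("Value by Population")
plt.xlabel("Population")
plt.ylabel("House Value")
plt.legend()
# To save the figure, call savefig() before show(); once show() has
# displayed the figure, a later savefig() may write an empty image.
# plt.savefig("figure.png", dpi=300)
plt.show()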
../_images/e4d289a106e007739e2ece87a3694628db4370d96a59bb9065b8e61b3f50f7c1.png
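
The same kind of plot can also be built with Matplotlib's object-oriented interface, which keeps explicit handles to the figure and axes and so makes later modifications easier. The following is a minimal sketch, not the notebook's own code: the `population` and `value` arrays are synthetic stand-ins invented for illustration and are not read from `housing.csv`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (hypothetical, not taken from housing.csv).
rng = np.random.default_rng(0)
population = rng.lognormal(mean=7.0, sigma=0.7, size=500)
value = rng.uniform(50_000, 500_000, size=500)

# Object-oriented interface: keep handles to the figure and the axes.
fig, ax = plt.subplots(figsize=(10, 4))

# Small, semi-transparent markers reduce overplotting in dense regions.
ax.scatter(population, value, s=10, alpha=0.3, color="tab:red",
           label="Synthetic sample")

# The synthetic population values are right-skewed, so use a log axis.
ax.set_xscale("log")

ax.set_title("Value by Population")
ax.set_xlabel("Population")
ax.set_ylabel("House Value")
ax.legend()
fig.tight_layout()
```

In a notebook the figure renders when the cell finishes; in a script, call `plt.show()`. Because `fig` is an explicit handle, `fig.savefig("figure.png", dpi=300)` writes this particular figure no matter which figure is currently active.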

Image Data#

# imshow() requires the image data as its first argument, X
# (a 2-D array of scalars, or an RGB(A) array). As a stand-in for
# an image, show the correlation matrix of the numeric columns.
plt.imshow(df.corr(numeric_only=True).to_numpy())
plt.colorbar()
plt.show()
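
`plt.imshow` expects the image as its first argument `X`, either a 2-D array of scalars or an RGB(A) array. Tabular data can be turned into such an array by binning it onto a grid. The sketch below illustrates this under stated assumptions: the `longitude` and `latitude` arrays are synthetic values made up for the example, not the columns of `housing.csv`, and the bin count of 50 is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in coordinates (hypothetical, not taken from housing.csv).
rng = np.random.default_rng(0)
longitude = rng.normal(-119.5, 2.0, size=5000)
latitude = rng.normal(35.6, 2.1, size=5000)

# Bin the points into a 2-D grid of counts; this grid is the "image".
counts, lon_edges, lat_edges = np.histogram2d(longitude, latitude, bins=50)

fig, ax = plt.subplots(figsize=(6, 5))
image = ax.imshow(
    counts.T,        # histogram2d puts x on axis 0; imshow draws rows along y
    origin="lower",  # place the smallest latitude at the bottom
    extent=[lon_edges[0], lon_edges[-1], lat_edges[0], lat_edges[-1]],
    aspect="auto",
    cmap="viridis",
)
fig.colorbar(image, ax=ax, label="Points per bin")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
```

The `extent` argument maps array indices back to data coordinates, so the tick labels show longitude and latitude rather than bin numbers.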

Seaborn#

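import seaborn as sns

# pairplot() creates its own figure, so a preceding plt.figure() call only
# leaves an empty figure behind. Set the size of each panel (in inches)
# with `height`, and save through the grid that pairplot() returns.
grid = sns.pairplot(df.sample(100), height=1.1)
grid.savefig("seaborn.png")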
../_images/703b87ccbbb5e7e18f8cb151ffc04313161d3144249004d2a2b9060a3adac102.png

Pandas#

df.plot.scatter(x="population", y="total_rooms")
<Axes: xlabel='population', ylabel='total_rooms'>
../_images/444d8ac0cc05f48500f64c8e70aea5055ed728cb6d8a8184b7fe803a466ecd3f.png
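
pandas plotting methods draw with Matplotlib, so they can target an existing axes via `ax=` and the result can be adjusted with the usual Matplotlib calls. A minimal sketch, assuming a synthetic `demo` table whose columns are made up for illustration (they only borrow names from `housing.csv`):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in table (hypothetical values, not taken from housing.csv).
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame({"population": rng.lognormal(7.0, 0.7, size=n)})
demo["total_rooms"] = 1.8 * demo["population"] + rng.normal(0, 300, size=n)
demo["median_income"] = rng.uniform(1, 10, size=n)

# Create the axes first, then let pandas draw into it.
fig, ax = plt.subplots(figsize=(8, 4))
demo.plot.scatter(
    x="population",
    y="total_rooms",
    c="median_income",    # color each point by a third column
    colormap="viridis",
    alpha=0.5,
    ax=ax,
)

# The axes is a regular Matplotlib object and can be modified afterwards.
ax.set_title("Rooms by Population (synthetic data)")
```

Passing `ax=` is also how several pandas plots can be placed side by side in one figure created with `plt.subplots`.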

Exercise#

Experiment with the plotting capabilities of Matplotlib and Seaborn.

Additional Resources#