The original data comes from Pace and Barry [1997].

Géron [2019] modified the data. The modified version is available under the CC0 license on Kaggle, with the following changes from the original:

  • 207 values were randomly removed from the total_bedrooms column, so we can discuss what to do with missing data.

  • An additional categorical attribute called ocean_proximity was added, indicating (very roughly) whether each block group is near the ocean, near the Bay Area, inland, or on an island. This allows discussing what to do with categorical data.
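
The two modifications above invite two common preprocessing steps: filling in missing total_bedrooms values and encoding ocean_proximity numerically. The following is a minimal sketch of one possible approach (median imputation followed by one-hot encoding), written with only the Python standard library. The toy rows, the category labels, and the helper names impute_median and one_hot are hypothetical and chosen for illustration; they are not taken from the dataset or from Géron [2019].

```python
from statistics import median

# Hypothetical toy rows: the numbers and category labels are made up for
# illustration. None marks a missing total_bedrooms value.
rows = [
    {"total_bedrooms": 129.0, "ocean_proximity": "NEAR BAY"},
    {"total_bedrooms": None, "ocean_proximity": "INLAND"},
    {"total_bedrooms": 190.0, "ocean_proximity": "NEAR OCEAN"},
    {"total_bedrooms": 235.0, "ocean_proximity": "INLAND"},
]


def impute_median(rows, column):
    """Return new rows where None in `column` is replaced by the column median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column):
    """Return new rows where `column` is replaced by one 0/1 field per category."""
    categories = sorted({r[column] for r in rows})
    encoded_rows = []
    for r in rows:
        encoded = {k: v for k, v in r.items() if k != column}
        for c in categories:
            encoded[f"{column}={c}"] = 1 if r[column] == c else 0
        encoded_rows.append(encoded)
    return encoded_rows


# Impute first, then encode; the input rows are left unchanged.
prepared = one_hot(impute_median(rows, "total_bedrooms"), "ocean_proximity")
for row in prepared:
    print(row)
```

Median imputation is only one option for the missing values; dropping the affected rows or dropping the column entirely are alternatives, and which is appropriate depends on the analysis.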


Aurélien Géron. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly, 2019.


R. Kelley Pace and Ronald Barry. Sparse spatial autoregressions. Statistics & Probability Letters, 33(3):291–297, May 1997. doi:10.1016/s0167-7152(96)00140-x.