Differences between Sobol and SHAP Sensitivity Analysis on Housing Price Predictions

Renee LIN
5 min read · Mar 5, 2023

I recently applied both Sobol and SHAP sensitivity analysis to my model to find its most impactful features, thinking that agreement between two methods would be more convincing. However, the two methods gave inconsistent importance scores, so I had to figure out what went wrong; life is hard. My plan:

  1. Test on a simple dataset (housing prices) to check whether I am using the methods correctly
  2. Search for a theoretical explanation of the results
  3. Go back to my own problem/dataset/model to examine the differences

1. Test on a simple dataset

(1) A simple housing price prediction model

Dataset

I used the California Housing Prices dataset, because the classic Boston Housing dataset has ethical problems, and the California dataset is the replacement suggested in the error notice that scikit-learn shows for the Boston dataset.

This dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

There are eight features, and the target (the median house value of each block group) is expressed in units of $100,000.
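As a rough sketch of the dataset described above, assuming it is loaded through scikit-learn's `fetch_california_housing` (the helper names `load_california` and `target_to_dollars` are hypothetical and only for illustration), the eight features and the target's units look like this:

```python
# The eight feature columns in scikit-learn's California housing data.
FEATURE_NAMES = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms",
    "Population", "AveOccup", "Latitude", "Longitude",
]


def load_california():
    """Return (X, y) as a pandas DataFrame and Series.

    Assumption: scikit-learn and pandas are installed. The data is
    downloaded and cached the first time this function is called.
    """
    # Imported here so the rest of the module works without scikit-learn.
    from sklearn.datasets import fetch_california_housing

    housing = fetch_california_housing(as_frame=True)
    return housing.data[FEATURE_NAMES], housing.target


def target_to_dollars(value):
    """Convert a target value from units of $100,000 to dollars."""
    return value * 100_000
```

For example, `X, y = load_california()` followed by `target_to_dollars(y)` would give the median house values in dollars rather than in units of $100,000.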

# Import dataset…

