A heatmap is a handy way to show the relationship between columns of data, such as their pairwise correlations. It is also useful for plotting three-dimensional data (e.g. x is time, y is location, and z is sales). Since common libraries like Seaborn produce static, non-interactive images, I'd like to create the heatmap with Plotly instead. I will use the Titanic dataset to visualize the correlations between its features.
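To make the three-dimensional case concrete: a heatmap needs a 2-D grid of z values labelled by x and y. Assuming the data arrives in long form (one row per time/location pair), a pandas pivot produces that grid. The column names and numbers below are made up purely for illustration; the resulting `x`, `y` and `z` are the kind of inputs Plotly's heatmap trace accepts.

```python
import pandas as pd

# Hypothetical long-form sales records: one row per (time, location) pair.
sales = pd.DataFrame({
    "time": ["Jan", "Jan", "Feb", "Feb"],
    "location": ["Paris", "Tokyo", "Paris", "Tokyo"],
    "sales": [120, 95, 130, 80],
})

# Pivot into a grid: rows (y) are locations, columns (x) are months,
# and each cell holds the sales figure (z) that the heatmap will colour.
grid = sales.pivot(index="location", columns="time", values="sales")

x = list(grid.columns)  # x-axis labels
y = list(grid.index)    # y-axis labels
z = grid.values         # 2-D array of cell values
print(grid)
```

Note that `pivot` sorts the labels, so select the columns explicitly (e.g. `grid[["Jan", "Feb"]]`) if you need chronological order.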
- Load Data and Clean Data
- Plot the Heatmap
- Beautify the Heatmap
1. Load Data and Clean Data
The data can be downloaded from the Kaggle Titanic competition page.
import pandas as pd
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
train.head(3)
test.head(3)
# fill missing values
train["Age"] = train["Age"].fillna(train["Age"].mean())
test["Age"] = test["Age"].fillna(test["Age"].mean())
test["Fare"] = test["Fare"].fillna(test["Fare"].mean())
train["Cabin"] = train["Cabin"].fillna(train["Cabin"].mode()[0])
test["Cabin"] = test["Cabin"].fillna(test["Cabin"].mode()[0])
train["Embarked"] = train["Embarked"].fillna(train["Embarked"].mode()[0])
# convert categorical features to numerical values
train["Sex"] = [1 if i == "male" else 0 for i in train["Sex"]]
test["Sex"] = [1 if i == "male" else 0 for i in test["Sex"]]
train["Embarked"] = [0 if i == "S" else i for i in train["Embarked"]]
train["Embarked"] = [1 if i == "C" else i for i in train["Embarked"]]
train["Embarked"] = [2 if i == "Q" else i for i in train["Embarked"]]
test["Embarked"] = [0 if i == "S" else i for i in test["Embarked"]]
test["Embarked"] = [1 if i == "C" else i for i in test["Embarked"]]
test["Embarked"] = [2 if i == "Q" else i for i in test["Embarked"]]
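The fill step above follows one rule per column type: the mean for numeric columns (Age, Fare) and the most frequent value for categorical ones (Cabin, Embarked). As a sketch, assuming you want that same rule applied to every column with gaps, a hypothetical helper `fill_missing` could loop over the columns instead of listing them by hand. Unlike the explicit version, it touches every column that has missing values, and it would fail on a column that is entirely empty.

```python
import pandas as pd

def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with numeric NaNs replaced by the
    column mean and non-numeric NaNs replaced by the most frequent value."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            out[col] = out[col].fillna(out[col].mode()[0])
    return out

# Tiny made-up frame with the same kinds of gaps as the Titanic data.
demo = pd.DataFrame({
    "Age": [22.0, None, 38.0],
    "Embarked": ["S", "S", None],
})
filled = fill_missing(demo)
print(filled)
```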
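The chained list comprehensions for Sex and Embarked can also be written with `Series.map` and a lookup dictionary, one per feature. This is an alternative formulation, not the approach used above, shown here on a few made-up rows. One behavioural difference: `map` turns any value missing from the dictionary into NaN, whereas the comprehension for Sex silently sends anything other than "male" to 0.

```python
import pandas as pd

# Made-up sample rows; the real columns come from train.csv / test.csv.
df = pd.DataFrame({"Sex": ["male", "female", "male"],
                   "Embarked": ["S", "C", "Q"]})

# One lookup table per feature replaces the chained comprehensions.
df["Sex"] = df["Sex"].map({"male": 1, "female": 0})
df["Embarked"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2})
print(df)
```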
2. Plot the Heatmap
First, we need to compute the correlation matrix, which provides the values for the z-axis.
cols = ["Survived","Pclass","Sex","Age","SibSp","Parch","Fare","Embarked"]
train_corr = train[cols].corr()
train_corr
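By default `DataFrame.corr()` computes the Pearson coefficient, r = cov(X, Y) / (σ_X σ_Y), so every cell lies between -1 and 1 and the diagonal is 1. The sketch below uses made-up values for two columns to check one matrix entry against the formula computed by hand, and then splits the matrix into the x, y and z inputs a heatmap expects.

```python
import numpy as np
import pandas as pd

# Made-up values for two columns, only to illustrate the computation.
df = pd.DataFrame({"Pclass": [1, 3, 2, 3, 1],
                   "Fare":   [80.0, 7.5, 20.0, 8.0, 70.0]})

corr = df.corr()  # Pearson correlation by default

# Pearson r by hand: sum of co-deviations divided by the square root of the
# product of the squared deviations (the 1/(n-1) factors cancel out).
a, b = df["Pclass"], df["Fare"]
da, db = a - a.mean(), b - b.mean()
r = (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())
print(corr.loc["Pclass", "Fare"], r)

# The square matrix maps directly onto a heatmap's inputs.
x = list(corr.columns)  # column labels -> x-axis
y = list(corr.index)    # row labels -> y-axis
z = corr.values         # correlation values -> cell colours
```

Because the matrix is symmetric, the heatmap is mirrored across its diagonal; here r comes out negative, since higher class numbers go with lower fares in these sample values.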