Creating your own cheat sheets to organise your data analysis workflow (Pandas cheat sheet)

Renee LIN
2 min read · Jun 9, 2022

If you have used SQL, NumPy, Pandas and Excel, you will have noticed that the logic of processing data is similar in all of them. We inspect, clean, sort, filter and aggregate data based on business logic. You might also have noticed that it is hard to remember the specific commands and syntax for each operation. I'd like to list the most common ones to remind myself of the core data manipulation steps, and to keep a unified cheat sheet in one place so I don't have to search Google every time I want to use a specific method.

I hope it can be an inspiration for creating your own cheat sheet for any library/framework you use frequently.

1. Importing/Exporting Data

pd.read_csv(filename)
pd.read_excel(filename)
pd.read_sql(query, connection_object)
df.to_csv(filename)
df.to_excel(filename)
df.to_sql(table_name, connection_object)
df.to_json(filename)
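
As a quick reminder of how these fit together, here is a minimal round-trip sketch. The sample DataFrame and its column names are made up for illustration, and an in-memory buffer stands in for a real filename so nothing is written to disk.

```python
import io

import pandas as pd

# Hypothetical sample data, not from a real dataset.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.5, 2.0, 3.25]})

# to_csv also writes the index as an extra column unless index=False is passed.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read it back as if it were a file on disk.
buffer.seek(0)
round_trip = pd.read_csv(buffer)

# to_json returns a string when no path is given.
json_text = df.to_json(orient="records")
```

Passing index=False is a choice worth remembering: without it, reading the CSV back adds an unnamed column holding the old index.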

2. Viewing/Inspecting Data

df.head(n)
df.shape
df.describe() #statistics for numerical columns
df.mean()
df.corr() #correlation between columns in a DataFrame
df.count()
df.max()
df.min()
df.median()
df.std() #standard deviation of each column
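
A small sketch of the inspection calls on a made-up DataFrame (the column names are hypothetical). One assumption to flag: in recent pandas versions, df.mean() and df.corr() may raise an error when the frame contains text columns, so the sketch passes numeric_only=True.

```python
import pandas as pd

# Hypothetical sample data with one text column and two numeric columns.
df = pd.DataFrame({
    "product": ["x", "y", "z", "w"],
    "price": [10.0, 20.0, 30.0, 40.0],
    "units": [1, 2, 3, 4],
})

n_rows, n_cols = df.shape            # (rows, columns)
summary = df.describe()              # numeric columns only by default
means = df.mean(numeric_only=True)   # skip the text column explicitly
corr = df.corr(numeric_only=True)    # pairwise correlation of numeric columns
counts = df.count()                  # non-null values per column
```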

3. Cleaning Data

df.columns = ['a','b','c'] #rename columns
pd.isnull(df) #True where a value is missing
pd.notnull(df) #True where a value is present
df.dropna()
df.fillna(x) #replace all null values with x
series.astype(float) #convert the datatype
series.replace('one',1) #replace all values of 'one' to 1
df.set_index('column_one') #change the index
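
Here is a minimal sketch that chains the cleaning steps on a made-up DataFrame. All column names and values are hypothetical. Note that most of these methods return a new object and leave df unchanged.

```python
import pandas as pd

# Hypothetical sample data: numbers stored as text, with one missing value.
df = pd.DataFrame({
    "id": ["r1", "r2", "r3"],
    "label": ["one", "two", "one"],
    "value": ["1.5", None, "3.0"],
})

null_mask = pd.isnull(df)                # same shape as df, True where missing
dropped = df.dropna()                    # drops the row with the missing value
filled = df.fillna("0")                  # new frame; df itself is unchanged
values = filled["value"].astype(float)   # text -> float
labels = df["label"].replace("one", 1)   # every 'one' becomes 1
indexed = df.set_index("id")             # 'id' becomes the index

renamed = df.copy()
renamed.columns = ["id", "label", "amount"]  # rename all columns at once
```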

4. Selecting Data

df[col]
df[[col1, col2]]
df.iloc[0,:] # First row
df.iloc[0,0] # First element of first column
df.at[7, 'Product_Name'] = 'Test Product' # assign new value
#conditional selection
df[df[1:4] > 0] #keeps values > 0 in rows 1-3, NaN everywhere else
df.loc[df['xxx'] > 30, ['a', 'b']]
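
A short sketch of the position-based and label-based selectors side by side. The DataFrame is made up; the 'xxx' column name simply mirrors the placeholder used in the list above.

```python
import pandas as pd

# Hypothetical sample data with the default integer index 0..3.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [10, 20, 30, 40],
    "xxx": [25, 35, 45, 15],
})

first_row = df.iloc[0, :]     # first row as a Series
first_cell = df.iloc[0, 0]    # scalar at row 0, column 0
df.at[3, "b"] = 99            # label-based assignment of a single cell

# Rows where 'xxx' > 30, restricted to columns 'a' and 'b'.
subset = df.loc[df["xxx"] > 30, ["a", "b"]]
```

iloc works on integer positions while loc and at work on labels. With the default index the two look the same, which is easy to forget once the index has been changed.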

5. Filter, Sort, and Groupby

df[df[col] > 0.5]
df[(df[col] > 0.5) & (df[col] < 0.7)] # don't forget the ()
df.sort_values(col1)

df.groupby([col1,col2])
df.groupby(col1).agg('mean') # mean of each numeric column within each group
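
A minimal sketch of the three operations on made-up data (the team, score and games columns are hypothetical). It passes the string 'mean' to agg, which avoids the NumPy import.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "team": ["x", "x", "y", "y"],
    "score": [0.4, 0.6, 0.65, 0.9],
    "games": [3, 1, 2, 4],
})

# Each condition needs its own parentheses because & binds tighter than > and <.
in_range = df[(df["score"] > 0.5) & (df["score"] < 0.7)]

# Sort by one column, largest first.
by_games = df.sort_values("games", ascending=False)

# One row per team, holding the mean of every remaining column.
team_means = df.groupby("team").agg("mean")
```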

6. Join/Merge
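
A minimal sketch of the usual join/merge calls, assuming two small made-up tables that share a customer_id column. The table and column names are hypothetical, and this shows only one common way to combine DataFrames.

```python
import pandas as pd

# Hypothetical sample tables sharing the key column 'customer_id'.
orders = pd.DataFrame({"customer_id": [1, 2, 4], "amount": [50, 75, 20]})
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})

# Inner join: only keys present in both tables (like SQL INNER JOIN).
inner = pd.merge(orders, customers, on="customer_id", how="inner")

# Left join: every order is kept; unmatched customers give NaN in 'name'.
left = orders.merge(customers, on="customer_id", how="left")

# Stack rows of frames with the same columns (like SQL UNION ALL).
stacked = pd.concat([orders, orders], ignore_index=True)
```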
