This cheatsheet provides a quick reference for the most important commands and functions from popular Python libraries used in data science. Bookmark this page as your go-to reference!

| NumPy command | Description |
|---|---|
| np.array([1,2,3]) | Create NumPy array |
| np.zeros((3,3)) | 3x3 array of zeros |
| np.ones((2,2)) | 2x2 array of ones |
| np.arange(0,10,2) | Values from 0 up to (but not including) 10, in steps of 2 |
| np.linspace(0,1,5) | 5 evenly spaced numbers from 0 to 1, endpoints included |
| arr.shape | Shape of array |
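| arr.reshape(2,3) | Reshape to 2 rows x 3 columns (element count must match) |
| np.mean(arr) | Mean of all elements |
| np.std(arr) | Standard deviation (population by default; pass ddof=1 for sample) |
| np.dot(a,b) | Dot product (matrix product for 2-D arrays) |
| np.random.rand(3,2) | 3x2 array of random numbers, uniform on [0, 1) |
| np.random.randn(3) | 3 random numbers from the standard normal distribution |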
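
A few of the NumPy commands above, combined into a minimal sketch. The sample values and the variable names (`grid`, `spaced`, `uniform`) are made up for illustration.

```python
import numpy as np

arr = np.arange(0, 12, 2)         # 0, 2, 4, 6, 8, 10 (the stop value 12 is excluded)
grid = arr.reshape(2, 3)          # 2 rows x 3 columns
spaced = np.linspace(0, 1, 5)     # 5 values from 0 to 1, both endpoints included

print(grid.shape)                 # shape is a tuple: (rows, columns)
print(np.mean(arr))               # mean of all elements
print(np.std(arr))                # population standard deviation (ddof=0)
print(np.std(arr, ddof=1))        # sample standard deviation

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.dot(a, b))               # 1*4 + 2*5 + 3*6

np.random.seed(0)                 # fix the seed so the random draws repeat
uniform = np.random.rand(3, 2)    # uniform on [0, 1)
normal = np.random.randn(3)       # standard normal
```
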

| pandas command | Description |
|---|---|
| pd.read_csv("file.csv") | Read CSV file |
| df.head() | First 5 rows |
| df.info() | Column dtypes, non-null counts, and memory usage |
| df.describe() | Descriptive statistics for numeric columns |
| df["col"] | Access a column |
| df.loc[0] | Access row by index label |
| df.iloc[0] | Access row by integer position |
| df.dropna() | Drop rows that contain missing values |
| df.fillna(0) | Replace missing values with 0 |
| df.duplicated() | Boolean Series marking duplicate rows |
| df.drop_duplicates() | Remove duplicates |
| df.groupby("col").mean(numeric_only=True) | Group by a column and average the numeric columns |
| df.sort_values("col") | Sort by column |
| df.to_csv("out.csv") | Export to CSV (pass index=False to leave out the index column) |
Use .loc for label-based indexing and .iloc for position-based indexing. Always check df.info() before processing.
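
The difference between .loc and .iloc only shows up once the index is no longer 0, 1, 2, ... in order. Here is a minimal sketch of that, along with the cleaning commands from the table. The CSV text and column names are hypothetical stand-ins for "file.csv", so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical data standing in for "file.csv".
csv_text = """city,temp
Oslo,3.0
Lima,
Oslo,3.0
Cairo,30.0
"""
df = pd.read_csv(io.StringIO(csv_text))
df.info()                              # dtypes and non-null counts per column

deduped = df.drop_duplicates()         # drops the repeated Oslo row
filled = deduped.fillna(0)             # missing temp becomes 0
means = filled.groupby("city")["temp"].mean()

# Sorting reorders the rows but keeps the original index labels.
sorted_df = filled.sort_values("temp")
print(sorted_df.loc[0, "city"])        # row whose label is 0
print(sorted_df.iloc[0]["city"])       # row in the first position after sorting
```

After sorting, the two lookups return different rows, which is why it helps to be explicit about which kind of indexing is meant.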

| Matplotlib command | Description |
|---|---|
| plt.plot(x,y) | Line plot |
| plt.scatter(x,y) | Scatter plot |
| plt.bar(x,y) | Bar chart |
| plt.hist(data) | Histogram |
| plt.xlabel("x") | Label x-axis |
| plt.ylabel("y") | Label y-axis |
| plt.title("Title") | Set plot title |
| plt.legend() | Show legend (uses the label= given to each plot call) |
| plt.show() | Display plot |
Use plt.figure(figsize=(w,h)) to set the figure width and height in inches. Combine with Seaborn for more polished default styling.
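
A minimal sketch that puts the plotting commands together. The data is made up, and two choices are assumptions made so the script runs without a display: the "Agg" backend and saving to an in-memory buffer. In an interactive session, drop both and call plt.show() instead.

```python
import io

import matplotlib
matplotlib.use("Agg")                  # assumption: non-interactive backend
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig = plt.figure(figsize=(6, 4))       # width and height in inches
plt.plot(x, y, label="y = x^2")        # label= is what plt.legend() displays
plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Title")
plt.legend()

buf = io.BytesIO()                     # in-memory target instead of a file
plt.savefig(buf, format="png")
```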

| Seaborn command | Description |
|---|---|
| sns.countplot(x="col", data=df) | Count plot |
| sns.histplot(df["col"]) | Histogram |
| sns.boxplot(x="col", y="val", data=df) | Boxplot |
| sns.heatmap(df.corr(numeric_only=True), annot=True) | Heatmap of correlations between numeric columns |
| sns.pairplot(df) | Pairwise relationships |
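
As a rough sketch, a boxplot and a correlation heatmap can be drawn side by side on a small DataFrame. The column names and values here are invented, and the "Agg" backend is an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")                  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: one categorical column and two numeric columns.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c", "c"],
    "val": [1.0, 2.0, 2.5, 3.5, 5.0, 6.0],
    "other": [6.0, 5.0, 4.0, 3.5, 2.0, 1.0],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.boxplot(x="group", y="val", data=df, ax=axes[0])

# numeric_only=True skips the text column when computing correlations.
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, ax=axes[1])
```
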

| scikit-learn command | Description |
|---|---|
| from sklearn.model_selection import train_test_split | Import the train/test splitting helper |
| X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) | 80/20 split |
| from sklearn.preprocessing import StandardScaler | Import scaler |
| scaler.fit_transform(X) | Fit a StandardScaler instance to X and scale it (zero mean, unit variance) |
| from sklearn.linear_model import LogisticRegression | Import logistic regression |
| model.fit(X_train, y_train) | Train model (model is an estimator instance, e.g. LogisticRegression()) |
| y_pred = model.predict(X_test) | Predict |
| from sklearn.metrics import accuracy_score | Accuracy metric |
| from sklearn.metrics import classification_report | Precision, Recall, F1 |
| from sklearn.metrics import confusion_matrix | Confusion matrix |
Pass random_state to train_test_split so the split is reproducible.
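
The table rows can be chained into one workflow. This is a minimal sketch: the synthetic dataset from make_classification, the seed value 42, and the default LogisticRegression settings are assumptions chosen for illustration. The scaler is fitted on the training split only and then reused on the test split, so no test-set statistics leak into training.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a real feature matrix X and labels y.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# random_state fixes the shuffle, so the same rows land in each split every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # learn mean/std from training data
X_test_scaled = scaler.transform(X_test)         # apply the same scaling to test data

model = LogisticRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```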