Learn With Frahim


Lesson 2.1: Data Science Cheatsheet

This cheatsheet provides a quick reference for the most important commands and functions from popular Python libraries used in data science. Bookmark this page as your go-to reference!

NumPy

Command	Description
np.array([1,2,3])	Create NumPy array
np.zeros((3,3))	3x3 array of zeros
np.ones((2,2))	2x2 array of ones
np.arange(0,10,2)	Values from 0 up to (but not including) 10 in steps of 2
np.linspace(0,1,5)	5 evenly spaced numbers from 0 to 1 (both endpoints included)
arr.shape	Shape of array
arr.reshape(2,3)	Reshape to 2 rows x 3 columns (array must have 6 elements)
np.mean(arr)	Mean
np.std(arr)	Standard deviation (population by default, ddof=0)
np.dot(a,b)	Dot product
np.random.rand(3,2)	3x2 array of random numbers, uniform on [0, 1)
np.random.randn(3)	3 random numbers from the standard normal distribution

Tips: Use NumPy arrays instead of Python lists for fast mathematical operations. Prefer vectorization over loops.
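To make the vectorization tip concrete, here is a minimal sketch that computes the same squares with a Python loop and with a single array expression. The sample values and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Loop version: square each element one at a time.
squares_loop = [v ** 2 for v in values]

# Vectorized version: one expression operates on the whole array.
arr = np.array(values)
squares_vec = arr ** 2

# Reshape the 6 elements into 2 rows x 3 columns.
grid = arr.reshape(2, 3)

print(squares_vec)       # same numbers as squares_loop
print(grid.shape)        # (2, 3)
print(np.mean(arr))      # 3.5
print(np.dot(arr, arr))  # 91.0 (sum of the squares)
```

On arrays this small the speed difference is negligible. The gain from vectorization generally shows up as the arrays grow.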

Pandas

Command	Description
pd.read_csv("file.csv")	Read CSV file
df.head()	First 5 rows
df.info()	Summary of DataFrame
df.describe()	Descriptive statistics
df["col"]	Access a column
df.loc[0]	Access row by index label
df.iloc[0]	Access row by integer position
df.dropna()	Remove rows with missing values
df.fillna(0)	Fill missing values
df.duplicated()	Boolean mask marking duplicate rows
df.drop_duplicates()	Remove duplicates
df.groupby("col").mean(numeric_only=True)	Group by + aggregate (numeric columns only)
df.sort_values("col")	Sort by column
df.to_csv("out.csv")	Export to CSV

Tips: Use .loc for label-based indexing and .iloc for position-based indexing. Always check df.info() before processing.
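The difference between .loc and .iloc is easiest to see when the index is not the default 0, 1, 2, and so on. The sketch below uses a small made-up DataFrame; the column names and index labels are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data with a non-default index, so labels and positions differ.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Oslo", "Pune"],
        "temp": [3.0, 19.0, None, 28.0],
    },
    index=[10, 20, 30, 40],
)

# Check dtypes and missing values before processing.
df.info()

row_by_label = df.loc[10]     # the row whose index label is 10
row_by_position = df.iloc[0]  # the first row, whatever its label
# df.loc[0] would raise KeyError here because no row is labelled 0.

# Fill the missing temperature, then aggregate per city.
filled = df.fillna({"temp": 0.0})
mean_temp = df.groupby("city")["temp"].mean()  # NaN values are skipped
print(mean_temp)
```

Selecting the "temp" column before calling mean() keeps the aggregation to one numeric column, so text columns never enter the calculation.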

Matplotlib

Command	Description
plt.plot(x,y)	Line plot
plt.scatter(x,y)	Scatter plot
plt.bar(x,y)	Bar chart
plt.hist(data)	Histogram
plt.xlabel("x")	Label x-axis
plt.ylabel("y")	Label y-axis
plt.title("Title")	Set plot title
plt.legend()	Show legend
plt.show()	Display plot

Tips: Use plt.figure(figsize=(w,h)) to set the figure width and height in inches. Combine with Seaborn for prettier plots.
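The commands in the table are usually chained into one figure. The following sketch assumes a script running without a display, so it selects the non-interactive Agg backend and writes the image to an in-memory buffer; in a notebook or desktop session you would typically drop those two steps and call plt.show() instead. The data values are made up.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit for interactive use
import matplotlib.pyplot as plt

# Hypothetical data for illustration.
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

fig = plt.figure(figsize=(6, 4))  # width and height in inches
plt.plot(x, y, label="y = x^2")   # line plot; the label feeds the legend
plt.scatter(x, y)                 # overlay the individual points
plt.xlabel("x")
plt.ylabel("y")
plt.title("Squares")
plt.legend()

# Render to an in-memory PNG instead of opening a window.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```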

Seaborn

Command	Description
sns.countplot(x="col", data=df)	Count plot
sns.histplot(df["col"])	Histogram
sns.boxplot(x="col", y="val", data=df)	Boxplot
sns.heatmap(df.corr(numeric_only=True), annot=True)	Heatmap of correlations between numeric columns
sns.pairplot(df)	Pairwise relationships

Tips: Seaborn integrates well with Pandas DataFrames. Use it for quick, attractive visualizations.
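As a rough illustration of how Seaborn works directly on a DataFrame, the sketch below draws a boxplot and a correlation heatmap side by side. The DataFrame contents and column names are invented, and the Agg backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: one categorical column and two numeric columns.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b", "b"],
        "height": [1.0, 2.0, 3.0, 4.0, 5.0],
        "weight": [2.0, 4.1, 5.9, 8.2, 9.9],
    }
)

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Column names are passed as strings; Seaborn looks them up in df.
sns.boxplot(x="group", y="height", data=df, ax=ax_left)

# Correlate only the numeric columns, then annotate each cell with its value.
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, ax=ax_right)

fig.tight_layout()
```

Passing ax= to each call places the plots on specific subplots, which helps when several charts share one figure.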

Scikit-learn

Command	Description
from sklearn.model_selection import train_test_split	Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)	80/20 split
from sklearn.preprocessing import StandardScaler	Import scaler
scaler = StandardScaler(); X_scaled = scaler.fit_transform(X)	Create scaler, then fit and scale data
from sklearn.linear_model import LogisticRegression	Import logistic regression
model = LogisticRegression(); model.fit(X_train, y_train)	Create and train model
y_pred = model.predict(X_test)	Predict
from sklearn.metrics import accuracy_score	Accuracy metric
from sklearn.metrics import classification_report	Precision, Recall, F1
from sklearn.metrics import confusion_matrix	Confusion matrix

Tips: Always scale numerical data before training models sensitive to feature magnitude (e.g., SVM, Logistic Regression). Pass a fixed random_state to train_test_split for reproducibility.
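The rows in the table fit together into one short workflow: split, scale, train, predict, evaluate. The sketch below runs it end to end on synthetic data generated purely for illustration. One choice here goes beyond the table: the scaler is fitted on the training split only and then reused on the test split, a common practice that keeps test-set information out of training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic, hypothetical data: two features, label is 1 when their sum is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 80/20 split; a fixed random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data, then apply the same scaling to the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train and predict.
model = LogisticRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# Evaluate on the held-out test set.
accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)
print(accuracy)
print(matrix)
```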
