Lesson 2.1: Data Science Cheatsheet

This cheatsheet provides a quick reference for the most important commands and functions from popular Python libraries used in data science. Bookmark this page as your go-to reference!

NumPy

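| Command | Description |
| --- | --- |
| np.array([1,2,3]) | Create a NumPy array from a list |
| np.zeros((3,3)) | 3x3 array of zeros |
| np.ones((2,2)) | 2x2 array of ones |
| np.arange(0,10,2) | Values from 0 up to (but not including) 10 in steps of 2 |
| np.linspace(0,1,5) | 5 evenly spaced numbers from 0 to 1, endpoints included |
| arr.shape | Shape of the array (a tuple of dimension sizes) |
| arr.reshape(2,3) | Reshape to 2 rows and 3 columns (the array must hold 6 elements) |
| np.mean(arr) | Mean of all elements |
| np.std(arr) | Standard deviation (population form by default, ddof=0) |
| np.dot(a,b) | Dot product |
| np.random.rand(3,2) | 3x2 array of random numbers, uniform on [0, 1) |
| np.random.randn(3) | 3 random numbers from the standard normal distribution |

Tips: Use NumPy arrays instead of Python lists for fast mathematical operations. Prefer vectorized array expressions over explicit Python loops.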
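
To make the vectorization tip concrete, here is a minimal sketch comparing a Python loop with the equivalent array expression, followed by a few of the commands from the table. The sample data and variable names (values, grid, and so on) are assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical sample data: 1,000 evenly spaced values from 0 to 1.
values = np.linspace(0, 1, 1000)

# Loop version: square each element one at a time in Python.
squares_loop = np.array([v * v for v in values])

# Vectorized version: one expression applied to the whole array at once.
squares_vec = values ** 2

# Both approaches produce the same numbers; the vectorized one avoids the Python loop.
assert np.allclose(squares_loop, squares_vec)

# Reshape and aggregate, using commands from the table above.
grid = np.arange(6).reshape(2, 3)    # [[0, 1, 2], [3, 4, 5]]
col_means = grid.mean(axis=0)        # mean of each column
row_dot = np.dot(grid[0], grid[1])   # 0*3 + 1*4 + 2*5
print(grid.shape, col_means, row_dot)
```

On large arrays the vectorized form is usually much faster, because the per-element work happens in compiled code rather than in the Python interpreter.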

Pandas

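| Command | Description |
| --- | --- |
| pd.read_csv("file.csv") | Read a CSV file into a DataFrame |
| df.head() | First 5 rows |
| df.info() | Column types, non-null counts, and memory usage |
| df.describe() | Descriptive statistics (numeric columns by default) |
| df["col"] | Access a column |
| df.loc[0] | Access the row whose index label is 0 |
| df.iloc[0] | Access the first row by integer position |
| df.dropna() | Remove rows that contain missing values |
| df.fillna(0) | Replace missing values with 0 |
| df.duplicated() | Boolean Series marking rows that repeat an earlier row |
| df.drop_duplicates() | Remove duplicate rows |
| df.groupby("col").mean() | Group by a column, then average each group (add numeric_only=True if other columns are non-numeric) |
| df.sort_values("col") | Sort by column |
| df.to_csv("out.csv") | Export to CSV (pass index=False to leave out the index column) |

Tips: Use .loc for label-based indexing and .iloc for position-based indexing. Always check df.info() before processing, so you know each column's type and how many values are missing.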
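
The difference between .loc and .iloc only becomes visible when the index is not the default 0, 1, 2, and so on. The sketch below uses a small made-up DataFrame (the column names city and sales and all values are assumptions) to show both accessors, then chains a few cleaning commands from the table.

```python
import pandas as pd

# Hypothetical data with a non-default index, one missing value, and one duplicate row.
df = pd.DataFrame(
    {
        "city": ["Delhi", "Pune", "Pune", "Agra"],
        "sales": [10.0, 20.0, 20.0, None],
    },
    index=[10, 11, 12, 13],
)

by_label = df.loc[10]       # row whose index label is 10
by_position = df.iloc[0]    # first row, whatever its label is
assert by_label.equals(by_position)

df.info()                   # prints column dtypes and non-null counts

# Drop the repeated Pune row, then fill the missing sales value with 0.
clean = df.drop_duplicates().fillna(0)

# Total sales per city.
totals = clean.groupby("city")["sales"].sum()
print(totals)
```

With this index, df.loc[0] would raise a KeyError because no row carries the label 0, while df.iloc[0] still returns the first row.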

Matplotlib

| Command | Description |
| --- | --- |
| plt.plot(x,y) | Line plot |
| plt.scatter(x,y) | Scatter plot |
| plt.bar(x,y) | Bar chart |
| plt.hist(data) | Histogram |
| plt.xlabel("x") | Label x-axis |
| plt.ylabel("y") | Label y-axis |
| plt.title("Title") | Set plot title |
| plt.legend() | Show legend (uses the label= given to each plot call) |
| plt.show() | Display plot |

Tips: Use plt.figure(figsize=(w,h)) to set the figure's width and height in inches before plotting. Combine with Seaborn for prettier plots.
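
The commands above are usually combined in a fixed order: create the figure, draw, label, then show or save. The following is a minimal sketch of that sequence. The data, the output file name, and the choice of the non-interactive Agg backend are assumptions made so the script also runs on a machine without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: one period of sine and cosine.
x = np.linspace(0, 2 * np.pi, 100)

fig = plt.figure(figsize=(8, 4))         # width and height in inches
plt.plot(x, np.sin(x), label="sin(x)")   # label= is the text plt.legend() displays
plt.plot(x, np.cos(x), label="cos(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Sine and cosine")
plt.legend()

# Save to a temporary file (hypothetical name) instead of opening a window.
out_path = os.path.join(tempfile.gettempdir(), "cheatsheet_demo.png")
plt.savefig(out_path)
plt.close(fig)
```

In a notebook or desktop session, you would typically drop the matplotlib.use("Agg") line and call plt.show() in place of plt.savefig().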

Seaborn

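| Command | Description |
| --- | --- |
| sns.countplot(x="col", data=df) | Count plot (number of rows per category) |
| sns.histplot(df["col"]) | Histogram |
| sns.boxplot(x="col", y="val", data=df) | Boxplot of val for each category in col |
| sns.heatmap(df.corr(numeric_only=True), annot=True) | Annotated heatmap of correlations between numeric columns (numeric_only requires pandas 1.5 or later) |
| sns.pairplot(df) | Pairwise relationships between numeric columns |

Tips: Seaborn integrates well with Pandas DataFrames: pass the DataFrame as data= and refer to columns by name. Use it for quick, attractive visualizations.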
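
As an illustration of how Seaborn works directly on a DataFrame, the sketch below draws a count plot and a correlation heatmap side by side. The DataFrame contents, the column names (group, height, weight), the output file name, and the Agg backend are all assumptions for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: one categorical column and two numeric columns.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b", "b", "c"],
        "height": [1.0, 2.0, 2.5, 3.5, 4.0, 5.0],
        "weight": [2.0, 4.1, 5.2, 6.8, 8.1, 9.9],
    }
)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Count plot: Seaborn looks up the "group" column by name.
sns.countplot(x="group", data=df, ax=axes[0])

# Correlations are only defined for numeric columns, so select those first.
corr = df.select_dtypes("number").corr()
sns.heatmap(corr, annot=True, ax=axes[1])

fig.tight_layout()
out_path = os.path.join(tempfile.gettempdir(), "cheatsheet_seaborn_demo.png")
fig.savefig(out_path)
plt.close(fig)
```

Selecting the numeric columns before calling corr() is one way to avoid errors from text columns. It has the same effect here as corr(numeric_only=True) on recent pandas versions.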

Scikit-learn

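| Command | Description |
| --- | --- |
| from sklearn.model_selection import train_test_split | Import the data splitter |
| X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) | 80/20 train/test split |
| from sklearn.preprocessing import StandardScaler | Import scaler |
| scaler = StandardScaler() | Create the scaler |
| scaler.fit_transform(X_train) | Fit the scaler on the training data and scale it |
| scaler.transform(X_test) | Scale the test data with the statistics learned from the training data |
| from sklearn.linear_model import LogisticRegression | Import logistic regression |
| model = LogisticRegression() | Create the model |
| model.fit(X_train, y_train) | Train model |
| y_pred = model.predict(X_test) | Predict |
| from sklearn.metrics import accuracy_score | Accuracy metric |
| from sklearn.metrics import classification_report | Precision, Recall, F1 |
| from sklearn.metrics import confusion_matrix | Confusion matrix |

Tips: Always scale numerical data before training models sensitive to feature magnitude (e.g., SVM, Logistic Regression), and fit the scaler on the training set only so that no information from the test set leaks into training. Pass a fixed seed to train_test_split (e.g., random_state=42) for reproducibility.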
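
The individual commands above fit together into one workflow: split, scale, train, predict, evaluate. Here is a minimal end-to-end sketch. The synthetic dataset from make_classification and the seed value 42 are assumptions standing in for your own X, y, and settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption: stands in for your own features X and labels y).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# 80/20 split with a fixed seed so the split is the same on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data only, then reuse it for the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train and predict on the scaled features.
model = LogisticRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# Evaluate with the three metrics from the table.
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(cm)
print(classification_report(y_test, y_pred))
```

Calling fit_transform on the training set and only transform on the test set keeps the test data unseen until evaluation, which gives a more honest accuracy estimate.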