
Day 5: Python File Handling & Web Scraping


Step 1: Understanding File Types

  • .py – Python code file, executable in Python interpreter.
  • .txt – Plain text file for storing data or notes.
  • .csv – Comma-separated values, useful for tables or spreadsheets.
  • requirements.txt – List of Python packages your project needs (see the example below).
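
For reference, a minimal requirements.txt for this lesson might look like this (the pinned versions are only examples):

requests==2.31.0
beautifulsoup4==4.12.2
pandas==2.0.3

Install everything it lists in one command:

pip install -r requirements.txt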

Step 2: Reading a Text File

Assume the file data.txt is saved in C:/Python Tutorial/Day 5/. Open it with open(), which takes a file path and a mode:

  • 'r' – Read (default), file must exist.
  • 'w' – Write, creates or overwrites file.
  • 'a' – Append to file.
  • 'rb'/'wb' – Read/write in binary mode.
file_path = "C:/Python Tutorial/Day 5/data.txt" # full path
file = open(file_path, "r") # open file in read mode
content = file.read() # read entire content
print(content) # display content
file.close() # close the file
Output:

Hello World!
Python is fun.
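
A safer pattern is the with statement, which closes the file automatically even if an error occurs (the later steps use it too):

file_path = "C:/Python Tutorial/Day 5/data.txt"
with open(file_path, "r") as file:  # file closes automatically at the end of the block
    print(file.read())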

Step 3: Writing to a Text File

To add data to the end of a file without erasing its contents, open it in append mode:

file_path = "C:/Python Tutorial/Day 5/data.txt"
file = open(file_path, "a") # open file in append mode
file.write("This is a new line.\n") # write new line
file.close() # close the file
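
To replace the file's contents instead, open it in "w" mode; a minimal sketch that writes several lines at once:

lines = ["First line\n", "Second line\n"]
with open(file_path, "w") as file:  # "w" discards any existing content
    file.writelines(lines)  # write all the lines in one call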

Step 4: Working with CSV Files

You can use Python's built-in csv module or pandas. Here we use csv:

import csv

# Writing CSV
with open("C:/Python Tutorial/Day 5/data.csv", "w", newline="") as file:
    writer = csv.writer(file)  # create writer object
    writer.writerow(["Name", "Age"])  # header row
    writer.writerow(["Alisha", 30])  # data row
    writer.writerow(["John", 25])

# Reading CSV
with open("C:/Python Tutorial/Day 5/data.csv", "r", newline="") as file:
    reader = csv.reader(file)  # create reader object
    for row in reader:
        print(row)  # print each row

Output:

['Name', 'Age']
['Alisha', '30']
['John', '25']

Explanation: We open the CSV with with so it closes automatically (the csv docs also recommend newline="" for both reading and writing). writerow() writes a list as one row, and csv.reader returns every row as a list of strings, which is why the ages print as '30' and '25'.
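
If you would rather look up columns by header name than by position, csv.DictReader is a handy alternative (a sketch using the same data.csv):

import csv

with open("C:/Python Tutorial/Day 5/data.csv", "r", newline="") as file:
    for row in csv.DictReader(file):  # each row becomes a dict keyed by the header
        print(row["Name"], row["Age"])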

Tip: For complex CSVs, you can use pandas:

import pandas as pd
df = pd.read_csv("C:/Python Tutorial/Day 5/data.csv")
print(df.head())

Step 5: PyCharm Project Example – File Handling

MyPythonProject
├── file_demo.py
├── data.txt
└── data.csv

Since data.txt sits in the project root, a relative path works when you run the script from PyCharm (the working directory defaults to the project folder):

# file_demo.py
with open("data.txt", "r") as f:
    print(f.read())  # read content

with open("data.txt", "a") as f:
    f.write("Added line from Python\n")  # append a new line
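
If you prefer absolute paths, pathlib makes them easier to build and read; a small sketch using the Day 5 folder from earlier:

from pathlib import Path

base = Path("C:/Python Tutorial/Day 5")  # folder holding the data files
print((base / "data.txt").read_text())  # open, read, and close in one call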

Step 6: Web Scraping & Data Extraction

1. Installing Required Packages

Open command prompt/terminal:

pip install requests beautifulsoup4 lxml pandas pytube SpeechRecognition pydub

(pydub is used in section 4 below to convert audio; it also needs the ffmpeg tool installed on your system.)

2. Reading a Webpage

Use requests to fetch the page and BeautifulSoup to pull out its title and text:

import requests
from bs4 import BeautifulSoup

url = "https://www.example.com"
response = requests.get(url) # fetch webpage
soup = BeautifulSoup(response.text, 'lxml') # parse HTML
print(soup.title.text) # print page title
print(soup.get_text()[:300]) # print first 300 chars of visible text

Tip: The example above uses the lxml parser, which is fast but requires the lxml package. Python's built-in html.parser needs no extra install; just pass 'html.parser' instead of 'lxml' to BeautifulSoup.

3. Extracting All Links

for link in soup.find_all('a'):
    print(link.get('href'))  # print href attribute
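
Some hrefs are relative (like /about) or missing entirely; urllib.parse.urljoin resolves them against the page URL. A small sketch:

from urllib.parse import urljoin

for link in soup.find_all('a'):
    href = link.get('href')
    if href:  # skip <a> tags without an href
        print(urljoin(url, href))  # turn relative links into absolute ones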

4. Downloading & Converting YouTube Video to Text

Use pytube to download the audio and SpeechRecognition to transcribe it. One catch: SpeechRecognition's AudioFile only reads WAV, AIFF, or FLAC, so the code below first converts the download to WAV with pydub (which requires the ffmpeg tool to be installed on your system):

from pytube import YouTube
from pydub import AudioSegment
import speech_recognition as sr

yt = YouTube("https://www.youtube.com/watch?v=exampleID")
stream = yt.streams.filter(only_audio=True).first()  # audio-only stream
stream.download(filename="video_audio.mp4")

# Convert to WAV, since sr.AudioFile cannot read .mp4 directly
AudioSegment.from_file("video_audio.mp4").export("video_audio.wav", format="wav")

r = sr.Recognizer()
with sr.AudioFile("video_audio.wav") as source:
    audio = r.record(source)  # read the whole audio file into memory
text = r.recognize_google(audio)  # Google's free Web Speech API; needs internet, works best on short clips
print(text[:500])  # first 500 chars

Extra Tips

  • For structured HTML tables, pandas.read_html() can extract them directly into DataFrames.
  • pandas.read_csv() also accepts a URL, so you can load online CSV data without downloading it first (see the sketch below).
  • Always respect a website's robots.txt and scraping policies.
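
A minimal sketch of loading a CSV over HTTP; the URL here is just a placeholder, so substitute a real CSV endpoint:

import pandas as pd

# placeholder URL for illustration only
df = pd.read_csv("https://www.example.com/data.csv")
print(df.head())  # first five rows
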
✔ End of Day 5 – File handling & advanced web scraping mastered. You can now handle local files, CSVs, scrape webpages, extract links, and convert YouTube videos to text!