Start Web Scraping with Beautiful Soup

Renee LIN
3 min read · Jun 29, 2022

As I use YouTube and Reddit every day, I wonder how I can gather information about a topic automatically. For example, if I search “FFXIV” on YouTube, I get a long list of videos. Is there a way to collect the title, view count, and channel info into an Excel file automatically? The solution is web scraping: collecting data from a webpage automatically.

  1. What is Beautiful Soup
  2. How to use it
  3. Try to get information from a YouTube video

1. What is Beautiful Soup

Beautiful Soup is one of the most common Python packages for web scraping. It parses HTML and XML documents and lets you pull data out of them, so you can strip away the markup and save just the information you need.
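To make "removing the HTML markup" concrete, here is a minimal sketch. It assumes the package is already installed, and the `snippet` string is a made-up example rather than a real page.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML fragment (an assumption for illustration only).
snippet = "<div><h1>Hello</h1><p>Some <b>bold</b> text.</p></div>"

# Parse the fragment with Python's built-in HTML parser.
soup = BeautifulSoup(snippet, "html.parser")

# get_text() drops the tags and keeps only the text nodes;
# strip=True trims each piece and separator=" " joins them with spaces.
text = soup.get_text(separator=" ", strip=True)
print(text)  # Hello Some bold text.
```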

2. How to use it

pip install beautifulsoup4

Assume we have requested a webpage and the response looks like this:

html_doc = """<html><head><title>The Dormouse's story</title></head>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="" class="sister" id="link1">Elsie</a>,
<a href="" class="sister" id="link2">Lacie</a> and
<a href="" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

With Beautiful Soup, you first parse the document into a BeautifulSoup object.

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
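The example above starts from an HTML string that is already in hand. In practice you would download the page first. Below is a rough sketch of one way to do that with the standard library's `urllib`. The helper names `fetch_html` and `make_soup`, the timeout, and the User-Agent value are my own assumptions, not part of Beautiful Soup.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its body as text (hypothetical helper)."""
    # Some sites reject requests without a browser-like User-Agent header;
    # the value used here is only an illustrative assumption.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def make_soup(html):
    """Parse an HTML string with Python's built-in parser (hypothetical helper)."""
    return BeautifulSoup(html, "html.parser")
```

Calling `make_soup(fetch_html(some_url))` would then give you a soup object for a live page. Keep in mind that many sites, YouTube included, build much of their content with JavaScript, so the downloaded HTML may not contain everything you see in the browser.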


You can then use different attributes and methods to get the content wrapped in specific tags.

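# The first <title> tag in the document
print(soup.title)
# <title>The Dormouse's story</title>
# The first <p> tag in the document
print(soup.p)
# <p class="title"><b>The Dormouse's story</b></p>
# Every <a> tag, returned as a list
print(soup.find_all('a'))
# [<a class="sister" href="" id="link1">Elsie</a>,
# <a class="sister" href="" id="link2">Lacie</a>,
# <a class="sister" href="" id="link3">Tillie</a>]

# The single tag whose id is "link3"
print(soup.find(id="link3"))
# <a class="sister" href="" id="link3">Tillie</a>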
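Whole tags are rarely what you want to save. Usually you need the text and attributes inside them. As a small sketch of the goal from the introduction (getting data into something Excel can open), the snippet below pulls the id, name, and href out of each link and writes them as CSV. The column names and the shortened `html_doc` are my own choices for illustration.

```python
import csv
import io

from bs4 import BeautifulSoup

# A shortened copy of the sample document, so this block runs on its own.
html_doc = """<p class="story">
<a href="" class="sister" id="link1">Elsie</a>,
<a href="" class="sister" id="link2">Lacie</a> and
<a href="" class="sister" id="link3">Tillie</a>;
</p>"""

soup = BeautifulSoup(html_doc, "html.parser")

# Collect one dictionary per link: attributes via .get(), text via .get_text().
rows = []
for link in soup.find_all("a", class_="sister"):
    rows.append({
        "id": link.get("id"),
        "name": link.get_text(strip=True),
        "href": link.get("href"),
    })

# Write the rows as CSV into an in-memory buffer; to create a real file,
# open one with open("links.csv", "w", newline="") and pass it instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "href"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

A CSV file opens directly in Excel, which is why I use it here instead of a real .xlsx file.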