Web Scraping Using Beautiful Soup


If you work in the data industry delivering powerful solutions to your clients, you may suddenly face an urgent need to scrape data from a website, even with no prior experience in the related technology. Well, here is a quick example to save the day.

Requirement: Scrape the list of brands from a website.

[Screenshots: brand listing pages on leafly.com]

Observations:

Website: leafly.com
Brands displayed: 1937 Farms, 1937, etc.
Total pages of brands: 470

Quick Solution: 

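The original post shows the code as a screenshot; below is a minimal sketch of what it does. The URL pattern, the h3 class value (brand-card__name), and the output file name are illustrative assumptions, so copy the real values from your own inspection of the site.

import csv

import requests
from bs4 import BeautifulSoup

BrandList = []   # empty list to collect the brand names
pagenum = 470    # total number of brand pages to crawl

for page in range(1, pagenum + 1):
    # build the URL for the current page (query-parameter name is an assumption)
    url = "https://www.leafly.com/brands?page=" + str(page)

    # request the content of that page
    req = requests.get(url)

    # parse the HTML with the built-in html.parser
    content = BeautifulSoup(req.content, "html.parser")

    # find every h3 element carrying the brand-name class
    # (class value is a placeholder; take the real one from your browser's inspector)
    brandname = content.find_all("h3", class_="brand-card__name")

    # append each extracted brand to the list
    for brand in brandname:
        BrandList.append(brand.get_text(strip=True))

# finally, save the output to a CSV file
with open("brands.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Brand"])
    for name in BrandList:
        writer.writerow([name])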

You can implement the above code to get your desired list of brands in a CSV file.

Code Brief: 

  • Request the website (import requests)
  • Use the Beautiful Soup library to scrape the data (from bs4 import BeautifulSoup)
  • Create an empty list where you want your brands to go (BrandList = [])
  • Declare a variable that holds the number of pages you want to crawl (pagenum)
  • For crawling, we use a for loop over the range of page numbers
  • On each iteration, we build the page address by appending the next page number and store it in the variable url
  • We then request the content of that page (requests.get())
  • We parse the fetched content with the Beautiful Soup library, which takes the response as an argument; here html.parser serves as the parser for text formatted in HTML
  • Our final data is then stored in the variable brandname using the content.find_all() function. Here 'h3' is an HTML element with a certain class value. You can find this by inspecting the website and pointing to the exact element you want in your output. (Note: there are multiple ways of calling CSS selectors, as shown in the snippet below.)

Every time you extract a value, it is appended to the empty list.
At last, you can save your output file.
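As noted above, the same element can usually be selected in more than one way; the class value below is again a placeholder for whatever you find in the inspector.

# keyword argument for the class
brandname = content.find_all("h3", class_="brand-card__name")

# attribute dictionary
brandname = content.find_all("h3", attrs={"class": "brand-card__name"})

# CSS selector string
brandname = content.select("h3.brand-card__name")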

Note: Beautiful Soup works best with static websites and comparatively small amounts of data, so your results may vary. However, it is a great option for beginner scraping requirements. Web scraping is a broad field; you can also try other great Python libraries such as Selenium and Scrapy and dive deeper as per your interest.

Happy Learning! 

Raksha Gangwal 
Data Analyst 
Addend Analytics 

Addend Analytics is a Microsoft Gold Partner based in Mumbai, India, with a branch office in the U.S.

Addend has successfully implemented 100+ Microsoft Power BI and Business Central projects for 100+ clients across sectors like Financial Services, Banking, Insurance, Retail, Sales, Manufacturing, Real Estate, Logistics, and Healthcare, in the US, Europe, Switzerland, and Australia.

Get a free consultation now by emailing us or contacting us.