Web Scraping Using Beautiful Soup


If you work in the data industry delivering powerful solutions to your clients, you may suddenly face an urgent need to scrape data from a website, even with no prior experience in the related technology. Well, here is a quick example to save the day.

Requirement: Scrape the list of brands from a website.

[Screenshots: brand listing pages on leafly.com]

Observations:

Website: leafly.com
Brands displayed: 1937 Farms, 1937, etc.
Total pages of brands: 470

Quick Solution: 

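The original post shows the code as a screenshot; below is a minimal sketch of what it does. The URL pattern, the h3 class value (brand-card__name), and the output file name are illustrative assumptions, so copy the real values from your own inspection of the site.

import csv

import requests
from bs4 import BeautifulSoup

BrandList = []   # empty list to collect the brand names
pagenum = 470    # total number of brand pages to crawl

for page in range(1, pagenum + 1):
    # build the URL for the current page (query-parameter name is an assumption)
    url = "https://www.leafly.com/brands?page=" + str(page)

    # request the content of that page
    req = requests.get(url)

    # parse the HTML with the built-in html.parser
    content = BeautifulSoup(req.content, "html.parser")

    # find every h3 element carrying the brand-name class
    # (class value is a placeholder; take the real one from your browser's inspector)
    brandname = content.find_all("h3", class_="brand-card__name")

    # append each extracted brand to the list
    for brand in brandname:
        BrandList.append(brand.get_text(strip=True))

# finally, save the output to a CSV file
with open("brands.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Brand"])
    for name in BrandList:
        writer.writerow([name])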

You can implement the above code to get your desired list of brands in a CSV file.

Code Brief: 

  • Request the website (import requests)
  • Use the Beautiful Soup library to scrape the data (from bs4 import BeautifulSoup)
  • Create an empty list where you want your brands to go (BrandList = [])
  • Declare a variable that holds the number of pages you want to crawl (pagenum)
  • For crawling, we use a for loop over the range of page numbers
  • On each iteration, we build the page address by appending the next page number and store it in the variable url
  • We then request the content of that page (requests.get())
  • We parse the fetched content with the Beautiful Soup library, which takes the response as an argument; here html.parser serves as the parser for text formatted in HTML
  • Our final data is then stored in the variable brandname using the content.find_all() function. Here 'h3' is an HTML element with a certain class value. You can find this by inspecting the website and pointing to the exact element you want in your output. (Note: there are multiple ways of calling CSS selectors, as shown in the snippet below.)

Every time you extract a value, it is appended to the empty list.
At last, you can save your output file.
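As noted above, the same element can usually be selected in more than one way; the class value below is again a placeholder for whatever you find in the inspector.

# keyword argument for the class
brandname = content.find_all("h3", class_="brand-card__name")

# attribute dictionary
brandname = content.find_all("h3", attrs={"class": "brand-card__name"})

# CSS selector string
brandname = content.select("h3.brand-card__name")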

Note: Beautiful Soup works best with static websites and comparatively small amounts of data, so your results may vary. However, it is a great option for beginner scraping requirements. Web scraping is a broad field; you can also try other great Python libraries such as Selenium and Scrapy and dive deeper as per your interest.

Happy Learning! 

Raksha Gangwal 
Data Analyst 
Addend Analytics 

Addend Analytics is a Microsoft Gold Partner based in Mumbai, India, with a branch office in the U.S.

Addend has successfully implemented 100+ Microsoft Power BI and Business Central projects for 100+ clients across sectors like Financial Services, Banking, Insurance, Retail, Sales, Manufacturing, Real Estate, Logistics, and Healthcare, in the US, Europe, Switzerland, and Australia.

Get a free consultation now by emailing us or contacting us.