If you are part of the data industry, delivering solutions to clients, and an urgent need suddenly comes up to scrape data from a website while you have no prior experience with the relevant technology, here is a quick example to save you.
Requirement: Scrape the list of brands from a website.
Website name: leafly.com
Brands displayed: 1937 Farms, 1937, etc.
Total pages for brands: 470
You can implement the steps below and get your desired list of brands in a CSV file.
- Request your website (import requests)
- Use the Beautiful Soup library to scrape the data (from bs4 import BeautifulSoup)
- Create an empty list where you want your brands to go (BrandList = [])
- Declare a variable that gives the number of pages you want to crawl for scraping (pagenum)
- For crawling, we use a for loop over the range of page numbers
- Each iteration builds the address to crawl, stored in a variable, by appending the next page number to the base URL (url)
- We then request the content of that page (requests.get())
- We parse the actual content with the Beautiful Soup library, which takes the response (req) as an argument; here html.parser serves as the parser for text formatted as HTML
- Our final data is then extracted with the content.find_all() function. Here 'h3' is an HTML element with a certain class value. You can find this by inspecting the website and pointing at the exact element you want in your output. (Note: there can be multiple combinations of CSS selectors that reach the same element)
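The steps above can be sketched as follows. Note that the page URL pattern and the class value `brand-name` are hypothetical placeholders; inspect the live site to find the exact selector and address format you need.

```python
import requests
from bs4 import BeautifulSoup

def extract_brands(html):
    """Parse one page of HTML and return the brand names found on it."""
    content = BeautifulSoup(html, "html.parser")
    # 'h3' elements with a certain class hold the brand names; the class
    # value below is a placeholder -- inspect the page for the real one
    return [tag.get_text(strip=True)
            for tag in content.find_all("h3", class_="brand-name")]

def crawl_brands(pagenum):
    """Crawl `pagenum` listing pages and collect every brand name."""
    BrandList = []  # empty list where the brands will go
    for page in range(1, pagenum + 1):
        # build the next page URL on every iteration
        # (the query-string format here is an assumption)
        url = "https://www.leafly.com/brands?page=" + str(page)
        req = requests.get(url)
        BrandList.extend(extract_brands(req.text))
    return BrandList
```

Splitting the parsing out into `extract_brands()` keeps the network code and the HTML handling separate, which makes the selector easy to adjust once you have inspected the real page.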
Every time you extract a value, append it to your empty list.
At last, you can save your output file.
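Saving the collected list can be done with Python's built-in csv module; this is one way to write it, assuming one brand per row with a header line.

```python
import csv

def save_brands(brands, path):
    """Write one brand name per row to a CSV file at `path`."""
    # newline="" prevents blank rows on Windows when using csv.writer
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Brand"])  # header row
        for brand in brands:
            writer.writerow([brand])
```

For example, `save_brands(BrandList, "brands.csv")` would produce a file you can open directly in a spreadsheet.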
Note: Beautiful Soup works best with static websites and comparatively small amounts of data, so your results may vary. However, it is a great option for beginner scraping requirements. Web scraping is a very wide field; you can try other great Python libraries such as Selenium and Scrapy and dive deeper as your interest grows.