Date: 03 Oct 2022
Module: Scrapy

Installation: pip install Scrapy

About:
Scrapy is a fast, high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Sample:

    import scrapy


    class ToScrapeCSSSpider(scrapy.Spider):
        name = "toscrape-css"
        start_urls = [
            'http://quotes.toscrape.com/',
        ]

        def parse(self, response):
            for quote in response.css("div.quote"):
                yield {
                    'text': quote.css("span.text::text").extract_first(),
                    'author': quote.css("small.author::text").extract_first(),
                    'tags': quote.css("div.tags > a.tag::text").extract()
                }

            next_page_url = response.css("li.next > a::attr(href)").extract_first()
            if next_page_url is not None:
                yield scrapy.Request(response.urljoin(next_page_url))

Execution: scrapy runspider scrape_sample.py -o quotes.json

Reference: https://pypi.org/project/Scrapy/
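
Once the spider has run, the quotes.json file written by the -o option can be inspected with standard-library Python. The following is only a minimal sketch: it assumes the file holds a single JSON array of items with the 'text', 'author' and 'tags' fields yielded by the sample spider, and the helper names load_quotes, count_by_author and count_tags are made up for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def load_quotes(path):
    """Read the JSON array assumed to be written by `-o quotes.json`."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def count_by_author(items):
    """Count how many scraped quotes belong to each author."""
    return Counter(item["author"] for item in items)


def count_tags(items):
    """Count how often each tag appears across all scraped quotes."""
    return Counter(tag for item in items for tag in item.get("tags", []))


if __name__ == "__main__":
    path = Path("quotes.json")  # file name taken from the Execution command
    if path.exists():
        quotes = load_quotes(path)
        print(f"{len(quotes)} quotes scraped")
        for author, n in count_by_author(quotes).most_common(5):
            print(f"{n:3d}  {author}")
    else:
        print("quotes.json not found; run the spider first")
```

Note that -o appends to an existing output file, so running the spider twice can leave quotes.json as invalid JSON; deleting the file before a re-run avoids this.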
_______________________________________________
Chennaipy mailing list
Chennaipy@python.org
https://mail.python.org/mailman/listinfo/chennaipy