Michael Torrie wrote:

> On 12/6/19 5:31 PM, DL Neil via Python-list wrote:
>> If you read the HTML data that the REPL has happily splattered all over
>> your terminal's screen (scroll back) (NB "soup" is easier to read than
>> is "content"!) you will observe that what you saw in your web-browser is
>> not what Amazon served in response to the Python "requests.get()"!
>
> Sadly it's likely that Amazon's page is largely built from javascript.
That's not the problem here. Quoting the HTML returned by

requests.get("https://www.amazon.ca/dp/B07RZFQ6HC")

"""
To discuss automated access to Amazon data please contact
api-services-supp...@amazon.com.
"""

If you retrieve the page manually:

$ wget "https://www.amazon.ca/dp/B07RZFQ6HC" -O tmp.gz
[...]
2019-12-07 11:47:03 (80,6 KB/s) - »tmp.gz« gespeichert [115426]
$ gunzip tmp.gz
$ python3
[...]
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(open("tmp").read())
>>> soup.find("span", dict(id="priceblock_dealprice")
... )
<span class="a-size-medium a-color-price priceBlockDealPriceString" id="priceblock_dealprice">CDN$ 1,019.00</span>
>>> _.text
'CDN$\xa01,019.00'

> So scraping static html is probably not going to get you where you want
> to go.

... because Amazon doesn't like what you do. You can cheat, or play by
their rules and use the API.

> There are heavier tools, such as Selenium that uses a real
> browser to grab a page, and the result of that you can parse and search
> perhaps.

-- 
https://mail.python.org/mailman/listinfo/python-list
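For anyone who wants to repeat the check above without installing bs4, here is a minimal stdlib-only sketch. It is not what was run in the session above, and it rests on assumptions: that the phrase "To discuss automated access to Amazon data" marks the block page, and that the price still lives in a span with id "priceblock_dealprice" as it did in December 2019 (Amazon's markup may well have changed since). The names SpanTextById and price_from_html are hypothetical helpers. It also replaces the non-breaking space ('\xa0') seen in the REPL output with a plain space.

```python
from html.parser import HTMLParser

# Assumption: text that appears on the "automated access" page instead of the product page.
BLOCK_MARKER = "To discuss automated access to Amazon data"


class SpanTextById(HTMLParser):
    """Collect the text inside the first <span> whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()  # convert_charrefs=True: "&nbsp;" arrives as "\xa0"
        self.target_id = target_id
        self.depth = 0       # > 0 while inside the target span (counts nested spans)
        self.found = False   # True once the target span has been opened
        self.done = False    # True once the target span has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "span" or self.done:
            return
        if self.depth:
            self.depth += 1  # a span nested inside the target span
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1
            self.found = True

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def price_from_html(html, span_id="priceblock_dealprice"):
    """Return the span's text with non-breaking spaces normalized, or None if absent."""
    if BLOCK_MARKER in html:
        raise RuntimeError("got the automated-access notice, not the product page")
    parser = SpanTextById(span_id)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return "".join(parser.parts).replace("\xa0", " ").strip()
```

Fed the decompressed file from the wget session (for example price_from_html(open("tmp").read())), it should give the same text as the bs4 lookup, with a plain space after "CDN$". Fed what requests.get() returned, it raises instead, which makes the real problem visible rather than silently returning None.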