Re: Error getting data from website

DL Neil via Python-list Fri, 06 Dec 2019 16:34:50 -0800

On 7/12/19 12:53 PM, Sam Paython wrote:

This is the code I am writing:
import requests
from bs4 import BeautifulSoup
request = requests.get("https://www.amazon.ca/dp/B07RZFQ6HC")
content = request.content
soup = BeautifulSoup(content, "html.parser")
element = soup.find("span",{"id":"priceblock_dealprice"})
print(element.text.strip())


and this is the error I am getting:
C:\Users\Sam\PycharmProjects\untitled2\venv\Scripts\python.exe 
C:/Users/Sam/PycharmProjects/untitled2/src/app.py
Traceback (most recent call last):
   File "C:/Users/Sam/PycharmProjects/untitled2/src/app.py", line 9, in <module>
     print(element.text.strip())
AttributeError: 'NoneType' object has no attribute 'text'

Could someone please help?

The err.msg/stack-trace is your friend! The comment about "NoneType" means (roughly!) 'there's nothing there' to print(): "element" is None, and None has no ".text".
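To see that message in isolation, here is a minimal sketch that needs no network access. It assumes only what the traceback shows, namely that "element" ended up as None. The name "message" is made up for the illustration.

```python
# Stand-in for what soup.find() hands back when nothing matches.
element = None

try:
    element.text              # same attribute access as in app.py
except AttributeError as err:
    message = str(err)

print(message)
```

The printed text should match the last line of the traceback, which suggests the fault lies in what find() returned rather than in print() itself.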


The question then becomes: "why?" or "why not?"...

With a short piece of code like this, and (I am assuming) trying-out a library for the first time, may I recommend that you use the Python REPL, because it allows you to 'see' what's going-on behind the scenes/underneath the hood - and ultimately, reveals the problem.


From a Python terminal (cmd is appropriate to your PC's OpSys):

[dn@JrBrown ~]$ python3
Python 3.7.4 (default, Jul  9 2019, 16:48:28)
[GCC 8.3.1 20190223 (Red Hat 8.3.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> from bs4 import BeautifulSoup
>>> request = requests.get("https://www.amazon.ca/dp/B07RZFQ6HC")
>>> request            # notice how I'm asking to 'see' what happened
<Response [503]>
>>> content = request.content
>>> content            # there is no need to enclose in print()!

b'<!DOCTYPE html>\n<!--[if lt IE 7]> <html lang="en-us" class="a-no-js...many lines of HTML, excised in the interests of brevity...

\')[0].appendChild(elem);\n    }\n    </script>\n</body></html>\n'
>>> soup = BeautifulSoup(content, "html.parser")
>>> soup
<!DOCTYPE html>
...many more lines of HTML...
</body></html>

>>> element = soup.find("span",{"id":"priceblock_dealprice"})
>>> element
>>>

The last entry is asking for the contents of "element" to be displayed - and they are, excepting that element contains nothing/None (for which the REPL prints nothing at all). Oops!
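One way to keep a script from falling over on that None is to test for it before touching ".text". The following is only a sketch: "stripped_text" is a made-up helper, and "fake_tag" is a stand-in object so the example runs without bs4 or a network connection.

```python
from types import SimpleNamespace


def stripped_text(element, default=None):
    """Return element.text stripped of whitespace, or default if element is None."""
    if element is None:
        return default
    return element.text.strip()


# Hypothetical stand-in for a tag that find() did locate.
fake_tag = SimpleNamespace(text="  some text \n")

print(stripped_text(None, default="no such span"))
print(stripped_text(fake_tag))
```

In the original script the call would become something like stripped_text(soup.find("span", {"id": "priceblock_dealprice"})), which hands back the default instead of raising AttributeError.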

Working 'backwards' (and using 'simple' Python functions to prove that it is not our use of requests/BS4 that is at-fault):


>>> soup.find( "price" )             # not found

>>> content.find( b"price" )         # the b"" is necessary because
                                        # we are dealing with bytes
                                        # not a Unicode string
-1
>>>                                    #

Sadly, the -1 indicates that "price" was not found. Which is bound to be disappointing to you.
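The bytes-versus-string point can be tried without fetching anything. In this sketch "content" is a made-up bytes value standing in for request.content, and the other names are invented for the illustration.

```python
# Made-up bytes standing in for request.content.
content = b"<html><body>placeholder page</body></html>"

missing = content.find(b"price")      # -1: substring not present
present = content.find(b"body")       # index of the first match

try:
    content.find("price")             # str needle against a bytes haystack
except TypeError as err:
    mixed = type(err).__name__

text = content.decode("utf-8")        # now a str, so a str needle is fine
print(missing, present, mixed, text.find("price"))
```

Either keep both sides as bytes (the b"" prefix) or decode first and search with an ordinary string. Mixing the two raises TypeError rather than quietly returning -1.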



Yet all is not lost!

If you read the HTML data that the REPL has happily splattered all over your terminal's screen (scroll back) (NB "soup" is easier to read than is "content"!) you will observe that what you saw in your web-browser is not what Amazon served in response to the Python "requests.get()"!
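The "<Response [503]>" in the session above was the first hint of this, and a script can check for it before parsing anything. A rough sketch follows. "body_or_none" is a made-up helper, and the two SimpleNamespace objects are stand-ins shaped like the object requests.get() returns (only "status_code" and "content" are used), so nothing here talks to Amazon.

```python
from types import SimpleNamespace


def body_or_none(response):
    """Return response.content for a 200 reply, otherwise None."""
    if response.status_code != 200:
        return None
    return response.content


# Hypothetical stand-ins for real responses.
refused = SimpleNamespace(status_code=503, content=b"<html>...</html>")
served = SimpleNamespace(status_code=200, content=b"<html>ok</html>")

print(body_or_none(refused))
print(body_or_none(served))
```

With a real response, requests also offers request.raise_for_status(), which raises an exception for 4xx/5xx replies instead of handing the error page on to BeautifulSoup.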

--
Regards =dn
--
https://mail.python.org/mailman/listinfo/python-list
