On 09/03/2018 07:29 AM, Amrit Pandey wrote:
> I am learning web scraping with python.
>
> With urllib.request.urlopen() I am able to fetch http urls, but https give
> some certificate error. How can we bypass the certificate check or is there
> any other configuration that is used for https urls?
If you really want to ignore certificate verification (and usually such
problems mean a problem on your end, not on the other end, so you should
probably not do that - the exception being if you're scraping your own test
server, which might not have official certs), you create an ssl context with
the appropriate settings and pass that to urllib. /Conceptually/ something
like this, but there may be quite a few more details needed:

    import ssl
    import urllib.request

    # A context that skips both hostname checking and certificate
    # verification; only do this against a server you control.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    with urllib.request.urlopen(url, context=ctx) as f:
        data = f.read()

But in general your life will be much more pleasant if you use the requests
module instead of urllib (a rough sketch is below). There are also extensive
and well-debugged web scraping packages in Python; if you're actually looking
to deploy something, as opposed to learning something (and there's definitely
nothing wrong with a learning exercise!!!), you should look at Scrapy and
others.
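For comparison, and not something the original reply spelled out, here is
roughly what the same fetch looks like with requests (assumes the package is
installed via "pip install requests"; the example.com URL is only a
placeholder):

    import requests

    # verify=True (the default) checks the server certificate much like a
    # browser would; raise_for_status() turns HTTP error codes into
    # exceptions instead of silently returning an error page.
    resp = requests.get("https://example.com/", timeout=10)
    resp.raise_for_status()
    data = resp.text

    # Only for your own test server without a proper certificate;
    # requests will emit an InsecureRequestWarning to remind you:
    # resp = requests.get("https://localhost:8443/", verify=False)

Note that verify=False in requests is the moral equivalent of the CERT_NONE
context above, so the same caution applies.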
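To make the Scrapy pointer a little more concrete (this sketch is not from
the original reply; the spider name, class name, and URL are placeholders,
and it assumes "pip install scrapy"):

    import scrapy

    class LinkSpider(scrapy.Spider):
        name = "links"
        start_urls = ["https://example.com/"]

        def parse(self, response):
            # Yield every link found on the page; Scrapy handles the HTTP
            # layer, retries, and throttling for you.
            for href in response.css("a::attr(href)").getall():
                yield {"link": response.urljoin(href)}

If that were saved as link_spider.py, you could run it with something like
"scrapy runspider link_spider.py -o links.json" to get the results as JSON.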