On 09/03/2018 07:29 AM, Amrit Pandey wrote:
> I am learning web scraping with python.
>
> With urllib.request.urlopen() I am able to fetch http urls, but https give
> some certificate error. How can we bypass the certificate check or is there
> any other configuration that is used for https urls?
If you really want to ignore certificate verification (and usually such
problems mean a problem on your end, not on the other end, so you should
probably not do that - the exception being if you're scraping your own test
server, which might not have official certs), you create an ssl context with
the appropriate settings and pass that to urllib. /Conceptually/ something
like this, but there may be quite a few more details needed:

    import ssl
    import urllib.request

    # A context that skips both hostname checking and certificate
    # verification; only do this against a server you control.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    with urllib.request.urlopen(url, context=ctx) as f:
        data = f.read()

But in general your life will be much more pleasant if you use the requests
module instead of urllib (a rough sketch is below). There are also extensive
and well-debugged web scraping packages in Python; if you're actually looking
to deploy something, as opposed to learning something (and there's definitely
nothing wrong with a learning exercise!!!), you should look at Scrapy and
others.
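For comparison, and not something the original reply spelled out, here is
roughly what the same fetch looks like with requests (assumes the package is
installed via "pip install requests"; the example.com URL is only a
placeholder):

    import requests

    # verify=True (the default) checks the server certificate much like a
    # browser would; raise_for_status() turns HTTP error codes into
    # exceptions instead of silently returning an error page.
    resp = requests.get("https://example.com/", timeout=10)
    resp.raise_for_status()
    data = resp.text

    # Only for your own test server without a proper certificate;
    # requests will emit an InsecureRequestWarning to remind you:
    # resp = requests.get("https://localhost:8443/", verify=False)

Note that verify=False in requests is the moral equivalent of the CERT_NONE
context above, so the same caution applies.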
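To make the Scrapy pointer a little more concrete (this sketch is not from
the original reply; the spider name, class name, and URL are placeholders,
and it assumes "pip install scrapy"):

    import scrapy

    class LinkSpider(scrapy.Spider):
        name = "links"
        start_urls = ["https://example.com/"]

        def parse(self, response):
            # Yield every link found on the page; Scrapy handles the HTTP
            # layer, retries, and throttling for you.
            for href in response.css("a::attr(href)").getall():
                yield {"link": response.urljoin(href)}

If that were saved as link_spider.py, you could run it with something like
"scrapy runspider link_spider.py -o links.json" to get the results as JSON.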