Hey, I am currently working on a simple program that scrapes text from web pages given a URL, then segments it into sentences (with spaCy).
I'm trying to refine my program to use just the right tool for each step. requests.get() works great, but I've seen examples that use urllib.request.urlopen() instead. That appealed to me because it is lower level than requests and in the standard library, so the program feels leaner and more direct. However, while requests.get() works fine on this URL:

https://juno.sh/direct-connection-to-jupyter-server/

urllib returns a "403 Forbidden". Could anyone comment on what the fundamental differences between urllib and requests are, why this happens, and whether urllib has an option to prevent this and get the page source?

Thanks,
Julius
-- 
https://mail.python.org/mailman/listinfo/python-list
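In case it helps, here is a minimal sketch of the workaround I've seen suggested. My assumption (unconfirmed) is that the server rejects urllib's default "Python-urllib/3.x" User-Agent string, so the sketch builds a Request with a browser-like User-Agent; the User-Agent value itself is just an illustrative placeholder:

```python
import urllib.request

# Assumption: the 403 comes from the server filtering on the default
# "Python-urllib/3.x" User-Agent, so we supply a browser-like one.
url = "https://juno.sh/direct-connection-to-jupyter-server/"
req = urllib.request.Request(
    url,
    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"},
)

# urllib normalizes stored header names ("User-Agent" -> "User-agent"),
# so inspect it with the normalized key:
print(req.get_header("User-agent"))

# Fetching the page would then be:
# with urllib.request.urlopen(req) as resp:
#     html = resp.read().decode("utf-8", errors="replace")
```

The fetch itself is left commented out so the snippet doesn't depend on network access; passing the Request object (rather than the bare URL string) to urlopen() is what makes the custom header take effect.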