K wrote:
Hello everyone,
I understand that urllib and urllib2 serve as really simple page
request libraries. I was wondering if there is a library out there
that can list the HTTP requests that loading a given page triggers.
Example:
URL: http://www.google.com/test.html
Something like: urllib.urlopen('http://www.google.com/test.html').files()
Lists HTTP Requests attached to that URL:
=> http://www.google.com/test.html
=> http://www.google.com/css/google.css
=> http://www.google.com/js/js.css
There are no "Requests attached" to an url. There is a HTML-document
behind it, that might contain further external references.
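In other words, fetching the URL just gives you the raw markup; it's
the browser that parses it and then issues the follow-up requests. A
two-line illustration (assuming Python 2's urllib, as in your example):

import urllib
html = urllib.urlopen('http://www.google.com/test.html').read()
# html is nothing but text at this point -- the extra requests for
# CSS, JS and images only happen once something parses the markup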
The other fun part is the inclusion of JS within <script> tags, e.g.
the new Google Analytics script
=> http://www.google-analytics.com/ga.js
or CSS @imports
=> http://www.google.com/css/import.css
I would like to keep track of those too, but I realize that Python
does not have a JS engine. :( Anyone with ideas on how to track these
items, or am I out of luck?
You can use e.g. BeautifulSoup to extract all the external references
from the page. What you can't do, though, is get the requests that are
issued by Javascript that is actually *running*.
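Something along these lines would do it -- an untested sketch for
Python 2 with BeautifulSoup installed; the tag/attribute selection and
the @import regex below only cover the common cases, not a full CSS
parse:

import re
import urllib2
import urlparse
from BeautifulSoup import BeautifulSoup

def external_references(url):
    html = urllib2.urlopen(url).read()
    soup = BeautifulSoup(html)
    refs = []
    # <script src=...> and <img src=...>
    for tag in soup.findAll(['script', 'img'], src=True):
        refs.append(urlparse.urljoin(url, tag['src']))
    # <link href=...> covers stylesheets, favicons etc.
    for tag in soup.findAll('link', href=True):
        refs.append(urlparse.urljoin(url, tag['href']))
    # crude follow-up for @imports: fetch each stylesheet and grep it
    import_re = re.compile(r'@import\s+(?:url\()?["\']?([^"\')\s;]+)')
    for ref in list(refs):
        if ref.endswith('.css'):
            css = urllib2.urlopen(ref).read()
            for imported in import_re.findall(css):
                refs.append(urlparse.urljoin(ref, imported))
    return refs

for ref in external_references('http://www.google.com/test.html'):
    print ref

That still leaves whatever a running script requests (like the
dynamically written ga.js include); for those you'd need an actual
browser engine driving the page.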
Diez