Re: Is it possible to download only the <head> of a web page?

Gabriel Genellina Thu, 04 Sep 2008 21:23:43 -0700

On Thu, 04 Sep 2008 18:53:33 -0300, Fredrik Lundh <[EMAIL PROTECTED]> wrote:

> Rex wrote:
>> I am writing a script that executes a bunch of queries through a form
>> on a website and reads the results. I am only interested in the
>> <title> section in the <head> of each web page. Currently, each page
>> the server returns is about 100kb and contains a bunch of HTML and
>> Javascript, all of which I don't need; I don't want to waste bandwidth
>> or consume too much of the server's resources. I just need the <title>
>> string.
> you need to issue a GET request to get the HTML head section, which
> almost always means that the server will build the entire page before
> sending it to you (so it can set content-length etc).
>
> you can save on network traffic by parsing the data as it arrives, and
> stopping when you've gotten the TITLE element:
>
>      http://effbot.org/librarybook/sgmllib.htm
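A minimal sketch of the "parse as it arrives, stop at TITLE" idea in modern Python. Assumptions: this targets Python 3, where `sgmllib` (the module in the linked page) no longer exists, so `html.parser.HTMLParser` is swapped in; the page is assumed to be UTF-8; and `_TitleParser`, `title_from_chunks` and `fetch_title` are hypothetical names made up for this illustration. The server still builds the whole page; only the transfer is cut short.

```python
import codecs
import urllib.request
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []
        self.done = False  # becomes True once </title> has been seen

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case, so <TITLE> matches too.
        if tag == "title" and not self.done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.done = True

    def handle_data(self, data):
        # Title text may arrive in several pieces when the input is chunked.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def title_from_chunks(chunks):
    """Parse byte chunks as they arrive; stop as soon as </title> is seen.

    Returns (title or None, number of bytes consumed).
    """
    # Assumption: UTF-8 page. The incremental decoder copes with a multi-byte
    # character that is split across two chunks.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parser = _TitleParser()
    consumed = 0
    for chunk in chunks:
        consumed += len(chunk)
        parser.feed(decoder.decode(chunk))
        if parser.done:
            return parser.title, consumed
    return None, consumed


def fetch_title(url, chunk_size=1024):
    """Read `url` chunk by chunk and close the connection once the title is known."""
    with urllib.request.urlopen(url) as resp:
        # iter(callable, sentinel) calls resp.read(chunk_size) until it returns b"".
        title, _ = title_from_chunks(iter(lambda: resp.read(chunk_size), b""))
    return title
```

Leaving the `with` block closes the connection, so the rest of the body is never read by the script; data the server has already pushed into the network may still arrive, which is why this saves traffic rather than server work.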

Another alternative would be to estimate how many bytes it takes to reach
the <title> tag, and issue a GET with a Range header. The server will very
likely still have to build the entire page, but, if it honours Range
requests, it won't send more bytes than requested. (If the requested size
is not enough, one can issue another GET asking for more data.)

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35
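A minimal sketch of the Range approach, again in Python 3. Assumptions: `title_from_prefix` and `fetch_title_with_range` are hypothetical names; the initial guess of 2048 bytes and the 64 KiB cap are arbitrary; the page is assumed to be UTF-8; and the regular expression is a deliberate simplification, not a real HTML parser. When the first range is too short, it asks for the following bytes and appends them, as described above.

```python
import re
import urllib.error
import urllib.request

# Naive byte-level pattern for the first <title>...</title> pair; good enough
# for a sketch, but not a substitute for a real HTML parser.
TITLE_RE = re.compile(rb"<title[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def title_from_prefix(data):
    """Return the title if the whole element is inside `data`, else None."""
    m = TITLE_RE.search(data)
    if m is None:
        return None
    # Assumption: UTF-8 page; a real script should use the declared charset.
    return m.group(1).decode("utf-8", errors="replace").strip()


def fetch_title_with_range(url, guess=2048, max_bytes=65536):
    """Fetch successive byte ranges of `url` until the <title> is found."""
    data = b""
    start, end = 0, guess - 1
    while start < max_bytes:
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        try:
            with urllib.request.urlopen(req) as resp:
                if resp.status != 206:
                    # Server ignored Range and is sending the whole page:
                    # read at most max_bytes of it and stop there.
                    return title_from_prefix(resp.read(max_bytes))
                chunk = resp.read()
        except urllib.error.HTTPError as err:
            if err.code == 416:  # requested bytes lie past the end of the page
                return title_from_prefix(data)
            raise
        data += chunk
        title = title_from_prefix(data)
        if title is not None or len(chunk) < end - start + 1:
            return title  # found it, or the page ended inside our range
        # Guess was too small: ask for the next block, doubling the total so far.
        start, end = end + 1, 2 * end + 1
    return None
```

Because a server is free to ignore the Range header, the sketch only trusts the byte count on a 206 Partial Content reply; on a plain 200 it falls back to reading a bounded prefix of the full response.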

--
Gabriel Genellina
