Rex wrote:

I am writing a script that executes a bunch of queries through a form
on a website and reads the results. I am only interested in the
<title> section in the <head> of each web page. Currently, each page
the server returns is about 100kb and contains a bunch of HTML and
Javascript, all of which I don't need; I don't want to waste bandwidth
or consume too much of the server's resources. I just need the <title>
string.

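the <title> is part of the response body, so you need to issue a GET request to get it (an HTTP HEAD request only returns the HTTP headers, not the HTML head section). that almost always means that the server will build the entire page before sending it to you (so it can set Content-Length etc).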

you can save on network traffic by parsing the data as it arrives, and stopping when you've gotten the TITLE element:

    http://effbot.org/librarybook/sgmllib.htm
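
a minimal sketch of that approach, assuming Python 3, where sgmllib is no longer available and html.parser is swapped in for it. the names TitleParser, title_from_chunks and fetch_title are made up for this example, and the 1024-byte chunk size and the utf-8 default encoding are assumptions you may need to adjust for the site in question:

```python
import codecs
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        # may be called several times for one title; the pieces are joined later
        if self.in_title:
            self.parts.append(data)


def title_from_chunks(chunks):
    """Feed text chunks to the parser and stop at the first complete title."""
    parser = TitleParser()
    for chunk in chunks:
        parser.feed(chunk)
        if parser.done:
            return "".join(parser.parts).strip()
    return None  # ran out of data without seeing a complete <title>


def fetch_title(url, chunk_size=1024, encoding="utf-8"):
    """Read the response in small blocks and stop once the title is known."""
    # the incremental decoder copes with multi-byte characters that are
    # split across two blocks
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    with urlopen(url) as response:
        def chunks():
            while True:
                block = response.read(chunk_size)
                if not block:
                    return
                yield decoder.decode(block)
        # leaving the with block closes the connection, so the rest of the
        # page is not downloaded
        return title_from_chunks(chunks())
```

calling fetch_title(url) returns the title string, or None if the page has no complete TITLE element. note that this only saves network traffic; the server will most likely still have built the entire page.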

</F>

--
http://mail.python.org/mailman/listinfo/python-list
