On Feb 8, 6:20 pm, "k0mp" <[EMAIL PROTECTED]> wrote:
> On Feb 8, 6:54 pm, Leif K-Brooks <[EMAIL PROTECTED]> wrote:
> > k0mp wrote:
> > > Is there a way to retrieve a web page and, before it is entirely
> > > downloaded, begin to test if a specific string is present and, if
> > > yes, stop the download?
> > > I believe that urllib.urlopen(url) will retrieve the whole page
> > > before the program goes to the next statement.
> >
> > Use urllib.urlopen(), but call .read() with a smallish argument, e.g.:
> >
> > >>> foo = urllib.urlopen('http://google.com')
> > >>> foo.read(512)
> > '<html><head> ...
> >
> > foo.read(512) will return as soon as 512 bytes have been received. You
> > can keep calling it until it returns an empty string, indicating that
> > there's no more data to be read.
>
> Thanks for your answer :)
>
> I'm not sure that read() works as you say.
> Here is a test I've done:
>
> import urllib2
> import re
> import time
>
> CHUNKSIZE = 1024
>
> print 'f.read(CHUNK)'
> print time.clock()
>
> for i in range(30):
>     f = urllib2.urlopen('http://google.com')
>     while True:  # read the page in chunks
>         chunk = f.read(CHUNKSIZE)
>         if not chunk:
>             break
>         m = re.search('<html>', chunk)
>         if m is not None:
>             break
> [snip]

I'd just like to point out that the above code assumes that the '<html>'
is entirely within one chunk; it could in fact be split across chunks.
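One common fix is to keep the last len(pattern) - 1 bytes of each chunk and prepend them to the next one, so a match straddling a chunk boundary is still seen. Here's a minimal sketch of that overlap technique (the function name `stream_contains` and the use of an in-memory io.BytesIO instead of a real urlopen() response are my own, for illustration; any file-like object with a .read(n) method would do):

```python
import io

def stream_contains(stream, pattern, chunksize=1024):
    """Read `stream` in chunks; return True as soon as `pattern` appears,
    carrying an overlap between chunks so boundary-straddling matches
    are not missed."""
    keep = len(pattern) - 1   # bytes to carry over between chunks
    overlap = b''
    while True:
        chunk = stream.read(chunksize)
        if not chunk:         # end of stream, pattern never seen
            return False
        window = overlap + chunk
        if pattern in window:
            return True
        overlap = window[-keep:] if keep else b''

# Example: with an 8-byte chunk size, '<html>' is split across
# the first and second chunks, yet it is still found.
data = io.BytesIO(b'abcde<ht' b'ml>fghij')
print(stream_contains(data, b'<html>', chunksize=8))  # True
```

The per-chunk search in the original loop would miss this case, since neither b'abcde<ht' nor b'ml>fghij' contains the full pattern on its own.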
-- http://mail.python.org/mailman/listinfo/python-list