On Feb 8, 6:54 pm, Leif K-Brooks <[EMAIL PROTECTED]> wrote:
> k0mp wrote:
> > Is there a way to retrieve a web page and, before it is entirely
> > downloaded, begin to test if a specific string is present, and if yes
> > stop the download?
> > I believe that urllib.urlopen(url) will retrieve the whole page before
> > the program goes to the next statement.
>
> Use urllib.urlopen(), but call .read() with a smallish argument, e.g.:
>
> >>> foo = urllib.urlopen('http://google.com')
> >>> foo.read(512)
> '<html><head> ...
>
> foo.read(512) will return as soon as 512 bytes have been received. You
> can keep calling it until it returns an empty string, indicating that
> there's no more data to be read.
Thanks for your answer :)

I'm not sure that read() works as you say. Here is a test I've done:

import urllib2
import re
import time

CHUNKSIZE = 1024

print 'f.read(CHUNK)'
print time.clock()
for i in range(30):
    f = urllib2.urlopen('http://google.com')
    while True:  # read the page chunk by chunk
        chunk = f.read(CHUNKSIZE)
        if not chunk:  # empty string: the whole page has been read
            break
        m = re.search('<html>', chunk)
        if m is not None:  # string found, stop reading
            break
print time.clock()
print

print 'f.read()'
print time.clock()
for i in range(30):
    f = urllib2.urlopen('http://google.com')
    m = re.search('<html>', f.read())
    if m is not None:
        # NB: this break leaves the outer for loop on the first match,
        # so only one request is actually timed in this second test
        break
print time.clock()

It prints this:

f.read(CHUNK)
0.1
0.31

f.read()
0.31
0.32

It seems to take more time when I use read(size) than just read(). I think in both cases urllib2.urlopen retrieves the whole page.
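A side note on the chunk-by-chunk approach from the first reply: scanning each chunk separately can miss a match that straddles a chunk boundary (e.g. '<ht' at the end of one chunk and 'ml>' at the start of the next). Here is a minimal sketch of the pattern with that case handled; the function name, chunk size and target string are illustrative choices, not anything from the thread:

import urllib2

def contains_early(url, target, chunksize=1024):
    # Read the page in chunks and stop as soon as target is seen.
    # A tail of the previous chunk is kept so that a match straddling
    # a chunk boundary is still found.
    f = urllib2.urlopen(url)
    tail = ''
    try:
        while True:
            chunk = f.read(chunksize)
            if not chunk:  # empty string: end of data, no match
                return False
            if target in tail + chunk:
                return True
            tail = chunk[-(len(target) - 1):]
    finally:
        f.close()

print contains_early('http://google.com', '<html>')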
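On the timing numbers above: the break in the second loop leaves the "for i in range(30)" loop on the first match, so that test times a single download against thirty. A sketch of a more even comparison (same URL; the helper names are ours, and absolute numbers will depend on the network):

import time
import urllib2

def timed(fn, n=30):
    # Run fn() n times and return the elapsed wall-clock seconds.
    start = time.time()
    for _ in range(n):
        fn()
    return time.time() - start

def read_chunked():
    # Stop reading as soon as the string is seen (or at end of data).
    f = urllib2.urlopen('http://google.com')
    while True:
        chunk = f.read(1024)
        if not chunk or '<html>' in chunk:
            break
    f.close()

def read_whole():
    # Read everything first, then search.
    f = urllib2.urlopen('http://google.com')
    found = '<html>' in f.read()
    f.close()

print 'chunked: %.2fs' % timed(read_chunked)
print 'whole:   %.2fs' % timed(read_whole)

time.time() is used rather than time.clock() because on Unix time.clock() measures CPU time, which mostly ignores the time spent waiting on the socket.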