Bugs item #1772481, was opened at 2007-08-12 01:22
Message generated for change (Comment added) made by acreature
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1772481&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Creature (acreature)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib2 hangs with some documents.

Initial Comment:
While working on a web spider I encountered the following page, which causes the read() call of a urllib2 response to hang. It uses 100% of the CPU and does not seem to ever return. I see this behaviour on Python 2.4.4, and several people on 2.5.1 have tried the code below and reported the same behaviour.

By the way, the page it uses is a porn site, but please don't get hung up on that fact. This is a data processing issue, not a subject matter issue.

This test case is attached as a file, but is also available at http://pastebin.com/d6f98618f . Please note that the user-agent masquerading is present to rule out any issues with the server returning different data to different clients; commenting out the line so Python sends the standard headers still results in the issue occurring.

----------------------------------------------------------------------

>Comment By: Creature (acreature)
Date: 2007-08-12 01:32

Message:
Logged In: YES
user_id=1407924
Originator: YES

It seems that a fix for this issue is to add "or line == ''" to the test on line 525 of httplib.py in Python 2.4.4:

    # read and discard trailer up to the CRLF terminator
    ### note: we shouldn't have any trailers!
    while True:
        line = self.fp.readline()
        if line == '\r\n' or line == '':
            break

I'm told that the same code is found on line 574 in Python 2.5.
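
As a rough illustration of why the extra test matters, here is a minimal standalone sketch, not the httplib code itself. It assumes the hang comes from readline() returning an empty string once the stream hits EOF, which never equals '\r\n'. The function name discard_trailer and the sample byte strings are hypothetical, and the sketch uses bytes and io.BytesIO so it runs on current Python rather than 2.4/2.5.

```python
import io


def discard_trailer(fp):
    """Discard trailer lines; stop at the blank CRLF line or at EOF.

    Hypothetical helper that mirrors the patched loop from the comment above.
    """
    discarded = []
    while True:
        line = fp.readline()
        # readline() returns an empty value at EOF. Without the second test,
        # a response that ends before the final CRLF would never satisfy the
        # condition and the loop would spin forever.
        if line == b"\r\n" or line == b"":
            break
        discarded.append(line)
    return discarded


# Well-formed ending: one trailer line followed by the terminating CRLF.
complete = io.BytesIO(b"X-Trailer: 1\r\n\r\n")
# Truncated ending: the stream stops before the terminating CRLF arrives.
truncated = io.BytesIO(b"X-Trailer: 1\r\n")

print(discard_trailer(complete))
print(discard_trailer(truncated))
```

Both calls return after consuming the single trailer line. With only the '\r\n' comparison, the second call would not return.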

----------------------------------------------------------------------
_______________________________________________
Python-bugs-list mailing list