Bugs item #1411097, was opened at 2006-01-20 20:26 Message generated for change (Comment added) made by jjlee You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1411097&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: httplib patch to make _read_chunked() more robust Initial Comment: To reproduce: import urllib2 print urllib2.urlopen("http://66.117.37.13/").read() The attached patch "fixes" the hang, but that patch is not acceptable because it also removes the .readline() and .readlines() methods on the response object returned by urllib2.urlopen(). The patch seems to demonstrate that the problem is caused by the (ab)use of socket._fileobject in urllib2.AbstractHTTPHandler (I believe this hack was introduced when urllib2 switched to using httplib.HTTPConnection). Not sure yet what the actual problem is... ---------------------------------------------------------------------- >Comment By: John J Lee (jjlee) Date: 2006-02-06 01:24 Message: Logged In: YES user_id=261020 Oops, fixed chunk.patch to .strip() before comparing to "". ---------------------------------------------------------------------- Comment By: John J Lee (jjlee) Date: 2006-02-06 00:38 Message: Logged In: YES user_id=261020 First, expanding a bit on what I wrote on 2006-01-21: The problem does relate to chunked encoding, and is unrelated to urllib2's use of _fileobject. My hack to remove use of socket._fileobject from urllib2 merely breaks handling of chunked encoding by cutting httplib.HTTPResponse out of the picture. The problem is seen in urllib2 in recent Pythons thanks to urllib2 switching to use of httplib.HTTPConnection and HTTP/1.1, hence chunked encoding is allowed. urllib still uses httplib.HTTP, hence HTTP/1.0, so is unaffected. To reproduce with httplib: import httplib conn = httplib.HTTPConnection("66.117.37.13") conn.request("GET", "/", headers={"Connection": "close"}) r1 = conn.getresponse() print r1.read() The Connection: close is required -- if it's not there the server doesn't use chunked transfer-encoding. I verified with a packet sniffer that the problem is that this server does not send the final trailing CRLF required by section 3.6.1 of RFC 2616. However, that section also says that trailers (trailing HTTP headers) MUST NOT be sent by the server unless either a TE header was present and indicated that trailers are acceptable (httplib does not send the TE header), or the trailers are optional metadata and may be discarded by the client. So, I propose the attached patch to httplib (chunk.patch) as a work-around. ---------------------------------------------------------------------- Comment By: John J Lee (jjlee) Date: 2006-01-21 22:10 Message: Logged In: YES user_id=261020 In fact the commit message for rev 36871 states the real reason _fileobject is used (handling chunked encoding), showing my workaround is even more harmful than I thought. Moreover, doing a urlopen on 66.117.37.13 shows the response *is* chunked. The problem seems to be caused by httplib failing to find a CRLF at the end of the chunked response, so the loop at the end of _read_chunked() never terminates. Haven't looked in detail yet, but I'm guessing a) it's the server's fault and b) httplib should work around it. Here's the commit message from 36871: Fix urllib2.urlopen() handling of chunked content encoding. The change to use the newer httplib interface admitted the possibility that we'd get an HTTP/1.1 chunked response, but the code didn't handle it correctly. The raw socket object can't be pass to addinfourl(), because it would read the undecoded response. Instead, addinfourl() must call HTTPResponse.read(), which will handle the decoding. One extra wrinkle is that the HTTPReponse object can't be passed to addinfourl() either, because it doesn't implement readline() or readlines(). As a quick hack, use socket._fileobject(), which implements those methods on top of a read buffer. (suggested by mwh) Finally, add some tests based on test_urllibnet. Thanks to Andrew Sawyers for originally reporting the chunked problem. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1411097&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com