Bugs item #1486335, was opened at 2006-05-11 04:14
Message generated for change (Comment added) made by altman
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1486335&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: kxroberto (kxroberto)
Assigned to: Greg Ward (gward)
Summary: httplib: read/_read_chunked failes with ValueError sometime

Initial Comment:
This occasionally shows up in a logged trace when an application crashes with a ValueError on an http(s)_response.read() (py2.3.5, yet the relevant httplib code is still the same in current httplib):

....
' File "socket.pyo", line 283, in read\n',
' File "httplib.pyo", line 389, in read\n',
' File "httplib.pyo", line 426, in _read_chunked\n',
'ValueError: invalid literal for int(): \n']

It's the line:

    chunk_left = int(line, 16)

I don't know what this line is about. Yet it should be protected, as http_response.read() should not fail with ValueError, but only with IOError/EnvironmentError or socket.error; otherwise exception handling becomes a random task.

-Robert

Side note regarding IO exception handling: see also FR #1481036 (IOBaseError): why is socket.error.__bases__ (<class exceptions.Exception at 0x011244E0>,)?

----------------------------------------------------------------------

Comment By: Patrick Altman (altman)
Date: 2007-03-14 10:39

Message:
Logged In: YES
user_id=405010
Originator: NO

I am attempting to use a HEAD request against Amazon S3 to check whether a file exists and, if it does, to parse the md5 hash from the ETag in the response to verify the contents of the file, so as to save the bandwidth of uploading files when it is not necessary. If the file exists, the HEAD works as expected and I get valid headers back that I can parse, pulling the ETag out of the dictionary using getheader('ETag')[1:-1] (using the slice to trim off the double quotes in the string).

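The ETag comparison described in the paragraph above could look roughly like the following. This is only a sketch: the helper name etag_matches is hypothetical, and it relies on the commenter's premise that the ETag value is the quoted hex MD5 digest of the file contents, which is an assumption rather than something the thread confirms for every object.

```python
import hashlib


def etag_matches(etag_header, data):
    """Return True if the quoted ETag equals the hex MD5 digest of data (bytes).

    Hypothetical helper; assumes the ETag is a double-quoted MD5 hex digest.
    """
    # Slice off the surrounding double quotes, as in getheader('ETag')[1:-1].
    remote_md5 = etag_header[1:-1]
    local_md5 = hashlib.md5(data).hexdigest()
    # hexdigest() is lowercase, so normalise the remote value before comparing.
    return remote_md5.lower() == local_md5
```

An upload would then be skipped only when etag_matches() returns True for the local file's bytes.
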
The problem arises when I send a HEAD request for a file that does not exist. As expected, Amazon sends back a 404 Not Found response; however, my test scripts seem to hang. When I run Python with trace.py, it hangs here:

 --- modulename: httplib, funcname: _read_chunked
httplib.py(536): assert self.chunked != _UNKNOWN
httplib.py(537): chunk_left = self.chunk_left
httplib.py(538): value = ''
httplib.py(542): while True:
httplib.py(543): if chunk_left is None:
httplib.py(544): line = self.fp.readline()
 --- modulename: socket, funcname: readline
socket.py(321): data = self._rbuf
socket.py(322): if size < 0:
socket.py(324): if self._rbufsize <= 1:
socket.py(326): assert data == ""
socket.py(327): buffers = []
socket.py(328): recv = self._sock.recv
socket.py(329): while data != "\n":
socket.py(330): data = recv(1)

It eventually completes with an exception here:

  File "C:\Python25\lib\httplib.py", line 509, in read
    return self._read_chunked(amt)
  File "C:\Python25\lib\httplib.py", line 548, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: ''

For reference, ethereal captured the following request and response:

HEAD <REMOVED> HTTP/1.1
Host: s3.amazonaws.com
Accept-Encoding: identity
Date: Tue, 13 Mar 2007 02:54:12 GMT
Authorization: AWS <REMOVED>

HTTP/1.1 404 Not Found
x-amz-request-id: E20B4C0D0C48B2EF
x-amz-id-2: <REMOVED>
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Tue, 13 Mar 2007 02:54:16 GMT
Server: AmazonS3

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2006-08-07 19:23

Message:
Logged In: YES
user_id=261020

I think it's only worth worrying about bad chunking that a) has been observed in the wild (though not necessarily by us) and b) popular browsers can cope with.

Greg:

"""If there is an error here, it's at EOF, so it's not that big a deal."""

That's only if the response will be closed at the end of the current transaction.
Quoting from 1411097:

"""if the connection will not close at the end of the transaction, the behaviour should not change from what's currently in SVN (we should not assume that the chunked response has ended unless we see the proper terminating CRLF)."""

Perhaps we don't need to be quite as strict as that, but the point is that otherwise, how do we know the server hasn't already sent that last CRLF, and that it will turn up in three weeks' time?-) If that happens, I'm not sure exactly how httplib will treat the CRLF and possible chunked encoding trailers, but I suspect something bad happens. Perhaps we could just always close the connection in this case?

I'm not confident I know yet how best to fix these issues. I just tried reading curl's transfer.c and http_chunks.c. I discovered only that I have to be fully awake to read a 1200 line function :-/

----------------------------------------------------------------------

Comment By: Greg Ward (gward)
Date: 2006-07-25 21:13

Message:
Logged In: YES
user_id=14422

OK, I've been working on this some more and I have a very crude addition to test_httplib.py. I'm going to attach it here and solicit feedback on python-dev: I'm not sure how many kinds of bad response chunking I really want to worry about.

----------------------------------------------------------------------

Comment By: Greg Ward (gward)
Date: 2006-07-24 14:38

Message:
Logged In: YES
user_id=14422

I'm seeing this with Python 2.3.5 and 2.4.3 hitting a PHP app and getting a large error page. It looks as though the server is incorrectly chunking the response; lwp-request at least gives a better error message than httplib.py:

$ GET "http://..."
500 EOF when chunk header expected

I'm unclear on precisely what the server is doing wrong.
The response looks like this:

HTTP/1.1 200 OK
Date: Mon, 24 Jul 2006 19:18:47 GMT
Server: Apache/2.0.54 (Fedora)
X-Powered-By: PHP/4.3.11
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

2169\r\n
\r\n
[...first 0x2169 bytes of response...]\r\n
20b2\r\n
[...next 0x20b2 bytes...]
[...repeat many times...]
20b2\r\n
[...the last 0x20b2 bytes...]
\r\n

The blank line at EOF appears to be confusing httplib.py: it bombs because int('', 16) raises ValueError.

Observation #1: if this is indeed a protocol error (i.e. the server is in the wrong), httplib.py should turn the ValueError into an HTTPException. Perhaps it should define a new exception class for low-level protocol errors (bad chunking). Maybe it should reuse IncompleteRead.

Observation #2: gee, my web browser doesn't barf on this response, so why should httplib.py? If there is an error here, it's at EOF, so it's not that big a deal.
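
For illustration, Observation #1 could be sketched along the following lines. This is not the actual httplib code or a proposed patch: ChunkedProtocolError and parse_chunk_size are hypothetical names, and the sketch only shows the idea of converting the ValueError from int(line, 16) into a protocol-level exception.

```python
class ChunkedProtocolError(Exception):
    """Hypothetical exception for malformed chunked encoding (not in httplib)."""


def parse_chunk_size(line):
    """Parse a chunk-size line such as '20b2\\r\\n' and return the size as an int.

    Raises ChunkedProtocolError instead of letting ValueError escape.
    """
    # Drop any chunk extension after ';' and the surrounding whitespace/CRLF.
    size_field = line.split(";", 1)[0].strip()
    try:
        size = int(size_field, 16)
    except ValueError:
        # This is the case from the traceback: int('', 16) on a blank line.
        raise ChunkedProtocolError(
            "expected a hexadecimal chunk size, got %r" % (line,)
        )
    if size < 0:
        # int() accepts a leading '-', which is never a valid chunk size.
        raise ChunkedProtocolError("negative chunk size in %r" % (line,))
    return size
```

With a helper like this, the blank line at EOF in the response above would surface as a protocol error that callers can catch alongside other HTTP errors, rather than as a bare ValueError. Whether to treat it as an error at all is the open question raised in Observation #2.
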