New submission from Tim Burke <tim.bu...@gmail.com>:

First, spin up a fairly trivial http server:

    import wsgiref.simple_server

    def app(environ, start_response):
        start_response('200 OK', [
            ('Some-Canonical', 'headers'),
            ('sOme-CRAzY', 'hEaDERs'),
            ('Utf-8-Values', '\xe2\x9c\x94'),
            ('s\xc3\xb6me-UT\xc6\x92-8', 'in the header name'),
            ('some-other', 'random headers'),
        ])
        return [b'Hello, world!\n']

    if __name__ == '__main__':
        httpd = wsgiref.simple_server.make_server('', 8000, app)
        while True:
            httpd.handle_request()

Note that this code works equally well on py2 or py3; the interesting bytes on the wire are the same on either.

Verify the expected response using an independent tool such as curl:

    $ curl -v http://localhost:8000
    *   Trying ::1...
    * TCP_NODELAY set
    * connect to ::1 port 8000 failed: Connection refused
    *   Trying 127.0.0.1...
    * TCP_NODELAY set
    * Connected to localhost (127.0.0.1) port 8000 (#0)
    > GET / HTTP/1.1
    > Host: localhost:8000
    > User-Agent: curl/7.64.0
    > Accept: */*
    >
    * HTTP 1.0, assume close after body
    < HTTP/1.0 200 OK
    < Date: Wed, 29 May 2019 23:02:37 GMT
    < Server: WSGIServer/0.2 CPython/3.7.3
    < Some-Canonical: headers
    < sOme-CRAzY: hEaDERs
    < Utf-8-Values: ✔
    < söme-UTƒ-8: in the header name
    < some-other: random headers
    < Content-Length: 14
    <
    Hello, world!
    * Closing connection 0

Check that py2 includes all the same headers:

    $ python2 -c 'import pprint, urllib; resp = urllib.urlopen("http://localhost:8000"); pprint.pprint((dict(resp.info().items()), resp.read()))'
    ({'content-length': '14',
      'date': 'Wed, 29 May 2019 23:03:02 GMT',
      'server': 'WSGIServer/0.2 CPython/3.7.3',
      'some-canonical': 'headers',
      'some-crazy': 'hEaDERs',
      'some-other': 'random headers',
      's\xc3\xb6me-ut\xc6\x92-8': 'in the header name',
      'utf-8-values': '\xe2\x9c\x94'},
     'Hello, world!\n')

But py3 *does not*:

    $ python3 -c 'import pprint, urllib.request; resp = urllib.request.urlopen("http://localhost:8000"); pprint.pprint((dict(resp.info().items()), resp.read()))'
    ({'Date': 'Wed, 29 May 2019 23:04:09 GMT',
      'Server': 'WSGIServer/0.2 CPython/3.7.3',
      'Some-Canonical': 'headers',
      'Utf-8-Values': 'â\x9c\x94',
      'sOme-CRAzY': 'hEaDERs'},
     b'Hello, world!\n')

Instead, it is missing the first header that has a non-ASCII name as well as all subsequent headers (even if they are all-ASCII). Interestingly, the response body is intact.
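The truncation can likely be reproduced without a server or any network traffic. The following is a minimal sketch, not part of the original report: it assumes the header block is decoded as ISO-8859-1 before being handed to email.parser (which is what http.client's parse_headers does in v3.7.3), and the names `raw`, `msg` and `parsed_names` are purely illustrative.

```python
import email.parser
import http.client

# Header block equivalent to what the server above sends. It is decoded as
# ISO-8859-1 here on the assumption that this mirrors what
# http.client.parse_headers() does before handing the text to the parser.
raw = (
    b"Some-Canonical: headers\r\n"
    b"sOme-CRAzY: hEaDERs\r\n"
    b"Utf-8-Values: \xe2\x9c\x94\r\n"
    b"s\xc3\xb6me-UT\xc6\x92-8: in the header name\r\n"
    b"some-other: random headers\r\n"
    b"\r\n"
).decode("iso-8859-1")

# Parse with the same message class that http.client uses for responses.
msg = email.parser.Parser(_class=http.client.HTTPMessage).parsestr(raw)

# Names of the headers the parser actually recognized.
parsed_names = [name for name, _ in msg.items()]
print(parsed_names)

# Anything the parser did not accept as a header ends up in the payload.
print(repr(msg.get_payload()))
```

On the affected versions one would expect only the three headers preceding the non-ASCII name to be listed, with the remaining header lines showing up in the message payload rather than in the headers.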

This is eventually traced back to email.feedparser's expectation that all headers conform to rfc822 and its assumption that anything that *doesn't* conform must be part of the body: https://github.com/python/cpython/blob/v3.7.3/Lib/email/feedparser.py#L228-L236

However, http.client has *already* determined the boundary between headers and body in parse_headers, and sent everything that it thinks is headers to the parser: https://github.com/python/cpython/blob/v3.7.3/Lib/http/client.py#L193-L214

----------
messages: 343942
nosy: tburke
priority: normal
severity: normal
status: open
title: http.client aborts header parsing upon encountering non-ASCII header names
versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37093>
_______________________________________