[issue37093] http.client aborts header parsing upon encountering non-ASCII header names

Tim Burke Wed, 29 May 2019 16:33:13 -0700

New submission from Tim Burke <tim.bu...@gmail.com>:

First, spin up a fairly trivial HTTP server:


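    import wsgiref.simple_server

    def app(environ, start_response):
        # Escapes are used so the same bytes go out on the wire under py2 and py3.
        start_response('200 OK', [
            ('Some-Canonical', 'headers'),
            ('sOme-CRAzY', 'hEaDERs'),
            # UTF-8 encoding of a check mark in the header value
            ('Utf-8-Values', '\xe2\x9c\x94'),
            # UTF-8 encoding of non-ASCII characters in the header name
            ('s\xc3\xb6me-UT\xc6\x92-8', 'in the header name'),
            ('some-other', 'random headers'),
        ])
        return [b'Hello, world!\n']

    if __name__ == '__main__':
        httpd = wsgiref.simple_server.make_server('', 8000, app)
        # Serve requests one at a time until interrupted.
        while True:
            httpd.handle_request()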

Note that this code works equally well on py2 or py3; the interesting bytes on 
the wire are the same on either.

Verify the expected response using an independent tool such as curl:

    $ curl -v http://localhost:8000
    *   Trying ::1...
    * TCP_NODELAY set
    * connect to ::1 port 8000 failed: Connection refused
    *   Trying 127.0.0.1...
    * TCP_NODELAY set
    * Connected to localhost (127.0.0.1) port 8000 (#0)
    > GET / HTTP/1.1
    > Host: localhost:8000
    > User-Agent: curl/7.64.0
    > Accept: */*
    > 
    * HTTP 1.0, assume close after body
    < HTTP/1.0 200 OK
    < Date: Wed, 29 May 2019 23:02:37 GMT
    < Server: WSGIServer/0.2 CPython/3.7.3
    < Some-Canonical: headers
    < sOme-CRAzY: hEaDERs
    < Utf-8-Values: ✔
    < söme-UTƒ-8: in the header name
    < some-other: random headers
    < Content-Length: 14
    < 
    Hello, world!
    * Closing connection 0

Check that py2 includes all the same headers:

    $ python2 -c 'import pprint, urllib; resp = urllib.urlopen("http://localhost:8000"); pprint.pprint((dict(resp.info().items()), resp.read()))'
    ({'content-length': '14',
      'date': 'Wed, 29 May 2019 23:03:02 GMT',
      'server': 'WSGIServer/0.2 CPython/3.7.3',
      'some-canonical': 'headers',
      'some-crazy': 'hEaDERs',
      'some-other': 'random headers',
      's\xc3\xb6me-ut\xc6\x92-8': 'in the header name',
      'utf-8-values': '\xe2\x9c\x94'},
     'Hello, world!\n')

But py3 *does not*:

    $ python3 -c 'import pprint, urllib.request; resp = urllib.request.urlopen("http://localhost:8000"); pprint.pprint((dict(resp.info().items()), resp.read()))'
    ({'Date': 'Wed, 29 May 2019 23:04:09 GMT',
      'Server': 'WSGIServer/0.2 CPython/3.7.3',
      'Some-Canonical': 'headers',
      'Utf-8-Values': 'â\x9c\x94',
      'sOme-CRAzY': 'hEaDERs'},
     b'Hello, world!\n')

The py3 result is missing the first header that has a non-ASCII name as well 
as every header after it, even those that are all-ASCII (here, some-other and 
Content-Length). Interestingly, the response body is intact.

This can be traced back to email.feedparser's expectation that all headers 
conform to RFC 822 and its assumption that anything that *doesn't* conform 
must be part of the body: 
https://github.com/python/cpython/blob/v3.7.3/Lib/email/feedparser.py#L228-L236
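
The parser's behaviour can be observed without a server. The following is a 
minimal sketch, not code from CPython: HEADER_TEXT and parse_like_feedparser 
are names made up for illustration, and the text is the header block as it 
looks once the wire bytes have been decoded as ISO-8859-1. On the versions 
listed in this report I would expect the non-ASCII line and everything after 
it to end up in the payload, with a MissingHeaderBodySeparatorDefect recorded 
on the message.

```python
import email.parser

# Header block as http.client hands it to the email parser: the wire bytes
# decoded as ISO-8859-1, terminated by the blank separator line.
# (HEADER_TEXT is an illustrative name, not part of any library.)
HEADER_TEXT = (
    "Some-Canonical: headers\r\n"
    "s\xc3\xb6me-UT\xc6\x92-8: in the header name\r\n"
    "some-other: random headers\r\n"
    "\r\n"
)


def parse_like_feedparser(text):
    """Return (header names, defect class names, leftover payload)."""
    msg = email.parser.Parser().parsestr(text)
    defects = [type(d).__name__ for d in msg.defects]
    # Lines the parser did not accept as headers are treated as body text.
    return msg.keys(), defects, msg.get_payload()


if __name__ == "__main__":
    names, defects, leftover = parse_like_feedparser(HEADER_TEXT)
    print("headers:", names)
    print("defects:", defects)
    print("payload:", repr(leftover))
```

Comparing the printed header names with the payload shows where the dropped 
headers went: they are not discarded, just filed under the wrong part of the 
message.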

However, http.client has *already* determined the boundary between headers and 
body in parse_headers, and sent everything that it thinks is headers to the 
parser: 
https://github.com/python/cpython/blob/v3.7.3/Lib/http/client.py#L193-L214
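
To illustrate the point about the boundary, here is a small sketch that calls 
http.client.parse_headers on an in-memory stream. RAW_HEADERS_AND_BODY and 
split_headers_and_body are hypothetical names; the status line is left out 
because http.client reads it before parse_headers is called. parse_headers 
consumes lines up to the blank separator and leaves the rest of the stream 
unread, which is consistent with the body arriving intact even though some 
headers go missing.

```python
import http.client
import io

# Everything after the status line of a response like the one above.
# (RAW_HEADERS_AND_BODY is an illustrative name, not part of any library.)
RAW_HEADERS_AND_BODY = (
    b"Some-Canonical: headers\r\n"
    b"s\xc3\xb6me-UT\xc6\x92-8: in the header name\r\n"
    b"some-other: random headers\r\n"
    b"\r\n"
    b"Hello, world!\n"
)


def split_headers_and_body(raw):
    """Parse headers the way http.client does and return (message, body)."""
    fp = io.BytesIO(raw)
    # parse_headers reads up to and including the blank line, then passes
    # the collected lines to the email parser.
    msg = http.client.parse_headers(fp)
    # Whatever is left in the stream is the response body.
    return msg, fp.read()


if __name__ == "__main__":
    msg, body = split_headers_and_body(RAW_HEADERS_AND_BODY)
    print("headers:", msg.items())
    print("body:", body)
```

The body comes from the stream rather than from the parsed message, so the 
header/body split made by the email parser has no effect on it.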

----------
messages: 343942
nosy: tburke
priority: normal
severity: normal
status: open
title: http.client aborts header parsing upon encountering non-ASCII header names
versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37093>
_______________________________________