[issue10980] http.server Header Unicode Bug

STINNER Victor Sat, 22 Jan 2011 05:04:38 -0800

STINNER Victor <victor.stin...@haypocalc.com> added the comment:

Extract of PEP 3333: << Note also that strings passed to start_response() as a 
status or as response headers must follow RFC 2616 with respect to encoding. 
That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME 
encoding. >>
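
As a rough illustration of that rule, here is a minimal sketch that reports which parts of a status line and header list, as they would be passed to start_response(), cannot be represented in ISO-8859-1. The function name non_latin1_fields is hypothetical. It only checks encodability and does not validate RFC 2047 syntax; encoded-words are pure ASCII, so they always pass.

```python
from typing import Iterable, List, Tuple


def non_latin1_fields(status: str, headers: Iterable[Tuple[str, str]]) -> List[str]:
    """Return every string that cannot be encoded as ISO-8859-1.

    Hypothetical helper: it checks encodability only. RFC 2047 encoded-words
    are pure ASCII and therefore always pass this check.
    """
    fields = [status]
    for name, value in headers:
        fields.extend((name, value))
    bad = []
    for text in fields:
        try:
            text.encode("latin-1")
        except UnicodeEncodeError:
            bad.append(text)
    return bad
```

For example, a header value of "café" passes (U+00E9 is inside Latin-1), while a value containing "€" (U+20AC) is reported.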


What is the best choice for portability (across HTTP servers and web browsers): 
Latin-1 or MIME encoding? Latin-1 is a small subset of Unicode: it only covers 
U+0000..U+00FF.
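
Because Latin-1 maps code points U+0000..U+00FF one-to-one onto byte values 0..255, any string made only of those characters round-trips losslessly, and anything above U+00FF cannot be encoded at all. A quick check in Python (the helper name fits_latin1 is only for illustration):

```python
def fits_latin1(text: str) -> bool:
    """True if every character lies in U+0000..U+00FF (illustrative helper)."""
    return all(ord(ch) <= 0xFF for ch in text)


# Latin-1 is the identity mapping: byte value == code point, in both directions.
sample = "".join(chr(i) for i in range(256))
assert sample.encode("latin-1") == bytes(range(256))
assert bytes(range(256)).decode("latin-1") == sample
```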

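We should perhaps let the user choose between Latin-1, MIME, or something else 
(e.g. UTF-8, cp1252, ...). Or, at the very least, you should try something 
like:

from email.header import Header

try:
    value = text.encode('latin-1')
except UnicodeEncodeError:
    # RFC 2047 encoded-word, e.g. "=?utf-8?b?...?="; maxlinelen=0 disables folding
    value = Header(text, 'utf-8').encode(maxlinelen=0).encode('ascii')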
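
The same fallback can be wrapped in a small function and checked by decoding the result again with email.header.decode_header(). This is only a sketch: the names encode_header_value and decode_header_value are hypothetical, and using email.header.Header for the RFC 2047 step is an assumption about how encodeMIME might be implemented, not what http.server does.

```python
from email.header import Header, decode_header


def encode_header_value(text: str) -> bytes:
    """Encode as Latin-1, or fall back to an RFC 2047 UTF-8 encoded-word.

    Hypothetical helper sketching the try/except fallback above.
    """
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        # maxlinelen=0 turns off folding, so no newline ends up in the value.
        return Header(text, "utf-8").encode(maxlinelen=0).encode("ascii")


def decode_header_value(raw: bytes) -> str:
    """Reverse encode_header_value() (also a hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(raw.decode("latin-1")):
        # decode_header() yields str when no encoded-word is present,
        # and (bytes, charset) pairs otherwise.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "latin-1")
        parts.append(chunk)
    return "".join(parts)
```

Two caveats of this design: a Latin-1 value that happens to look like "=?...?=" would be decoded as an encoded-word, and disabling folding lets long values exceed RFC 2047's 75-character limit per encoded-word.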

Would it be a good idea to accept raw bytes headers? HTTP headers are *supposed* 
to be encoded according to the relevant RFCs, but in practice anyone is free to 
do whatever they want.
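
One possible shape for accepting raw bytes, sketched under the assumption that bytes values are sent unchanged while str values must fit Latin-1. The name header_to_wire is hypothetical and this is not the http.server API. The CR/LF check is an extra safety measure of this sketch.

```python
from typing import Union


def header_to_wire(name: str, value: Union[str, bytes]) -> bytes:
    """Build one "Name: value\\r\\n" header line (hypothetical helper).

    bytes values pass through untouched, so the caller owns their encoding;
    str values must be representable in Latin-1.
    """
    if isinstance(value, str):
        value = value.encode("latin-1")  # raises UnicodeEncodeError otherwise
    if b"\r" in value or b"\n" in value:
        # Refuse embedded line breaks to avoid header injection.
        raise ValueError("header value must not contain CR or LF")
    return name.encode("latin-1") + b": " + value + b"\r\n"
```

Accepting bytes gives callers an escape hatch (for example raw UTF-8), at the cost of the server no longer guaranteeing that its output follows the PEP 3333 Latin-1/RFC 2047 rule.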

A sentence picked at random from the web (Dec. 2008): "it seems that neither 
Tomcat 5.5 or 6 properly decodes HTTP headers as per RFC 2047! The Tomcat code 
assumes everywhere that header values use ISO-8859-1."
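
To show what that assumption means in practice, here is a small sketch of what a peer that decodes every header as ISO-8859-1 (as the quoted Tomcat code reportedly does) sees when the sender used raw UTF-8 instead. latin1_view is a hypothetical name.

```python
def latin1_view(raw: bytes) -> str:
    """Decode header bytes the way a Latin-1-only peer would (hypothetical helper)."""
    return raw.decode("latin-1")  # never fails: every byte maps to U+0000..U+00FF


utf8_bytes = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'
garbled = latin1_view(utf8_bytes)          # 'cafÃ©': two characters instead of 'é'

# Nothing is lost: re-encoding as Latin-1 recovers the original UTF-8 bytes.
assert garbled.encode("latin-1") == utf8_bytes
assert garbled.encode("latin-1").decode("utf-8") == "caf\u00e9"
```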

Finally, why do you think this issue has to be fixed before Python 3.2?

----------
nosy: +haypo

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10980>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
