Antoine Pitrou <pit...@free.fr> added the comment:

Note that according to RFC 3977, “The character set for all NNTP commands is 
UTF-8”.

But it also says this about multi-line data blocks:

   Note that texts using an encoding (such as UTF-16 or UTF-32) that may
   contain the octets NUL, LF, or CR other than a CRLF pair cannot be
   reliably conveyed in the above format (that is, they violate the MUST
   requirement above).  However, except when stated otherwise, this
   specification does not require the content to be UTF-8, and therefore
   (subject to that same requirement) it MAY include octets above and
   below 128 mixed arbitrarily.

IMO, it should decode/encode by default using utf-8 (with the "surrogateescape" 
error handler for easy round-tripping with non-compliant servers), except for 
raw articles (bodies / envelopes) where bytes should be returned.
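
A minimal sketch of the round-tripping I have in mind. The helper names
decode_line and encode_line are hypothetical and not part of any existing
API. They only show that UTF-8 with "surrogateescape" preserves bytes that
are not valid UTF-8:

```python
# Hypothetical helpers (names are illustrative only) showing how the
# "surrogateescape" error handler lets non-UTF-8 bytes survive a
# decode/encode round trip.

def decode_line(raw: bytes) -> str:
    # Bytes that are not valid UTF-8 become lone surrogates U+DC80..U+DCFF.
    return raw.decode("utf-8", errors="surrogateescape")

def encode_line(text: str) -> bytes:
    # Lone surrogates produced above are turned back into the original bytes.
    return text.encode("utf-8", errors="surrogateescape")

# A line from a compliant server (valid UTF-8).
compliant = "Subject: caf\u00e9".encode("utf-8")
# The same line from a non-compliant server sending Latin-1 instead.
legacy = "Subject: caf\u00e9".encode("latin-1")

# Both survive the round trip byte for byte.
roundtrip_compliant = encode_line(decode_line(compliant))
roundtrip_legacy = encode_line(decode_line(legacy))
```

The decoded str for the Latin-1 line contains a lone surrogate, so it cannot
be encoded with the default strict handler. Callers that print or store such
strings have to be aware of that.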

----------
nosy: +pitrou

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9360>
_______________________________________
_______________________________________________
Python-bugs-list mailing list