Mark Sapiro <m...@msapiro.net> added the comment:

I've researched this further and now understand how it happens. The original 
message contains a text/html part (in my case, the only part) whose base64 or 
quoted-printable body contains non-ASCII characters once decoded. It is parsed 
correctly by email.message_from_bytes.

It is then processed by Mailman's content filtering, which retrieves the HTML 
payload via

    part.get_payload(decode=True).decode(ctype, errors='replace')

where part is the text/html part and ctype is the charset name ('utf-8' in 
this case). It then uses elinks, lynx or some other configured command to 
convert the HTML payload to plain text, and that plain text still contains 
non-ASCII characters.

It then replaces the payload and sets the content type via

    del part['content-transfer-encoding']
    part.set_payload(plain_text)
    part.set_type('text/plain')

The result is a message that can't be flattened with as_bytes().
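
The sequence above can be reproduced with a minimal sketch like the following. 
The sample message and the string-replace stand-in for the HTML-to-text 
converter are made up for illustration; they are not Mailman's actual code or 
data.

```python
import base64
from email import message_from_bytes

# Hypothetical sample: a lone text/html part, base64-encoded, whose
# decoded body contains a non-ASCII character.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.encodebytes("<p>caf\u00e9</p>".encode("utf-8"))
)

part = message_from_bytes(RAW)
html = part.get_payload(decode=True).decode("utf-8", errors="replace")

# Stand-in for the external converter (elinks, lynx, ...).
plain_text = html.replace("<p>", "").replace("</p>", "")

del part["content-transfer-encoding"]
part.set_payload(plain_text)   # no charset: the str is stored unencoded
part.set_type("text/plain")

try:
    part.as_bytes()
    flatten_error = None
except UnicodeEncodeError as exc:
    # The generator tries to write the non-ASCII str as ASCII.
    flatten_error = exc
```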

I had thought set_payload() should encode the payload appropriately, and in 
fact it does when given a suitable charset. So the error is ours: we did not 
pass a charset= argument to set_payload().
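
A minimal sketch of the corrected call, assuming the same kind of message as 
described above. The sample message and the string-replace stand-in for the 
converter are hypothetical, and the fallback to 'utf-8' when the part declares 
no charset is an assumption for this example only.

```python
import base64
from email import message_from_bytes

# Hypothetical sample: a base64 text/html part with non-ASCII content.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.encodebytes("<p>caf\u00e9</p>".encode("utf-8"))
)

part = message_from_bytes(RAW)
charset = part.get_content_charset() or "utf-8"  # assumed fallback
html = part.get_payload(decode=True).decode(charset, errors="replace")

# Stand-in for the external HTML-to-text converter.
plain_text = html.replace("<p>", "").replace("</p>", "")

del part["content-transfer-encoding"]
# Passing charset= makes set_payload() encode the text and add a
# matching Content-Transfer-Encoding header.
part.set_payload(plain_text, charset=charset)
part.set_type("text/plain")

flat = part.as_bytes()  # flattens without error now
round_trip = message_from_bytes(flat)
decoded = round_trip.get_payload(decode=True).decode(charset)
```

Re-parsing the flattened bytes and decoding the payload should give back the 
original plain text, which is a quick way to check the change.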

Closing this and the corresponding PR.

----------
stage: patch review -> resolved
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39384>
_______________________________________