On Fri, Aug 28, 2020 at 11:24 PM Chris Green <c...@isbd.net> wrote:
>
> Chris Angelico <ros...@gmail.com> wrote:
> >
> > Also, if you're parsing an email message, you can and should be doing
> > so with respect to the encoding(s) stipulated in the headers, after
> > which you will have valid Unicode text.
> >
> But not all E-Mail messages are 'well behaved', the above works fine
> if the headers specify the correct text encoding but quite often one
> will get messages with no encoding specified and also one gets
> messages with the wrong encoding specified. One needs a way to handle
> these 'rogue' messages such that most of the characters come out right.
>
As D'Arcy posted, this is what the error handling is for. If you want
to decode in a "sloppy" way such that mis-encoded text can be
partially decoded, then that's what you want to do - decode with a
sloppy error handler. Don't abuse arbitrary eight-bit encodings in the
hope that they'll do a better job just because they don't spit out any
exceptions.

ChrisA
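
For illustration, here is a minimal sketch of what decoding with a lenient error handler could look like, as opposed to picking an eight-bit codec just because it never raises. The helper name decode_sloppy and the fallback to UTF-8 when the charset is missing or unknown are assumptions made for this example, not something specified in the thread. The errors="replace" handler is the standard one that substitutes U+FFFD for bytes that cannot be decoded.

```python
import codecs
from typing import Optional


def decode_sloppy(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode raw bytes with the declared charset, tolerating bad bytes.

    Hypothetical helper: falls back to UTF-8 (an assumption) when the
    charset is missing or not known to Python.
    """
    charset = declared or "utf-8"
    try:
        codecs.lookup(charset)  # raises LookupError for unknown codecs
    except LookupError:
        charset = "utf-8"
    # "replace" turns each undecodable byte sequence into U+FFFD instead
    # of raising UnicodeDecodeError, so the rest of the text survives.
    return raw.decode(charset, errors="replace")


# A Latin-1 byte in a message with no (or a bogus) declared charset:
print(decode_sloppy(b"caf\xe9 ok"))             # bad byte is visibly marked
print(decode_sloppy(b"caf\xe9 ok", "latin-1"))  # correct charset decodes cleanly
```

The replacement character makes the damage visible in the output, whereas forcing an arbitrary eight-bit codec would silently produce plausible-looking but wrong characters.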