On Fri, Aug 28, 2020 at 11:24 PM Chris Green <c...@isbd.net> wrote:
>
> Chris Angelico <ros...@gmail.com> wrote:
> >
> > Also, if you're parsing an email message, you can and should be doing
> > so with respect to the encoding(s) stipulated in the headers, after
> > which you will have valid Unicode text.
> >
> But not all E-Mail messages are 'well behaved'. The above works fine
> if the headers specify the correct text encoding, but quite often one
> will get messages with no encoding specified, and one also gets
> messages with the wrong encoding specified.  One needs a way to handle
> these 'rogue' messages such that most of the characters come out right.
>

As D'Arcy posted, this is what the error handling is for. If you want
to decode in a "sloppy" way such that mis-encoded text can be
partially decoded, then do exactly that: decode with a sloppy error
handler. Don't abuse arbitrary eight-bit encodings in the hope that
they'll do a better job just because they don't spit out any
exceptions.
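
As a rough illustration of what decoding with a sloppy error handler
can look like, here is a minimal sketch. The helper name sloppy_decode
and the choice of UTF-8 as the fallback are assumptions made for this
example, not anything prescribed earlier in the thread; the only
standard piece is the errors= argument to bytes.decode().

```python
from typing import Optional


def sloppy_decode(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode raw bytes, replacing undecodable bytes with U+FFFD.

    `declared` is the charset named in the message headers, if any.
    Falling back to UTF-8 is an assumption of this sketch.
    """
    try:
        # errors="replace" substitutes U+FFFD instead of raising
        # UnicodeDecodeError on bytes that don't fit the encoding.
        return raw.decode(declared or "utf-8", errors="replace")
    except LookupError:
        # The header named a charset Python doesn't know about.
        return raw.decode("utf-8", errors="replace")


# Latin-1 bytes with no declared charset: the bad byte is marked,
# and the rest of the text survives.
print(ascii(sloppy_decode(b"caf\xe9")))
```

Other standard handlers trade off differently: errors="ignore" drops
the offending bytes silently, and errors="surrogateescape" keeps them
as lone surrogates so the original bytes can be recovered on encode.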

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list