Cameron Simpson <c...@cskk.id.au> wrote:
> On 28Aug2020 08:56, Chris Green <c...@isbd.net> wrote:
> >Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> >> Chris Angelico <ros...@gmail.com> writes:
> >> >But this is a really good job for a list comprehension:
> >> >sss = [str(word) for word in bbb]
> >>
> >> Are you all sure that "str" is really what you all want?
> >>
> >Not absolutely, you no doubt have been following other threads related
> >to this one. :-)
>
> It is almost certainly not what you want. You want some flavour of
> bytes.decode. If the BytesParser doesn't cope, you may need to parse the
> headers as some kind of text (eg ISO8859-1) until you find a
> content-transfer-encoding header (which still applies only to the body,
> not the headers).
>
> >> |>>> b = b"b"
> >> |>>> str( b )
> >> |"b'b'"
> >>
> >> Maybe try to /decode/ the bytes?
> >>
> >> |>>> b.decode( "ASCII" )
> >> |'b'
> >>
> >>
> >Therein lies the problem, the incoming byte stream *isn't* ASCII, it's
> >an E-Mail message which may, for example, have UTF-8 or other encoded
> >characters in it. Hopefully it will have an encoding given in the
> >header but that's only if the sender is 'well behaved', one needs to
> >be able to handle almost anything and it must be done without 'manual'
> >interaction.
>
> POP3 is presumably handing you bytes containing a message. If the Python
> email.BytesParser doesn't handle it, stash the raw bytes _elsewhere_ in
> a distinct file in some directory.
>
>     with open('evil_msg_bytes', 'wb') as f:
>         for bs in bbb:
>             f.write(bs)
>
> No interpretation required, since parsing failed. Then you can start
> dealing with these exceptions. _Do not_ write unparsable messages into
> an mbox!
>

Maybe I shouldn't, but Python 2 has been managing to do so for several
years without any issues. I know I *could* put the exceptions in a bucket
somewhere and deal with them separately, but I'd really rather not.
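A minimal sketch of one tolerant approach, assuming `bbb` is the list of byte lines returned by `poplib.POP3.retr()` (which strips the line endings) and that the goal is simply to append each message to an mbox. Python 2's `str` is a byte string, so the old code plausibly never decoded anything at all; the closest Python 3 equivalent is to keep the message as `bytes` all the way into the mailbox, since `mailbox.mbox.add()` accepts a byte string. Decoding is then only needed for display, where `errors="replace"` makes `bytes.decode()` substitute U+FFFD for bad bytes instead of raising. The sample message and the `store_raw()` helper are hypothetical.

```python
import mailbox

# Hypothetical message standing in for ``bbb``: assumed to be the list of
# byte lines from poplib.POP3.retr(), with line endings already stripped.
# The last line holds a UTF-8 encoded "£" and no charset is declared.
bbb = [
    b"From: sender@example.com",
    b"Subject: test",
    b"",
    b"Price: \xc2\xa35",
]

# str() on bytes gives the repr, not the decoded text.
as_str = str(bbb[3])                                 # "b'Price: \\xc2\\xa35'"

# decode() with errors="replace" never raises; bytes that are invalid in
# the chosen codec become U+FFFD instead.
as_ascii = bbb[3].decode("ascii", errors="replace")  # 'Price: \ufffd\ufffd5'
as_utf8 = bbb[3].decode("utf-8", errors="replace")   # 'Price: £5'


def store_raw(lines, mbox_path):
    """Append one message to an mbox without decoding it at all.

    store_raw() is a hypothetical helper, not part of any library.
    """
    raw = b"\n".join(lines) + b"\n"
    box = mailbox.mbox(mbox_path)  # creates the file if it does not exist
    box.lock()
    try:
        box.add(raw)  # mbox.add() accepts a bytes object as well as a Message
        box.flush()
    finally:
        box.unlock()
        box.close()
```

Because nothing is decoded on the way in, the stored bytes are exactly what the server sent; a `£` the sender encoded as UTF-8 stays UTF-8, and how it is rendered is left to the mail reader.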
At present (with the Python 2 code still installed) it all 'just works',
and the absolute worst corruption I ever see in an E-Mail is things like
accented characters missing altogether or £ signs coming out as a
funny-looking string. Neither of these really makes the message
unintelligible.

Are we saying that Python 3 really can't be made to handle things
'tolerantly' like Python 2 used to?

-- 
Chris Green
·
-- 
https://mail.python.org/mailman/listinfo/python-list