On 27Aug2020 09:16, Chris Green <c...@isbd.net> wrote:
>Cameron Simpson <c...@cskk.id.au> wrote:
>> But note: joining bytes like strings is uncommon, and may indicate
>> that you should be working in strings to start with. Eg you may want
>> to convert popmsg from bytes to str and do a str.join anyway. It
>> depends on exactly what you're dealing with: are you doing text work,
>> or are you doing "binary data" work?
>>
>> I know many network protocols are "bytes-as-text", but that is
>> accomplished by implying an encoding of the text, eg as ASCII, where
>> characters all fit in single bytes/octets.
>>
>Yes, I realise that making everything a string before I start might be
>the 'right' way to do things but one is a bit limited by what the mail
>handling modules in Python provide.

I do ok, though most of my message processing happens to messages
already landed in my "spool" Maildir by getmail. My setup uses getmail
to get messages with POP into a single Maildir, and then I process the
message files from there.

>E.g. in this case the only (well the only ready made) way to get a
>POP3 message is using poplib and this just gives you a list of lines
>made up of "bytes as text" :-
>
>    popmsg = pop3.retr(i+1)

Ok, so you have bytes? You need to know.

>I join the lines to feed them into mailbox.mbox() to create a mbox I
>can analyse and also a message which can be sent using SMTP.
>
>Should I be converting to string somewhere?

I have not used poplib, but the Python email modules have a BytesParser,
which gets you a Message object; I would feed the poplib bytes to that
to parse the received message. A Message object can then be transcribed
as text via its .as_string method. Or you can do other things with it.

I think my main points are:

- know whether you're using bytes (uninterpreted data) or text (strings
  of _characters_); treating bytes _as_ text implies an encoding, and
  when that assumption is incorrect you get mojibake[1]

- look at the email modules' parsers, which return Messages, a
  representation of the message in a structure (so that MIME subparts
  etc are correctly broken out, and the character sets are _known_,
  post parse)

[1] https://en.wikipedia.org/wiki/Mojibake

Cheers,
Cameron Simpson <c...@cskk.id.au>
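
PS: a minimal sketch of the BytesParser suggestion above, untested
against a real POP server. It assumes pop3.retr() returns the usual
(response, lines, octets) tuple with each line as bytes and the line
endings stripped. The helper name parse_retr_result, the fake_retr
data and the example.com addresses are made up for illustration; the
choice of policy.default is mine, not something from the thread.

```python
from email import policy
from email.parser import BytesParser


def parse_retr_result(retr_result):
    """Parse the tuple returned by poplib's POP3.retr() into a message object.

    Hypothetical helper: the name and shape are illustrative only.
    """
    _response, lines, _octets = retr_result
    # The lines arrive as bytes without line endings, so rejoin them
    # as bytes. No decoding happens here; we stay in "binary data" land.
    raw = b"\r\n".join(lines) + b"\r\n"
    # policy.default makes the parser return an EmailMessage, which
    # decodes headers and text bodies using the charsets the message declares.
    return BytesParser(policy=policy.default).parsebytes(raw)


# Stand-in for popmsg = pop3.retr(i+1), so this runs without a server.
fake_retr = (
    b"+OK",
    [
        b"From: sender@example.com",
        b"To: rcpt@example.com",
        b"Subject: =?utf-8?q?caf=C3=A9?=",
        b"Content-Type: text/plain; charset=utf-8",
        b"Content-Transfer-Encoding: 8bit",
        b"",
        "na\u00efve text".encode("utf-8"),
    ],
    0,
)

msg = parse_retr_result(fake_retr)
subject = msg["Subject"]   # header decoded to text by the parser
body = msg.get_content()   # body decoded using the declared charset
text = msg.as_string()     # whole message transcribed as text
```

The only place bytes become text is inside the parser, which reads the
encoding from the message itself instead of guessing one. If you need
bytes again (for SMTP, say), msg.as_bytes() should give them back.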