On 2023-05-08 23:02:18 +0200, jak wrote:
> Peter J. Holzer wrote:
> > On 2023-05-06 16:27:04 +0200, jak wrote:
> > > Chris Green wrote:
> > > > Chris Green <c...@isbd.net> wrote:
> > > > > A bit more information, msg.get("subject", "unknown") does return a
> > > > > string, as follows:-
> > > > >
> > > > > Subject:
> > > > > =?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?=
> > [...]
> > > > ... and of course I now see the issue! The Subject: with utf-8
> > > > characters in it gets spaces changed to underscores. So searching for
> > > > '(Waterways Continental Europe)' fails.
> > > >
> > > > I'll either need to test for both versions of the string or I'll need
> > > > to change underscores to spaces in the Subject: returned by msg.get().
[...]
> > >
> > > subj = email.header.decode_header(raw_subj)[0]
> > >
> > > subj[0].decode(subj[1])
[...]
> > email.header.decode_header returns a *list* of chunks and you have to
> > process and concatenate all of them.
> >
> > Here is a snippet from a mail to html converter I wrote a few years ago:
> >
> > def decode_rfc2047(s):
> >     if s is None:
> >         return None
> >     r = ""
> >     for chunk in email.header.decode_header(s):
[...]
> >         r += chunk[0].decode(chunk[1])
[...]
> >     return r
[...]
> >
> > I do have to say that Python is extraordinarily clumsy in this regard.
>
> Thanks for the reply. In fact, I gave that answer because I did
> not understand what the OP wanted to achieve. In addition, the
> OP opened a second thread on the similar topic in which I gave a
> more correct answer (subject: "What do these '=?utf-8?' sequences
> mean in python?", date: "Sat, 6 May 2023 14:50:40 UTC").

Right. I saw that after writing my reply. I should have read all
messages, not just that thread, before replying.

> the OP, I discovered that the MAME is not the only format used
> to compose the subject.

Not sure what "MAME" is. If it's a typo for MIME, then the base64
variant of RFC 2047 is just as much a part of it as the quoted-printable
variant.

> This made me think that a library could not delegate to the programmer
> the burden of managing all these exceptions,

email.header.decode_header handles both variants, but it produces bytes
sequences which still have to be decoded to get a Python string.

> then I have further investigated to discover that the library also
> provides the conversion function beyond that of coding and this makes
> our labors vain:
>
> ----------
> from email.header import decode_header, make_header
>
> subject = make_header(decode_header(raw_subject))
> ----------

Yup. I somehow missed that. That's a lot more convenient than calling
decode in a loop (or generator expression). Depending on what you want
to do with the subject you may have to wrap that in a call to str(), but
it's still a one-liner.

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
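
For reference, here is a minimal sketch of the make_header/decode_header
one-liner discussed above, wrapped in str() so that substring searches
work on the result. The helper name decode_subject is made up for
illustration and is not part of the email package; the sample value is
the Subject: header quoted earlier in this thread.

```python
from email.header import decode_header, make_header

# Sample value: the raw Subject: header quoted earlier in this thread.
raw_subject = (
    "=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_"
    "(Waterways_Continental_Europe)?="
)


def decode_subject(raw):
    """Hypothetical helper: turn an RFC 2047 encoded header into a str."""
    if raw is None:
        return None
    # decode_header() splits the header into (bytes_or_str, charset) chunks;
    # make_header() reassembles them into a Header object, and str()
    # converts that object into an ordinary Python string.
    return str(make_header(decode_header(raw)))


subject = decode_subject(raw_subject)
print(subject)  # aka Marne à la Saône (Waterways Continental Europe)
print("(Waterways Continental Europe)" in subject)  # True
```

The same call should also cope with the base64 ("B") variant, since
decode_header handles both encodings.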
-- https://mail.python.org/mailman/listinfo/python-list