Chris Green wrote:
I have a custom mail filter in python that uses the mailbox package to
open a mail message and give me access to the headers.
So I have the following code to open each mail message:-
#
#
# Read the message from standard input and make a message object from it
#
msg = mailbox.MaildirMessage(sys.stdin.buffer.read())
and then later I have (among many other bits and pieces):-
#
#
# test for string in Subject:
#
if searchTxt in str(msg.get("subject", "unknown")):
    ...  # do various things
This works exactly as intended most of the time but occasionally a
message whose subject should match the test is missed. I have just
realised that when this happens, it's when the Subject: has accented
characters in it (this is from a mailing list about canals in France).
So, for example, the latest case of this happening has:-
Subject: aka Marne à la Saône (Waterways Continental Europe)
where the searchTxt in the code above is "Waterways Continental Europe".
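A quick way to see why the test misses. This assumes (the post does not confirm it) that the list sends this Subject: as an RFC 2047 encoded word, which is the standard way to carry non-ASCII text in a header. MaildirMessage uses the email package's compat32 policy, so msg.get("subject") returns the header text as it appears on the wire. The raw message below is constructed for illustration, not copied from the real mail.

```python
import mailbox

# Illustrative raw message: the Subject is a Q-encoded UTF-8 word, so spaces
# appear as "_" and "(" / ")" appear as "=28" / "=29".
RAW = (
    b"Subject: =?utf-8?q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_"
    b"=28Waterways_Continental_Europe=29?=\n"
    b"\n"
    b"body\n"
)

msg = mailbox.MaildirMessage(RAW)
subject = str(msg.get("subject", "unknown"))

# repr() shows the undecoded text that the substring test actually sees.
print(repr(subject))
print("Waterways Continental Europe" in subject)  # False
```

If the real message looks like this, the accented characters are not the problem as such; the search text simply never appears literally until the encoded word is decoded.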
Is there any way I can work round this issue? E.g. is there a way to
strip out all extended characters from a string? Or maybe it's
msg.get() that isn't managing to handle the accented string correctly?
Yes, I know that accented characters probably aren't allowed in
Subject: but I'm not going to get that changed! :-)
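If the Maildir-specific parts of MaildirMessage (flags, subdir, info) are not needed in the filter, one alternative is to parse standard input with email.message_from_bytes() and email.policy.default; under that policy header values are returned already decoded. This is a sketch rather than anything from the original thread, and read_message is a hypothetical helper name.

```python
import email
import email.message
import email.policy


def read_message(data: bytes) -> email.message.EmailMessage:
    """Parse raw message bytes so that headers come back decoded.

    With policy.default, RFC 2047 encoded words such as
    "=?utf-8?q?...?=" are decoded when a header is fetched.
    """
    return email.message_from_bytes(data, policy=email.policy.default)
```

In the filter this would replace the MaildirMessage line with msg = read_message(sys.stdin.buffer.read()), leaving the str(msg.get("subject", "unknown")) test unchanged.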
Hi,
You could try extracting the charset from the "Content-Type:" header and
then using it for the subject conversion:
subj = str(raw_subj, encoding='...')
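The str(raw_subj, encoding=...) idea can be applied per encoded word: in an RFC 2047 subject each =?charset?...?= word names its own charset (the Content-Type charset describes the body, and one header may mix charsets). A minimal sketch, assuming the input is the raw subject string from str(msg.get("subject", "")), uses email.header.decode_header(), which returns (bytes, charset) pairs; decode_subject is a hypothetical helper name. The standard library one-liner str(email.header.make_header(email.header.decode_header(raw))) does the same joining.

```python
from email.header import decode_header


def decode_subject(raw_subject: str) -> str:
    """Decode an RFC 2047 header value, using each encoded word's own charset."""
    pieces = []
    for chunk, charset in decode_header(raw_subject):
        if isinstance(chunk, bytes):
            try:
                # Plain ASCII runs come back with charset None; decode those as ASCII.
                # errors="replace" keeps a mislabelled charset from crashing the filter.
                pieces.append(str(chunk, encoding=charset or "ascii", errors="replace"))
            except LookupError:
                # Charset label Python does not recognise: latin-1 never fails.
                pieces.append(str(chunk, encoding="latin-1"))
        else:
            # A header with no encoded words at all is returned unchanged as str.
            pieces.append(chunk)
    return "".join(pieces)


search_txt = "Waterways Continental Europe"
raw = "=?utf-8?q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_=28Waterways_Continental_Europe=29?="
print(search_txt in decode_subject(raw))  # True
```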
--
https://mail.python.org/mailman/listinfo/python-list