On 16 Nov 2016, at 0:42, Michael Fox wrote:
> [...]
> Yup. But if the original message content is all plain text, then the
> encoding adds no value and can be removed without changing the
> message.

That is a critical factor.
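
That is easy to demonstrate with the stock MIME::Base64 module (a
trivial sketch):

    use MIME::Base64 qw(decode_base64);

    # For a payload that was pure ASCII to begin with, the transfer
    # encoding is pure overhead: decoding recovers the original bytes
    # exactly, and they are already safe to send as 7bit.
    my $decoded = decode_base64("SGVsbG8sIHdvcmxkIQ==");
    print $decoded, "\n";    # Hello, world!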

It is entirely feasible to slice everything other than text/plain parts
off of a multipart/{mixed,alternative} message and reinject the
remnant. An ideal tool for that is MIMEDefang, a milter that is often
used as an alternative to Amavis (as a hub for anti-malware and
anti-spam filtering), but which at its core is a toolkit for message
manipulation and transformation. If you can express in Perl what the
transformation from encoded text to plain text should do in the
non-obvious cases, you can do it in MIMEDefang without having to write
the plumbing yourself. MD uses the MIME-tools suite of Perl modules,
which is maintained by the same author (Dianne Skoll), so if you do
pick it as your base tool for this, you'll already have a trivially
easy way to decode text/plain parts encoded as Base64 or QP. Of course,
once you've got a blob of decoded "text" (maybe in Latin-1 or UTF-8)
you would then need to squash it down into a mail-safe form, for which
"groff -T ascii" is your friend (if you befriend berserker vandals...)

> I presume I need a content-filter to perform this work post-queue.

If you did this with MD or any other milter, the model would be to
discard the original message pre-queue (i.e. have Postfix "accept" the
message in SMTP but not queue it) and re-inject the transformed message.
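
For the Postfix side, wiring the milter in is just configuration. A
minimal sketch, assuming MIMEDefang's conventional socket path (adjust
for your install):

    # main.cf -- hand inbound (and locally submitted) mail to MIMEDefang
    smtpd_milters = unix:/var/spool/MIMEDefang/mimedefang.sock
    non_smtpd_milters = $smtpd_milters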

>> One actually should only do anything like this with client-side
>> software. You presumably intend to throw away information (such as
>> the difference between o, ô, and ö)

> Yes. Although the likelihood of such characters in the original
> content is virtually nil in this application. And even if they do
> occur, such characters can't be used by the receiving client anyway.

OK, so there are tools like groff that will squash extended 8-bit
supersets of ASCII into ASCII in a lossy manner. If you understand the
real degree of damage that may do and can accept it on a known low-risk
input stream, who am I to judge?

FWIW, I've done this sort of text normalization on a large messy
collection of mixed text, HTML, and PDF files, many of which were at
one point email. If you want to do an ideal transformation of
everything, it is insanely complex. If you can tolerate substantial
damage to decoded non-ASCII input, "groff -T ascii" will do it and just
drop non-ASCII characters.
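
If you would rather let groff itself do the squashing, shelling out
from Perl is straightforward. A sketch, with one caveat of my own: roff
treats lines starting with "." or "'" as requests, so untrusted input
really wants those escaped before it goes anywhere near groff:

    use IPC::Open2 qw(open2);

    # Pipe decoded text through "groff -T ascii". Fine for typical
    # message bodies; very large input could deadlock on the pipe
    # buffers, in which case use temporary files instead.
    sub groff_squash {
        my ($text) = @_;
        my $pid = open2(my $out, my $in, qw(groff -T ascii));
        print {$in} $text;
        close $in;
        my $ascii = do { local $/; <$out> };
        close $out;
        waitpid($pid, 0);
        return $ascii;
    }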

>> and it is best to allow those choices
>> to remain with end users.

> Generally true. But not in this case. The client is what it is. So I
> either find a way to decode such messages externally before delivering
> them to the client, or else the messages can't be read at all (at
> least the base64-encoded ones).

OK, assuming that you understand what you're doing...

>> Solve whatever problem you are trying to solve in
>> some other way.

> I understand and appreciate what you're saying as a general rule. But
> I also understand this particular application. And for this particular
> application, recovering the original plain text message before sending
> to the client is what's needed.

That raises an alternative option: if there *is* an "original plain text
message" which something else is encoding, maybe the better approach is
to fix the busybody encoder.

> But thanks for your thoughts, Bill. Your postings on this list are
> always informative.

Thank you. I try.