On 16 Nov 2016, at 0:42, Michael Fox wrote:

> [...]
> Yup. But if the original message content is all plain text, then the encoding adds no value and can be removed without changing the message.

That is a critical factor.

It is entirely feasible to slice everything other than text/plain parts off of a multipart/{mixed,alternative} message and reinject the remnant. An ideal tool for that is MIMEDefang, a milter that is often used as an alternative to Amavis (as a hub for anti-malware and anti-spam filtering), but at its core it is a toolkit for message manipulation and transformation. If you can express, as Perl code, what the transformation from encoded text to plain text should do in the non-obvious cases, you can do it in MIMEDefang without having to write the plumbing yourself.

MD uses the MIME-tools suite of Perl modules, which is maintained by the same author (Dianne Skoll), so if you do pick it as your base tool for this, you'll already have a trivially easy-to-use way to decode text/plain parts encoded as Base64 or QP. Of course, once you've got a blob of decoded "text" (maybe in Latin-1 or UTF-8) you would then need to squash it down into a mail-safe form, for which "groff -T ascii" is your friend (if you befriend berserker vandals...)
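To make the MIME-tools side concrete, here's a minimal sketch (untested, and not a drop-in MIMEDefang filter) that parses a message on stdin and prints only the decoded text/plain content:

    #!/usr/bin/perl
    # Sketch: extract the decoded text/plain parts with MIME-tools.
    use strict;
    use warnings;
    use MIME::Parser;

    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);   # keep parts in memory, no temp files
    my $entity = $parser->parse(\*STDIN);

    # parts_DFS walks the entity and all of its descendants, so this
    # handles both single-part messages and nested multipart/* trees.
    for my $part ($entity->parts_DFS) {
        next unless lc($part->effective_type) eq 'text/plain';
        next unless $part->bodyhandle;   # multiparts carry no body
        # bodyhandle returns the *decoded* body: Base64/QP already undone.
        print $part->bodyhandle->as_string;
    }

In an actual MIMEDefang filter you'd be handed the already-parsed entity, so only the loop matters.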

> I presume I need a content-filter to perform this work post-queue.

If you did this with MD or any other milter, the model would be to discard the original message pre-queue (i.e. have Postfix "accept" the message in SMTP but not queue it) and re-inject the transformed message.
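The re-injection half is small. Here's a rough sketch, assuming the transformed message is a MIME::Entity and the sendmail binary lives at the usual /usr/sbin/sendmail (adjust for your system); MIMEDefang also has its own helpers for the discard side, per mimedefang-filter(5):

    # Hand the transformed entity back to the MTA via the sendmail
    # interface; -i keeps a lone "." line from ending the message early.
    sub reinject {
        my ($entity, $sender, @rcpts) = @_;
        open my $mta, '|-', '/usr/sbin/sendmail', '-i', '-f', $sender, @rcpts
            or die "cannot exec sendmail: $!";
        $entity->print($mta);
        close $mta or die "re-injection failed: $?";
    }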

>> One actually should only do anything like this with client-side
>> software. You presumably intend to throw away information (such as the
>> difference between o, ô, and ö)

> Yes. Although the likelihood of such characters appearing in the original content is virtually nil in this application. And even if they do occur, such characters can't be displayed by the receiving client anyway.

OK, so there are tools like groff that will squash extended 8-bit supersets of ASCII into ASCII in a lossy manner. If you understand the real degree of damage that may do and can accept it on a known low-risk input stream, who am I to judge?

FWIW, I've done this sort of text normalization on a large, messy collection of mixed text, HTML, and PDF files, many of which were at one point email. If you want an ideal transformation of everything, it is insanely complex. If you can tolerate substantial damage to decoded non-ASCII input, "groff -T ascii" will do it and will simply drop non-ASCII characters.
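If it helps, the groff trick wraps up in a few lines of Perl. This is a sketch only: it assumes groff's preconv and col are on the PATH, and troff will still misread input lines that happen to start with "." or "'".

    use File::Temp qw(tempfile);

    # Lossy ASCII squash via groff, per the suggestion above. The .nf
    # request turns off filling so the original line breaks survive.
    sub to_ascii {
        my ($text) = @_;
        my ($fh, $path) = tempfile(UNLINK => 1);
        print {$fh} ".nf\n", $text;
        close $fh;
        # preconv normalizes the input encoding for groff; -P-c asks
        # grotty for classic output and col -b strips its overstriking.
        return qx(preconv "$path" | groff -T ascii -P-c | col -b);
    }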

>> and it is best to allow those choices
>> to remain with end users.

> Generally true. But not in this case. The client is what it is. So I either find a way to decode such messages externally before delivering them to the client, or else the messages can't be read at all (at least the Base64-encoded ones).

OK, assuming that you understand what you're doing...

>> Solve whatever problem you are trying to solve in
>> some other way.

> I understand and appreciate what you're saying as a general rule. But I also understand this particular application. And for this particular application, recovering the original plain-text message before sending it to the client is what's needed.

That raises an alternative option: if there *is* an "original plain text message" which something else is encoding, maybe the better approach is to fix the busybody encoder.

> But thanks for your thoughts, Bill. Your postings on this list are always informative.

Thank you. I try.
