On 16 Nov 2016, at 0:42, Michael Fox wrote:

> [...]
> Yup. But if the original message content is all plain text, then the encoding adds no value and can be removed without changing the message.

That is a critical factor.

It is entirely feasible to slice everything other than text/plain parts off of a multipart/{mixed,alternative} message and reinject the remnant. An ideal tool for that is MIMEDefang, a milter that is often used as an alternative to Amavis (as a hub for anti-malware and anti-spam filtering), but at its core it is a toolkit for message manipulation and transformation. If you can express, as Perl code, what the transformation from encoded text to plain text should do in the non-obvious cases, you can do it in MIMEDefang without having to write the plumbing yourself.

MD uses the MIME-tools suite of Perl modules, which is maintained by the same author (Dianne Skoll), so if you do pick it as your base tool for this, you'll already have a trivially easy-to-use way to decode text/plain parts encoded as Base64 or QP. Of course, once you've got a blob of decoded "text" (maybe in Latin-1 or UTF-8) you would then need to squash it down into a mail-safe form, for which "groff -T ascii" is your friend (if you befriend berserker vandals...)
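To make the MIME-tools side concrete, here's a minimal sketch (untested, and not a drop-in MIMEDefang filter) that parses a message on stdin and prints only the decoded text/plain content:

    #!/usr/bin/perl
    # Sketch: extract the decoded text/plain parts with MIME-tools.
    use strict;
    use warnings;
    use MIME::Parser;

    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);   # keep parts in memory, no temp files
    my $entity = $parser->parse(\*STDIN);

    # parts_DFS walks the entity and all of its descendants, so this
    # handles both single-part messages and nested multipart/* trees.
    for my $part ($entity->parts_DFS) {
        next unless lc($part->effective_type) eq 'text/plain';
        next unless $part->bodyhandle;   # multiparts carry no body
        # bodyhandle returns the *decoded* body: Base64/QP already undone.
        print $part->bodyhandle->as_string;
    }

In an actual MIMEDefang filter you'd be handed the already-parsed entity, so only the loop matters.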

> I presume I need a content-filter to perform this work post-queue.

If you did this with MD or any other milter, the model would be to discard the original message pre-queue (i.e. have Postfix "accept" the message in SMTP but not queue it) and re-inject the transformed message.
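The re-injection half is small. Here's a rough sketch, assuming the transformed message is a MIME::Entity and the sendmail binary lives at the usual /usr/sbin/sendmail (adjust for your system); MIMEDefang also has its own helpers for the discard side, per mimedefang-filter(5):

    # Hand the transformed entity back to the MTA via the sendmail
    # interface; -i keeps a lone "." line from ending the message early.
    sub reinject {
        my ($entity, $sender, @rcpts) = @_;
        open my $mta, '|-', '/usr/sbin/sendmail', '-i', '-f', $sender, @rcpts
            or die "cannot exec sendmail: $!";
        $entity->print($mta);
        close $mta or die "re-injection failed: $?";
    }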

>> One actually should only do anything like this with client-side
>> software. You presumably intend to throw away information (such as the
>> difference between o, ô, and ö)

> Yes. Although the likelihood of such characters appearing in the original content is virtually nil in this application. And even if they do occur, such characters can't be displayed by the receiving client anyway.

OK, so there are tools like groff that will squash extended 8-bit supersets of ASCII into ASCII in a lossy manner. If you understand the real degree of damage that may do and can accept it on a known low-risk input stream, who am I to judge?

FWIW, I've done this sort of text normalization on a large, messy collection of mixed text, HTML, and PDF files, many of which were at one point email. If you want an ideal transformation of everything, it is insanely complex. If you can tolerate substantial damage to decoded non-ASCII input, "groff -T ascii" will do it and will simply drop non-ASCII characters.
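If it helps, the groff trick wraps up in a few lines of Perl. This is a sketch only: it assumes groff's preconv and col are on the PATH, and troff will still misread input lines that happen to start with "." or "'".

    use File::Temp qw(tempfile);

    # Lossy ASCII squash via groff, per the suggestion above. The .nf
    # request turns off filling so the original line breaks survive.
    sub to_ascii {
        my ($text) = @_;
        my ($fh, $path) = tempfile(UNLINK => 1);
        print {$fh} ".nf\n", $text;
        close $fh;
        # preconv normalizes the input encoding for groff; -P-c asks
        # grotty for classic output and col -b strips its overstriking.
        return qx(preconv "$path" | groff -T ascii -P-c | col -b);
    }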

>> and it is best to allow those choices
>> to remain with end users.

> Generally true. But not in this case. The client is what it is. So I either find a way to decode such messages externally before delivering them to the client, or else the messages can't be read at all (at least the Base64-encoded ones).

OK, assuming that you understand what you're doing...

>> Solve whatever problem you are trying to solve in
>> some other way.

> I understand and appreciate what you're saying as a general rule. But I also understand this particular application. And for this particular application, recovering the original plain-text message before sending it to the client is what's needed.

That raises an alternative option: if there *is* an "original plain text message" which something else is encoding, maybe the better approach is to fix the busybody encoder.

> But thanks for your thoughts, Bill. Your postings on this list are always informative.

Thank you. I try.
