On Nov 9, 2018, at 7:41 AM, RW <rwmailli...@googlemail.com> wrote:
>
> I was really referring to the fact that it's pure ASCII text that's
> being encoded rather than long runs per se

That is true for the current batch of messages, but as we've seen, spammers love to use Unicode obfuscation to try to foil Bayes and other filters, which is why I figured we should be proactive and catch all HTML encoding.

> but you may well be right that long runs are inherently suspicious, I'm
> not very familiar with HTML practices.

AFAIK there is no good or sane reason to ever encode readable Roman-character text in this way, or likely text in any character set. If characters can be embedded "as is" (i.e., without encoding) in the email, then they will be properly interpreted and rendered in HTML without any need for encoding. Encoding only adds bulk and obfuscation, and I can see no legitimate reason to encode language in this way.

Apparently John's masscheck run last night would seem to agree. =)

Cheers!

--- Amir
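
For illustration only, here is a rough Python sketch of the two signals discussed above: how much of the entity-encoded text is plain ASCII, and how long the runs of consecutive encoded characters are. This is not the actual SpamAssassin rule. The function names and the regular expression are assumptions made up for this example, and it only looks at numeric character references (`&#72;` or `&#x48;`), not named entities.

```python
import re

# Numeric character references such as &#72; or &#x48; (trailing ';' optional).
# Hypothetical pattern for this sketch, not taken from any SpamAssassin rule.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?")
REF_RUN = re.compile(r"(?:&#(?:[xX][0-9a-fA-F]+|[0-9]+);?)+")


def encoded_ascii_ratio(html: str) -> float:
    """Fraction of numeric references that decode to ASCII letters or digits."""
    total = 0
    ascii_alnum = 0
    for match in NUMERIC_REF.finditer(html):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        total += 1
        # Only code points below 128 are ASCII; check that before calling chr().
        if code < 128 and chr(code).isalnum():
            ascii_alnum += 1
    return ascii_alnum / total if total else 0.0


def longest_encoded_run(html: str) -> int:
    """Number of references in the longest unbroken run of numeric references."""
    runs = REF_RUN.findall(html)
    return max((len(NUMERIC_REF.findall(run)) for run in runs), default=0)


if __name__ == "__main__":
    sample = "&#72;&#101;&#108;&#108;&#111; world"
    print(encoded_ascii_ratio(sample), longest_encoded_run(sample))
```

A message where both numbers are high (most references are ordinary letters, and they come in long runs) matches the pattern described in this thread. A lone reference for an accented character would score low on both. Any thresholds would have to come from masscheck results rather than guesswork.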