Re: Bayes underperforming, HTML entities?

Amir Caspi Fri, 09 Nov 2018 10:27:21 -0800

On Nov 9, 2018, at 7:41 AM, RW <rwmailli...@googlemail.com> wrote:
> 
> I was really referring to the fact that it's pure ASCII text that's
> being encoded rather than long runs per se


That is true for the current batch of messages, but as we've seen, spammers 
love to use Unicode obfuscation to try to foil Bayes and other filters, which 
is why I figured we should be proactive and catch all HTML encoding.

> but you may well be right that long runs are inherently suspicious, I'm
> not very familiar with HTML practices.

AFAIK there is no good or sane reason to ever encode readable Roman-character 
text in this way, and likely none for text in any other character set.  If 
characters can be embedded "as is" (i.e., without encoding) in the email, then 
they will be properly interpreted and rendered in HTML without any need for 
encoding.  Encoding only adds bulk and obfuscation, and I can see no 
legitimate reason to encode language in this way.
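
To make the idea concrete, here is a rough Python sketch of how one might 
measure "readable ASCII hidden behind numeric character references" in an 
HTML body.  This is only an illustration, not the SpamAssassin rule or 
anything from masscheck: the function names and the threshold of 8 are 
hypothetical, and "readable" is assumed to mean ASCII letters, digits and 
spaces.

```python
import html
import re

# A run of consecutive numeric character references, e.g. "&#104;&#x65;".
# Named entities such as "&amp;" are deliberately not matched.
REF_RUN = re.compile(r"(?:&#(?:[0-9]+|[xX][0-9a-fA-F]+);)+")


def longest_encoded_ascii_run(body: str) -> int:
    """Return the length, in decoded characters, of the longest stretch of
    back-to-back numeric references that decode to ASCII letters, digits
    or spaces."""
    longest = 0
    for run in REF_RUN.finditer(body):
        decoded = html.unescape(run.group(0))
        count = 0
        for ch in decoded:
            if ch.isascii() and (ch.isalnum() or ch == " "):
                count += 1
                longest = max(longest, count)
            else:
                # Punctuation or non-ASCII breaks the readable stretch.
                count = 0
    return longest


def looks_obfuscated(body: str, threshold: int = 8) -> bool:
    """Hypothetical check: flag bodies with a long encoded-ASCII stretch.
    The default threshold is an arbitrary assumption, not a tuned value."""
    return longest_encoded_ascii_run(body) >= threshold
```

For example, looks_obfuscated("&#104;&#101;&#108;&#108;&#111;", threshold=5) 
is true because the references decode to "hello", while a body that only 
contains "&amp;" or "&lt;" is never flagged.  Whether any real threshold 
separates ham from spam would have to come from corpus testing.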

Apparently John's masscheck run last night would seem to agree. =)

Cheers!

--- Amir
