On 6/26/2019 4:07 PM, Amir Caspi wrote:
> John et al,
>
> I recall from a prior thread last year that there were supposed to be some 
> rules to check for zero-width joiner characters... but I'm seeing spams 
> recently that have these, but don't hit any such rules.
>
> Here's one spample, where the zero-width space entity &#x200B; is being used 
> to try to sidestep Bayes detection of highly spammy words.
> https://pastebin.com/kx0jVBtZ
>
> I know there are legitimate uses for ZWJ chars in some scripts, so we can't 
> use their mere existence as evidence of spam, but presumably those charsets 
> would be denoted explicitly so they could be meta'd out... and, presumably, 
> would also not contain the ZWJ chars in between obviously Roman chars.
>
> Any idea why this spample didn't hit the ZWJ obfuscation rules?
>
> I'm getting quite a lot of zero-hour snowshoe spam lately and it's not even 
> hitting BOGUS_MIME_VERSION any more... almost all of it is BAYES_50.  Some of 
> that is because of these ZWJ tricks, though the rest, I dunno -- I'm still 
> not sure if my DB is misbehaving or if these spams are just carefully crafted 
> enough to avoid highly-spammy words.  (Training and rescanning gets me 
> BAYES_99 on those same spams so the DB is definitely training...)
>
> Thoughts on nuking these Bayes-evading stealthy spams?
>
> Thanks!
>
> --- Amir
>
Amir, are you using KAM.cf?

The sample you sent isn't encoded with a charset that will do anything
with &#x200B;.  I think it's a literal string of "&#x200B;" because the
email is just plain text.  So maybe a rule to hit on that is needed?  I
added one to KAM.cf; let me know if it helps.
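For illustration only, here is a minimal Python sketch of the two checks discussed in this thread. It is not the rule added to KAM.cf and not SpamAssassin code; the names `zero_width_hits` and `strip_latin_zero_width`, and the exact set of zero-width code points covered (U+200B, U+200C, U+200D, U+2060, U+FEFF), are assumptions. It counts literal "&#x200B;"-style entities sitting in a plain-text body, counts real zero-width characters wedged between Latin letters (leaving other scripts alone), and shows how such a body could be normalized before Bayes tokenizing.

```python
import re

# Hypothetical helpers, not part of SpamAssassin or KAM.cf.
# Literal numeric character references for zero-width code points, hex or decimal:
# U+200B (8203), U+200C (8204), U+200D (8205), U+2060 (8288), U+FEFF (65279).
LITERAL_ZW_ENTITY = re.compile(
    r"&#(?:x0*(?:200[bcd]|2060|feff)|0*(?:8203|8204|8205|8288|65279));",
    re.IGNORECASE,
)

# Real zero-width characters only when both neighbours are ASCII letters,
# so legitimate ZWJ/ZWNJ use in other scripts is not counted or stripped.
ZW_BETWEEN_LATIN = re.compile(r"(?<=[A-Za-z])[\u200b\u200c\u200d\u2060\ufeff]+(?=[A-Za-z])")


def zero_width_hits(body: str) -> dict:
    """Count both obfuscation patterns in a decoded text/plain body."""
    return {
        "literal_entities": len(LITERAL_ZW_ENTITY.findall(body)),
        "zw_between_latin": len(ZW_BETWEEN_LATIN.findall(body)),
    }


def strip_latin_zero_width(body: str) -> str:
    """Drop literal zero-width entities and Latin-sandwiched zero-width chars."""
    body = LITERAL_ZW_ENTITY.sub("", body)
    return ZW_BETWEEN_LATIN.sub("", body)


if __name__ == "__main__":
    sample = "Cheap V&#x200B;i&#x200B;a&#x200B;g&#x200B;r&#x200B;a today"
    print(zero_width_hits(sample))
    print(strip_latin_zero_width(sample))
```

The Latin-neighbour restriction is a design choice meant to reflect the point above that ZWJ has legitimate uses in some scripts; a real deployment would more likely express similar regexes as SpamAssassin body rules and tune scores against its own mail.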

Regards,
KAM

-- 
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171
