On 6/26/2019 4:07 PM, Amir Caspi wrote:
> John et al,
>
> I recall from a prior thread last year that there were supposed to be some
> rules to check for zero-width joiner characters... but I'm seeing spams
> recently that have these, but don't hit any such rules.
>
> Here's one spample, where the ZWJ entity #x200B is being used to try to
> sidestep Bayes detection of highly spammy words.
> https://pastebin.com/kx0jVBtZ
>
> I know there are legitimate uses for ZWJ chars in some scripts, so we can't
> use their mere existence as evidence of spam, but presumably those charsets
> would be denoted explicitly so could be meta'd out... and, presumably, would
> also not contain the ZWJ chars in between obviously roman chars.
>
> Any idea why this spample didn't hit the ZWJ obfuscation rules?
>
> I'm getting quite a lot of zero-hour snowshoe spam lately and it's not even
> hitting BOGUS_MIME_VERSION any more... almost all of it is BAYES_50. Some of
> that is because of these ZWJ tricks, though the rest, I dunno -- I'm still
> not sure if my DB is misbehaving or if these spams are just carefully crafted
> enough to avoid highly-spammy words. (Training and rescanning gets me
> BAYES_99 on those same spams so the DB is definitely training...)
>
> Thoughts on nuking these Bayes-evading stealthy spams?
>
> Thanks!
>
> --- Amir
>

Amir, are you using KAM.cf?
The sample you sent isn't encoded with a charset that will do anything with &#x200B;. I think it's a literal string of "&#x200B;" because the email is just plain text.

So maybe a rule to hit on that is needed? I added one to KAM.cf if you want to let me know if it helps.

Regards,
KAM

--
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171
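
For anyone who wants to experiment with the idea outside SpamAssassin, here is a rough Python sketch of the check discussed in this thread: flag a zero-width marker only when it sits directly between two roman letters, whether it arrives as the literal entity text (as in a plain-text part that is never entity-decoded) or as the decoded character. This is not the rule that was added to KAM.cf; the pattern and the function names `count_zero_width_obfuscation` and `strip_zero_width` are made up for illustration, and the choice to cover U+200B through U+200D is an assumption.

```python
import re

# Literal entity text for U+200B..U+200D, hex or decimal form,
# e.g. "&#x200B;" or "&#8203;" left undecoded in a plain-text part.
ZW_ENTITY = r"&#(?:x200[b-d]|820[3-5]);"
# The decoded characters themselves.
ZW_CHAR = r"[\u200b\u200c\u200d]"
ZW_ANY = rf"(?:{ZW_ENTITY}|{ZW_CHAR})"

# One or more zero-width markers sandwiched between two ASCII letters.
# Requiring a letter on both sides is meant to avoid flagging scripts
# that use these characters legitimately (assumption, not a guarantee).
OBFUSCATED = re.compile(rf"(?<=[A-Za-z]){ZW_ANY}+(?=[A-Za-z])", re.IGNORECASE)


def count_zero_width_obfuscation(text: str) -> int:
    """Count runs of zero-width markers found between roman letters."""
    return len(OBFUSCATED.findall(text))


def strip_zero_width(text: str) -> str:
    """Remove those runs so the underlying words become visible again."""
    return OBFUSCATED.sub("", text)
```

A count above some small threshold could serve as a spam signal, and the stripped text could be what gets handed to a tokenizer, though both choices would need tuning against real mail.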