John et al.,

I recall from a thread last year that there were supposed to be some rules to check for zero-width joiner characters, but lately I'm seeing spam that contains these characters and doesn't hit any such rules.
Here's one spample, where the zero-width entity #x200B (strictly speaking U+200B is ZERO WIDTH SPACE, while the joiner proper is U+200D, but the trick is the same) is being used to try to sidestep Bayes detection of highly spammy words:

https://pastebin.com/kx0jVBtZ

I know there are legitimate uses for ZWJ chars in some scripts, so we can't use their mere existence as evidence of spam. Presumably, though, those charsets would be denoted explicitly and so could be meta'd out, and presumably they would also not contain the ZWJ chars in between obviously Roman chars.

Any idea why this spample didn't hit the ZWJ obfuscation rules?

I'm getting quite a lot of zero-hour snowshoe spam lately, and it's not even hitting BOGUS_MIME_VERSION any more; almost all of it is BAYES_50. Some of that is because of these ZWJ tricks. As for the rest, I dunno: I'm still not sure whether my DB is misbehaving or whether these spams are just crafted carefully enough to avoid highly spammy words. (Training and then rescanning gets me BAYES_99 on those same spams, so the DB is definitely training.)

Thoughts on nuking these Bayes-evading stealthy spams?

Thanks!

--- Amir
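
P.S. To make concrete what I mean by zero-width chars "in between obviously Roman chars", here is a rough Python sketch of the check. This is not a SpamAssassin rule and not how any existing rule works; the function names are made up for illustration, and I'm assuming the body carries the character as an HTML numeric reference such as &#x200B; (or as the raw code point).

```python
import html
import re

# Zero-width code points of interest:
# U+200B ZERO WIDTH SPACE, U+200C ZERO WIDTH NON-JOINER, U+200D ZERO WIDTH JOINER
ZERO_WIDTH = "\u200b\u200c\u200d"

# One or more zero-width chars sandwiched directly between two ASCII letters.
# Legitimate uses (e.g. ZWNJ in Persian) sit between non-Latin letters and
# therefore do not match.
ZW_BETWEEN_LATIN = re.compile(
    r"(?<=[A-Za-z])[" + ZERO_WIDTH + r"]+(?=[A-Za-z])"
)


def count_zw_obfuscation(body: str) -> int:
    """Count zero-width runs that split a run of ASCII letters (hypothetical helper)."""
    text = html.unescape(body)  # turn &#x200B; etc. into real code points
    return len(ZW_BETWEEN_LATIN.findall(text))


def strip_zero_width(body: str) -> str:
    """Return the body with zero-width chars removed, e.g. before tokenizing."""
    text = html.unescape(body)
    return text.translate({ord(c): None for c in ZERO_WIDTH})
```

For example, count_zw_obfuscation("fr&#x200B;ee of&#x200B;fer") counts two hits, and strip_zero_width("fr&#x200B;ee") gives back "free", which is the token I'd want Bayes to see.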