Re: Zero-width rules?

RW Wed, 26 Jun 2019 15:06:09 -0700

On Wed, 26 Jun 2019 14:07:00 -0600
Amir Caspi wrote:

> John et al,
> 
> I recall from a prior thread last year that there were supposed to be
> some rules to check for zero-width joiner characters... but I'm
> seeing spams recently that have these, but don't hit any such rules.
> 
> Here's one spample, where the ZWJ entity #x200B is being used to try
> to sidestep Bayes detection of highly spammy words.
> https://pastebin.com/kx0jVBtZ


It's actually a zero-width space (U+200B), not a joiner (ZWJ is U+200D).

I created a second version with the ZWSs globally stripped, ran both
through Bayes, and diffed the tokens that contributed to the result.
YMMV, but I found it made no difference at all.
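
For anyone who wants to repeat that comparison, here's a rough sketch in
Python. It is not how SpamAssassin tokenizes for Bayes: the
whitespace-splitting tokens() helper, the function names, and the sample
string are all made up for illustration.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

def strip_zwsp(text):
    """Return text with every zero-width space removed."""
    return text.replace(ZWSP, "")

def tokens(text):
    """Hypothetical stand-in tokenizer: lowercase, split on whitespace.
    str.split() does not treat U+200B as whitespace, so a word containing
    a ZWSP stays a single token here."""
    return set(text.lower().split())

def token_diff(original, stripped):
    """Tokens found only in the original, and only in the stripped copy."""
    a, b = tokens(original), tokens(stripped)
    return a - b, b - a

sample = "Please re\u200bfinance today"  # made-up message body
only_orig, only_stripped = token_diff(sample, strip_zwsp(sample))
```

Under this toy tokenizer the two versions differ by one token each. Whether
such a difference moves the Bayes score depends on what has been trained.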

With the previous run of ZW[N]J spams I found that they actually helped
Bayes by breaking up words into fragments that repeated in related
spams.
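
To illustrate the fragment effect, here's a small sketch assuming a
tokenizer that treats zero-width characters as separators. That splitting
rule, the fragments() helper, and both sample strings are assumptions for
illustration, not SpamAssassin's actual behaviour.

```python
import re

ZW_CHARS = "\u200b\u200c\u200d"  # ZWSP, ZWNJ, ZWJ
# Assumed rule: split on whitespace or on any zero-width character.
SPLIT = re.compile("[\\s" + ZW_CHARS + "]+")

def fragments(text):
    """Lowercase the text and split it into non-empty fragments."""
    return [t for t in SPLIT.split(text.lower()) if t]

# Two made-up, related messages that obfuscate the same word the same way.
spam_a = "re\u200cfin\u200cance now"
spam_b = "re\u200cfin\u200cance your loan"

# Fragments common to both messages.
shared = set(fragments(spam_a)) & set(fragments(spam_b))
```

Here the obfuscated word yields the same fragments in both messages, so
those fragments become tokens that recur across the run.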
