On Sat, 22 Jun 2013 20:38:43 -0700 (PDT)
John Hardin wrote:

> On Sat, 22 Jun 2013, Jonathan Nichols wrote:
> 
> > What kind of worries me are the low Bayes scores. I've been feeding
> > fairly consistent message after message.
> 
> If it never leaves BAYES_50 then the training isn't being properly
> done. Are you sure you're training the bayes database that SA is
> using? What user are you running training as? What user is SA running
> as? Are you configured for per-user or site-wide bayes databases?

I'm finding they stick at BAYES_50 too, and when I ran his spam it
actually hit BAYES_00. In my case the reason is that I've had very
little pump & dump spam for years.  Bayes learns very slowly when it
needs to detune existing tokens.
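
To put a number on "very slowly", here's a rough Python sketch of a
Robinson-style per-token probability. It's only the general shape of
what a Bayes classifier computes, not SA's actual code, and the counts
are invented, but it shows why a token with a long hammy history takes
so long to flip:

def token_prob(spam_hits, ham_hits, n_spam, n_ham, s=1.0, x=0.5):
    # Robinson-style smoothed estimate: start at the neutral prior x
    # and move away from it as the token accumulates evidence.
    spam_freq = spam_hits / max(n_spam, 1)
    ham_freq = ham_hits / max(n_ham, 1)
    p = spam_freq / (spam_freq + ham_freq) if spam_freq + ham_freq else x
    n = spam_hits + ham_hits
    return (s * x + n * p) / (s + n)

# a token seen in 200 ham and no spam, corpus of 1000 ham and 1000
# spam, then trained on k spams containing it
for k in (0, 10, 50, 100, 200, 400):
    print(k, round(token_prob(k, 200, 1000, 1000), 3))

It doesn't cross 0.5 until around 200 spam sightings, i.e. roughly as
much fresh spam as the token's existing ham history.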

Bogofilter is shooting fish in a barrel with these, but if you look at
the tokens, the hammy tokens that make it through the cut are mostly
single words, whereas the spammy tokens are mostly multiword. If I
switch off multiword tokenization, the result for the linked spam
changes from 1.00 to 0.32 - similar to Bayes.
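
To illustrate what the multiword tokens buy, here's a toy sketch in
Python (the sample text is made up and bogofilter's real tokenizer is
more involved than this): with word pairs enabled, individually bland
words get joined into pairs that are much more distinctive.

import re

def tokens(text, multiword=False):
    # unigrams, plus adjacent word pairs when multiword is on
    words = re.findall(r"[a-z$]{2,}", text.lower())
    toks = list(words)
    if multiword:
        toks += [f"{a} {b}" for a, b in zip(words, words[1:])]
    return toks

sample = "act now on this hot stock alert for huge gains"
print(tokens(sample))        # "hot", "stock", "alert": weak on their own
print(tokens(sample, True))  # adds "hot stock", "stock alert", "huge gains"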


I think a rule to catch the HA_I R variants should be worth a point.
