On Fri, 05 Mar 2010 18:39:25 +0100
Kai Schaetzl <mailli...@conactive.com> wrote:

> Alex wrote on Fri, 5 Mar 2010 11:02:35 -0500:
> 
> > I've trained probably 50 of these, yet they still have BAYES_50.
> 
> I trained your example and it went from 50 to 99. With *1* message!
> There may be something wrong with your Bayes. With 400.000 tokens in
> the db.

There's nothing odd about that; it's common for hard-to-learn spam to
be identified correctly on retesting.

The first time there are only weak tokens and tokens that aren't in the
database (and mostly won't be seen again). The second time you have the
same weak tokens plus dozens of new spam hapaxes (tokens learned exactly
once, from that one spam), which dominate the result.
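
To illustrate the effect, here is a minimal Python sketch. It is not
SpamAssassin's actual code: it assumes Robinson-style smoothing of the
per-token probabilities and a plain naive-Bayes product to combine them,
and the constants S and X, the corpus sizes, and the token counts are
all made up for the example.

```python
from math import prod

S = 0.5  # assumed smoothing strength (hypothetical value)
X = 0.5  # assumed probability for a token with no history (hypothetical)


def token_prob(spam_count, ham_count, nspam, nham):
    """Smoothed spam probability of one token, or None if it is unknown."""
    n = spam_count + ham_count
    if n == 0:
        return None  # not in the database: contributes nothing
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    p = spam_ratio / (spam_ratio + ham_ratio)
    # Pull the raw estimate towards X; the pull weakens as n grows.
    return (S * X + n * p) / (S + n)


def combine(probs):
    """Naive-Bayes style combination of independent token probabilities."""
    spammy = prod(probs)
    hammy = prod(1 - p for p in probs)
    return spammy / (spammy + hammy)


def score(tokens, nspam, nham):
    probs = [token_prob(s, h, nspam, nham) for s, h in tokens]
    return combine([p for p in probs if p is not None])


# (spam_count, ham_count) per token in the message, invented numbers.
weak_tokens = [(52, 48), (49, 51), (50, 50)]
unseen_tokens = [(0, 0)] * 30

# First scan: only the weak tokens count, so the result stays near 0.5.
first_scan = weak_tokens + unseen_tokens
before = score(first_scan, nspam=1000, nham=1000)

# Learn the message as spam once: every token gains one spam hit, so the
# 30 previously unseen tokens become spam hapaxes.
second_scan = [(s + 1, h) for s, h in first_scan]
after = score(second_scan, nspam=1001, nham=1000)

print(f"before training: {before:.3f}")  # close to 0.5
print(f"after training:  {after:.3f}")   # very close to 1.0
```

Each hapax alone is only moderately spammy after smoothing, but thirty
of them multiplied together swamp the handful of weak tokens.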
