From: Troy Settle <[EMAIL PROTECTED]>
   Date: Tue, 18 Nov 2008 15:19:56 -0500
   
   Kai Schaetzl wrote:
   > Troy Settle wrote on Mon, 17 Nov 2008 13:33:10 -0500:
   >
   >> I'm having a major problem with the bayes system.  I cleared the bayes 
   >> database and let it start re-learning.  Once it kicked in, I again 
   >> started getting false hits with BAYES_00=-2.599 on a great many spam/uce 
   >> messages.

How many spam messages are getting BAYES_00, and what percentage of
your spam is that?  A few spam messages getting BAYES_00/05/20 is OK.
If a large percentage of spam is hitting BAYES_00, then you have
some sort of problem with the messages that are being learned.  Most
likely you are (auto)learning spam messages as ham.  Any mistakes made
in learning need to be corrected by relearning those messages.  Any
spam message that has autolearn=ham has to be relearned as spam.
Or perhaps you are not learning from enough spam messages.
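
A minimal sketch of how the mislearned messages could be found, assuming
the spam you have collected is stored one message per file and still
carries the X-Spam-Status header.  The function name and the Maildir
path are hypothetical, and the grep also matches the string in a message
body, so check the list before relearning:

```shell
#!/bin/sh
# Hypothetical helper: print the files in directory $1 that contain
# "autolearn=ham", i.e. messages SpamAssassin auto-learned as ham.
# Assumes one message per file.
list_autolearned_ham() {
    grep -l 'autolearn=ham' "$1"/* 2>/dev/null
}

# Example use (the path is an assumption; adjust to your spam folder):
#   list_autolearned_ham "$HOME/Maildir/.Spam/cur" |
#       while IFS= read -r f; do sa-learn --spam "$f"; done
```

Running sa-learn --spam on a message that was previously learned as ham
relearns it as spam, so no separate forget step should be needed.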

For spam messages getting BAYES_00, what do you get for the following:
 spamassassin -D --test-mode --debug all,bayes < msg.txt 2>&1 | grep bayes:
Which spammy looking tokens have low values?

   > How did you "let it start re-learning"? What's the output of
   > sa-learn --dump magic?
    From incoming mail.  I'm still working on building a corpus suitable 
   for sa-learn.
   
   $ sa-learn --dump magic
   0.000          0          3          0  non-token data: bayes db version
   0.000          0      44946          0  non-token data: nspam
   0.000          0      36757          0  non-token data: nham
   0.000          0     545675          0  non-token data: ntokens
...
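
To keep an eye on how balanced the training is, the nspam and nham
counters can be pulled out of that output.  This is only a sketch; the
function name bayes_counts is made up for illustration, and it reads
the dump on stdin:

```shell
#!/bin/sh
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and
# print the learned spam/ham counts plus the spam share of the total.
bayes_counts() {
    awk '
        /non-token data: nspam/ { spam = $3 }   # third field is the count
        /non-token data: nham/  { ham  = $3 }
        END {
            total = spam + ham
            if (total > 0)
                printf "nspam=%d nham=%d spam_share=%.1f%%\n", spam, ham, 100 * spam / total
        }'
}

# Example use:
#   sa-learn --dump magic | bayes_counts
```

For the counts quoted above this reports a spam share of about 55%.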

You should probably increase the size of the Bayes database, e.g.
 bayes_expiry_max_db_size 2000000
   
   FWIW, how bad would I screw things up if I were to override the BAYES_00 
   score to 0?

With proper training this should not be necessary.  Also, 0 would
disable the test, so you won't get any BAYES_00 hits.  A small
temporary nonzero score would be better so you can continue to
track the problem.

-jeff
