Troy Settle wrote on Tue, 18 Nov 2008 15:19:56 -0500:

> From incoming mail.
Well, but how? By auto-learning? In that case you are just multiplying
your problem. It seems a lot of spam gets miscategorized as ham, and
auto-learning that spam as ham reinforces the miscategorization; that
is what you see as a result.

> 0.000          0      44946          0  non-token data: nspam
> 0.000          0      36757          0  non-token data: nham
> 0.000          0     545675          0  non-token data: ntokens

Looking fine, if the ham tokens were really ham.

> 0.000          0 1227007705          0  non-token data: last expiry atime
> 0.000          0     393274          0  non-token data: last expire
>                                         reduction count

Hm, you just did an expire that slashed your db almost in half? You may
want to let it grow a bit.

> FWIW, how bad would I screw things up if I were to override the
> BAYES_00 score to 0?

As it is causing you grief now, probably not much. It means that real
ham which is also detected as BAYES_00 will no longer enjoy the benefit
of that negative score. Switching Bayes off for a while may be the
better option.

I would start over with that db:

1. Stop Bayes and check how categorization works without it. In theory
   you should already have a good number of spam miscategorized as ham
   even without Bayes.
2. Collect some ham and spam where you can be absolutely sure they are
   in the right category, and train Bayes with these. Keep
   auto-learning switched off for a while.
3. Switch Bayes back on with your new db and check whether it
   categorizes better now.
4. If it does, switch auto-learning on, but move the auto-learn
   threshold for ham down a bit, so that the chance of spam creeping in
   is smaller.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
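The steps above can be sketched as local.cf settings. The option names
(use_bayes, bayes_auto_learn, bayes_auto_learn_threshold_nonspam) are
standard SpamAssassin options; the settings are shown per step and the
threshold value is only an illustrative guess:

```
# Step 1: disable Bayes entirely and watch the hit rate without it.
use_bayes 0

# Step 2: wipe the db and hand-train with known-good corpora, keeping
# auto-learning off while the new db settles:
use_bayes 1
bayes_auto_learn 0

# Step 4: once the new db categorizes well, re-enable auto-learning
# with a lower (stricter) ham threshold (default is 0.1; -0.5 is only
# an example value):
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -0.5
```

For step 2, the database itself can be cleared with `sa-learn --clear`
and retrained from your verified corpora with `sa-learn --ham` and
`sa-learn --spam`.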
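The self-reinforcing effect of auto-learning spam as ham can be
illustrated with a toy token model. This is a simplified Graham-style
sketch, not SpamAssassin's actual Bayes implementation; all class and
token names here are illustrative:

```python
from collections import Counter

class ToyBayes:
    """Toy Graham-style token model (illustrative only)."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.nspam = 0
        self.nham = 0

    def train(self, tokens, is_spam):
        # Count each message once per category, plus its tokens.
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def spamminess(self, token):
        # Fraction of spam vs. ham trainings the token appears in;
        # 0.5 means "no evidence either way".
        s = self.spam_tokens[token] / max(self.nspam, 1)
        h = self.ham_tokens[token] / max(self.nham, 1)
        return 0.5 if s + h == 0 else s / (s + h)

bayes = ToyBayes()
bayes.train(["viagra", "pills"], is_spam=True)   # one correctly trained spam
for _ in range(9):
    bayes.train(["invoice"], is_spam=True)       # other spam, no token overlap
# A run of similar spam slips through and is auto-learned as ham:
for _ in range(5):
    bayes.train(["viagra", "pills"], is_spam=False)
print(round(bayes.spamminess("viagra"), 3))      # → 0.091, i.e. the token now looks hammy
```

Once the token's score drops like this, every further copy of the same
spam scores low, gets auto-learned as ham again, and pushes the score
lower still, which is exactly the loop described above.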