On Tue, Feb 07, 2006 at 01:45:48PM -0800, jdow wrote:
> From: "Jim C. Nasby" <[EMAIL PROTECTED]>
>
> >On Tue, Feb 07, 2006 at 03:16:57PM -0500, Matt Kettler wrote:
> >>My current training ratio is about 7:1 spam:nonspam, but in the past
> >>it's been as bad as 20:1. Both of those are very far off from equal
> >>amounts, but the imbalance has never caused me any problems.
> >>
> >>From my sa-learn --dump magic output as of today:
> >>0.000 0 995764 0 non-token data: nspam
> >>0.000 0 145377 0 non-token data: nham
> >
> >Interesting... it appears I actually need to do a better job of training
> >spam!
> >sa-learn --dump magic|grep am
> >0.000 0 98757 0 non-token data: nspam
> >0.000 0 255134 0 non-token data: nham
> >
> >I just changed bayes_auto_learn_threshold_spam to 5.0, we'll see what
> >that does...
>
> If you have the option manually train the spam for awhile. If the threshold
> is set too low for autolearning spam you will find yourself with a mangled
> database that has a high percentage of actual ham learned as spam. That is
> not a good thing. You might actually lower the ham threshold, as well. It
> looks like you might be at risk of learning spam as ham. (And in fact may
> have done this already to a high degree.)
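
For anyone comparing their own numbers against the ones quoted above, the spam:ham training ratio can be pulled straight out of the dump. This is only a rough sketch: the function name bayes_ratio is made up for illustration, and it assumes the output has the same layout as the lines quoted above (count in the third whitespace-separated field).

```shell
# Hypothetical helper (not part of SpamAssassin): reads lines in the
# `sa-learn --dump magic` layout quoted above from stdin and prints the
# nspam and nham counts plus the spam:ham ratio.
bayes_ratio() {
    awk '
        /non-token data: nspam/ { nspam = $3 }   # third field holds the count
        /non-token data: nham/  { nham  = $3 }
        END {
            if (nham > 0)
                printf "nspam=%d nham=%d spam:ham=%.2f:1\n", nspam, nham, nspam / nham
            else
                printf "nspam=%d nham=%d spam:ham=n/a\n", nspam, nham
        }
    '
}

# Sample input copied from the figures quoted above.
printf '%s\n' \
    '0.000 0 98757 0 non-token data: nspam' \
    '0.000 0 255134 0 non-token data: nham' | bayes_ratio
```

Against a live database the same function would be fed with sa-learn --dump magic | bayes_ratio. A ratio well below 1:1, as with the second set of figures, means the database has seen more ham than spam.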

See my other reply, which showed stats for all spam over 5 this month. The
stats for last month are:

grep -r autolearn oldspam/ | grep -v 'Binary file' | sed -e 's/.*autolearn=\([^ ]*\).*/\1/' | sort | uniq -c
   5862 no
   1225 spam
     24 unavailable

So based on this, I'd think it's not learning spam as ham... BTW, autolearn
ham should be at its default setting...

What's interesting is that I get about 10-20 spams a day that are scored
below 3, and another 30-50 a day that are between 3 and 5 (which go to my
'probablespam' folder). I send all of these to sa via spamassassin -r, so I
would have thought that I'd have far more spam in the database than ham...
-- 
Jim C. Nasby, Database Architect [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"