[EMAIL PROTECTED] wrote:
> Can you just feed spamassassin spam or do you need to give it ham also?
>
> I read the docs and it didn't say you had to feed it ham.
>
> I then read another doc and it said you should feed it equal amounts of
> spam and ham.

Yes, you really should feed it both. You also should strive for a 1:1 ratio of spam and nonspam, but don't kill yourself to get there.

SA's use of chi-squared combining makes it very tolerant of wild imbalances in training. However, the closer you are to a 1:1 ratio the better SA will be able to distinguish tokens that are present in both kinds of mail and ignore them. So this is a worthwhile goal to strive for as long as it doesn't become a burden.

My current training ratio is about 7:1 spam:nonspam, but in the past it's been as bad as 20:1. Both of those are very far off from equal amounts, but the imbalance has never caused me any problems.

From my sa-learn --dump magic output as of today:

0.000          0     995764          0  non-token data: nspam
0.000          0     145377          0  non-token data: nham

That works out to a ratio of 6.85:1.
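
If you want to check your own ratio the same way, here is a rough Python sketch of the arithmetic. It assumes the count sits in the third whitespace-separated column and the line ends in nspam or nham, as in the two lines quoted above; the function name training_ratio is just made up for illustration, and the sample text is those two lines rather than live output.

```python
def training_ratio(dump_text):
    """Return the spam:ham training ratio from `sa-learn --dump magic` text.

    Assumes each relevant line ends in 'nspam' or 'nham' and carries the
    message count in its third whitespace-separated column.
    """
    counts = {}
    for line in dump_text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[-1] in ("nspam", "nham"):
            counts[fields[-1]] = int(fields[2])
    return counts["nspam"] / counts["nham"]


# The two lines quoted in this message.
dump = """\
0.000 0 995764 0 non-token data: nspam
0.000 0 145377 0 non-token data: nham
"""

ratio = training_ratio(dump)
print(f"{ratio:.2f}:1")  # prints 6.85:1
```

To run it against your own database you would paste in (or pipe in) the output of sa-learn --dump magic in place of the sample text.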