[EMAIL PROTECTED] wrote:
> Can you just feed spamassassin spam or do you need to give it ham also?
> 
> I read the docs and it didn't say you had to feed it ham.
> 
> I then read another doc and it said you should feed it equal amounts of
> spam and ham.

You need to give it ham as well; you really should feed it both. You should
also strive for a 1:1 ratio of spam to nonspam, but don't kill yourself to get
there.

SA's use of chi-squared combining makes it very tolerant of wild imbalances in
training. However, the closer you are to a 1:1 ratio the better SA will be able
to distinguish tokens that are present in both kinds of mail and ignore them. So
this is a worthwhile goal to strive for as long as it doesn't become a burden.

My current training ratio is about 7:1 spam:nonspam, but in the past it's been
as bad as 20:1. Both of those are very far off from equal amounts, but the
imbalance has never caused me any problems.

From my sa-learn --dump magic output as of today:
0.000          0     995764          0  non-token data: nspam
0.000          0     145377          0  non-token data: nham

That works out to a ratio of 6.85:1
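
If you want to check your own ratio, here is a minimal Python sketch of the
arithmetic. It assumes the dump lines look like the two quoted above, with the
count in the third whitespace-separated column; the function names are my own
and are not part of SpamAssassin.

```python
import re


def parse_magic_counts(dump_text):
    """Collect the nspam and nham counts from `sa-learn --dump magic` text.

    Assumes each relevant line ends with "non-token data: nspam" or
    "non-token data: nham" and carries the count in its third column,
    as in the output quoted in this message.
    """
    counts = {}
    for line in dump_text.splitlines():
        match = re.search(r"non-token data:\s*(nspam|nham)\s*$", line)
        if match:
            fields = line.split()
            counts[match.group(1)] = int(fields[2])
    return counts


def spam_ham_ratio(counts):
    """Return trained spam messages per trained ham message."""
    return counts["nspam"] / counts["nham"]


# The two lines from the dump quoted above.
SAMPLE = """\
0.000          0     995764          0  non-token data: nspam
0.000          0     145377          0  non-token data: nham
"""

if __name__ == "__main__":
    counts = parse_magic_counts(SAMPLE)
    print(f"{spam_ham_ratio(counts):.2f}:1")
```

To use it on a live database, save the output of sa-learn --dump magic to a
file and pass the file's contents to parse_magic_counts() in place of SAMPLE.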



