Re: sa-learn

Matt Kettler Fri, 26 Oct 2007 04:19:06 -0700

Matus UHLAR - fantomas wrote:
>
> however he must run sa-learn on hams too, otherwise he may get false
> positives soon...
>   
True. I was merely commenting on why it is a good idea to allow mail SA
has already tagged to be trained. I did not intend to imply this should
be your sole source of training.
> The most effective is probably to run sa-learn on false positives and false
> negatives.
>   
The most effective is to run sa-learn on nonspam and spam. Don't
restrict your training to FPs and FNs. (or did you, like me, mean
training FPs and FNs as a supplement to more general training?)


In general, it creates bias in your bayes database when you create any
kind of artificial restrictions on what you will or will not train, so
it is best to avoid them where possible. Your decisions should really
just be "do I consider it spam or not?" Train accordingly. It's just
that simple.
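
As a rough illustration of "train both classes", here is a minimal Python
sketch that runs sa-learn once over a spam folder and once over a nonspam
folder. The folder paths and function names are hypothetical; only the
--spam and --ham options of sa-learn are taken as given.

```python
import subprocess


def sa_learn_commands(spam_path, ham_path):
    """Build one sa-learn invocation per class (spam first, then ham)."""
    return [
        ["sa-learn", "--spam", spam_path],
        ["sa-learn", "--ham", ham_path],
    ]


def train(spam_path, ham_path):
    """Run both invocations; raises CalledProcessError if sa-learn fails."""
    for argv in sa_learn_commands(spam_path, ham_path):
        subprocess.run(argv, check=True)


# Hypothetical usage, assuming Maildir-style folders:
# train("Maildir/.Junk/cur", "Maildir/cur")
```

The point is only that everything you consider spam goes in one call and
everything you consider nonspam goes in the other, not just the FPs and FNs.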

The only area where I might consider biasing training is your
spam to nonspam ratio. SpamAssassin "ideally" works best with a 50/50
training mix, but is quite tolerant of severe deviations from this
(99/1 is more common). If your ratio is severely off, as most folks'
are, you might want to apply a *little* extra effort to get more nonspam
training. But don't spend a lot of time obsessing over it; I've never
seen one so imbalanced that it actually caused problems. In general, it's
more important to have fresh training than well-balanced training. As
long as there's a reasonably fresh feed of both spam and nonspam, you
should be fine.
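
If you want to see where your own ratio stands, the counts can be read
from the output of "sa-learn --dump magic". The sketch below parses the
nspam and nham lines from that output; the column layout it assumes
(count in the third field, label at the end) matches output I have seen,
but treat it as an assumption and check it against your own version. The
function names and the sample numbers are made up for illustration.

```python
def training_counts(dump_text):
    """Return (nspam, nham) parsed from `sa-learn --dump magic` text.

    Assumes lines shaped like:
    0.000          0       9900          0  non-token data: nspam
    """
    counts = {}
    for line in dump_text.splitlines():
        fields = line.split()
        if (
            len(fields) >= 7
            and fields[4:6] == ["non-token", "data:"]
            and fields[6] in ("nspam", "nham")
        ):
            counts[fields[6]] = int(fields[2])
    return counts.get("nspam", 0), counts.get("nham", 0)


def ham_share(nspam, nham):
    """Fraction of trained messages that were nonspam, or None if untrained."""
    total = nspam + nham
    return nham / total if total else None


# Invented sample in the assumed layout (a 99/1 mix):
sample = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       9900          0  non-token data: nspam
0.000          0        100          0  non-token data: nham
"""
nspam, nham = training_counts(sample)
print(nspam, nham, ham_share(nspam, nham))
```

A low nonspam share on its own is not a reason to worry; it is just a cue
to feed in a bit more nonspam when convenient.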
