Matus UHLAR - fantomas wrote:
>
> however he must run sa-learn on hams too, otherwise he may get false
> positives soon...
>
True. I was merely commenting on why it is a good idea to allow mail SA has already tagged to be trained. I did not intend to imply this should be your sole source of training.

> The most effective is probably to run sa-learn on false positives and false
> negatives.
>
The most effective is to run sa-learn on nonspam and spam. Don't restrict your training to FPs and FNs. (or did you, like me, mean training FPs and FNs as a supplement to more general training?)

In general, placing any kind of artificial restriction on what you will or will not train creates bias in your bayes database, so it is best to avoid such restrictions where possible. Your decision should really just be "do I consider it spam or not?" Train accordingly. It's just that simple.

The only area where I might consider biasing training is your spam-to-nonspam ratio. SpamAssassin "ideally" works best with a 50/50 training mix, but it is quite tolerant of severe deviations from this (99/1 is more common). If your ratio is severely off, as most folks' are, you might want to apply a *little* extra effort to get more nonspam training. But don't spend a lot of time obsessing over it; I've never seen a database so imbalanced that it actually caused problems.

In general, it's more important to have fresh training than well-balanced training. As long as there's a reasonably fresh feed of both spam and nonspam, you should be fine.
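
If you want a rough number for your own spam-to-nonspam ratio, something like the sketch below might help. It is only an illustration: it assumes the counters appear as "non-token data: nspam" and "non-token data: nham" lines in the shape that `sa-learn --dump magic` prints, the sample text and its counts are made up, and the helper names `training_counts` and `spam_share` are hypothetical, not part of SpamAssassin.

```python
import re

# Illustrative sample only: the column layout is assumed to resemble
# `sa-learn --dump magic` output, and the counts are invented.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       9900          0  non-token data: nspam
0.000          0        100          0  non-token data: nham
"""


def training_counts(dump_text):
    """Return (nspam, nham) parsed from dump text; missing counters count as 0."""
    counts = {"nspam": 0, "nham": 0}
    for line in dump_text.splitlines():
        # Assumed layout: the message count is the second-to-last numeric column.
        match = re.search(r"\s(\d+)\s+\d+\s+non-token data: (nspam|nham)\s*$", line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts["nspam"], counts["nham"]


def spam_share(nspam, nham):
    """Fraction of trained messages that were spam, or None if nothing is trained."""
    total = nspam + nham
    return nspam / total if total else None


if __name__ == "__main__":
    nspam, nham = training_counts(SAMPLE_DUMP)
    print(f"spam: {nspam}, nonspam: {nham}, spam share: {spam_share(nspam, nham)}")
```

To try it on a real database you would pass in the text your own `sa-learn --dump magic` prints, adjusting the pattern if your version lays the columns out differently. A share near 0.99 matches the common 99/1 case mentioned above, which is usually nothing to worry about.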