[EMAIL PROTECTED] wrote:
> Hi list,
>
> I'm currently trying to build up a new bayes DB here, since the autobuilt
> DB fubared (as expected, no need to throw things at me ;)). It's rather easy
> to build up the spam part, as we are getting right enough of it, yet it poses
> a problem to build up the ham part.

Generally, no problem. SA deals pretty well with wild imbalances in training. I'm currently running with a 9:1 spam:ham training ratio. In the past I've had it as bad as 20:1 with no ill effects on scoring. I'd try to get as close to 1:1 as you can, but don't kill yourself to get there. If your training set is small, I would at least try to make sure you cover as broad a range of your ham mail as possible. If all your ham training reflects only the typical content of a small portion of your ham mail, you could have some problems.
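
If you want to see where your own ratio stands, here is a rough Python sketch that pulls the learned-message counts out of the text printed by `sa-learn --dump magic`. It assumes the usual layout of that output (four numeric columns followed by `non-token data: nspam` or `non-token data: nham`, with the count in the third column); check it against what your version actually prints. The function names and the sample numbers are made up for illustration, with the sample chosen to give the 9:1 ratio mentioned above.

```python
def training_counts(dump_text):
    """Return (nspam, nham) parsed from `sa-learn --dump magic` style text."""
    counts = {}
    for line in dump_text.splitlines():
        fields = line.split()
        # Assumed layout: four numeric columns, then "non-token data: <name>".
        # The learned-message count is taken from the third column.
        if len(fields) >= 5 and fields[-1] in ("nspam", "nham"):
            counts[fields[-1]] = int(fields[2])
    return counts.get("nspam", 0), counts.get("nham", 0)


def spam_to_ham_ratio(nspam, nham):
    """Spam messages learned per ham message; None if no ham was learned."""
    return nspam / nham if nham else None


# Illustrative sample only, not real output from any particular site.
sample = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        900          0  non-token data: nspam
0.000          0        100          0  non-token data: nham
"""

nspam, nham = training_counts(sample)
print(nspam, nham, spam_to_ham_ratio(nspam, nham))
```

To use it on a live database you would feed it the real command output instead of `sample`, for example by capturing `sa-learn --dump magic` with `subprocess.run(..., capture_output=True, text=True)`. The number only tells you about balance, not about how broad your ham coverage is.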