[EMAIL PROTECTED] wrote:
> Hi list,
> 
> I'm currently trying to build up a new bayes DB here, since the autobuilt
> DB fubared (as expected, no need to throw things at me ;)). It's rather easy
> to build up the spam part, as we are getting right enough of it, yet it poses
> a problem to build up the ham part.

Generally, no problem. SA deals pretty well with wild imbalances in training.
I'm currently running with a 9:1 spam:ham training ratio. In the past I've had
as bad as 20:1 with no ill effects on scoring.

I'd try to get as close to 1:1 as you can, but don't kill yourself to get there.
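
If you want to check where your own ratio stands, here is a minimal Python
sketch. It assumes you feed it the text printed by `sa-learn --dump magic`, and
that the nspam/nham lines carry the message count in the third column, which is
how I understand that output; verify against your own dump. The function name
training_ratio is made up for this example.

```python
import re


def training_ratio(dump_text):
    """Return (nspam, nham, spam-per-ham) parsed from `sa-learn --dump magic` text."""
    counts = {}
    for line in dump_text.splitlines():
        # Assumed layout: "<prob> <int> <value> <int>  non-token data: nspam"
        m = re.match(
            r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham)\s*$",
            line,
        )
        if m:
            counts[m.group(2)] = int(m.group(1))
    nspam = counts.get("nspam", 0)
    nham = counts.get("nham", 0)
    # With no ham trained at all the ratio is undefined; report infinity.
    ratio = nspam / nham if nham else float("inf")
    return nspam, nham, ratio
```

A result around 9.0 would correspond to the 9:1 I mentioned above.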

If your training set is small, I would at least try to make sure you cover as
broad a range of your ham mail as possible. If all your ham training reflects
only the typical content of a small portion of your ham mail, you could have
some problems.
