Re: corpus ham/spam balance

RW Wed, 26 Aug 2009 11:57:26 -0700

On Wed, 26 Aug 2009 11:04:42 -0600
"Savoy, Jim" <sa...@uleth.ca> wrote:


> not sure how much effect adding AWL to my config helped with my
> corpus coming into balance (perhaps it was only the change to my ham
> threshold that made the difference), 

The AWL score isn't counted for autolearning, and neither is the
Bayes score or whitelisting rules. So by taking the threshold down to
-3.0, you won't be learning much ham at all unless you have a lot of good
negative-scoring custom rules.
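
To make that concrete, here is a rough Python sketch of the idea (not SpamAssassin's actual code). The rule names and scores are made up for illustration, and it assumes ham is learned when the autolearn score is strictly below the threshold, ignoring any other checks SpamAssassin applies.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str
    score: float
    skip_for_autolearn: bool  # True for AWL, Bayes and whitelist-style rules


def autolearn_score(hits):
    """Sum only the hits that count towards the autolearn decision."""
    return sum(h.score for h in hits if not h.skip_for_autolearn)


def would_learn_as_ham(hits, ham_threshold):
    """Assumed rule: learn as ham only if the autolearn score is below the threshold."""
    return autolearn_score(hits) < ham_threshold


# Illustrative hits for one message; names and scores are hypothetical.
hits = [
    Hit("AWL", -2.5, True),
    Hit("BAYES_00", -1.9, True),
    Hit("USER_IN_WHITELIST", -100.0, True),
    Hit("LOCAL_KNOWN_SENDER", -1.0, False),  # made-up custom rule
]

print(autolearn_score(hits))
print(would_learn_as_ham(hits, -3.0))
print(would_learn_as_ham(hits, 0.1))
```

With these made-up numbers the message looks very hammy overall, but only the custom rule counts for autolearning, so it misses a -3.0 threshold while it would pass a threshold just above zero.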

IMO trying to find a particular ham threshold that brings the ratio
into balance is not a good idea, because you can become far too selective
in what you learn and end up learning the least useful candidates.
It's probably better to periodically push the threshold down to -100 until
the numbers balance.
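
As a sketch of that approach: the helper below is hypothetical, and it assumes you already have the ham and spam counts from your Bayes database, that ham is the side running ahead, and that 0.1 is the threshold you normally run with.

```python
def ham_threshold(nham, nspam, normal=0.1, off=-100.0):
    """Pick the ham autolearn threshold for the next period.

    While ham outnumbers spam, return a threshold so low that almost
    nothing is learned as ham; otherwise return the normal value.
    """
    return off if nham > nspam else normal


print(ham_threshold(5000, 3000))  # ham is ahead, so ham learning is paused
print(ham_threshold(3000, 3000))  # balanced, so back to the normal threshold
```

The point of switching between two values, rather than tuning one in-between value, is that whenever ham is being learned at all it is learned with the normal, unselective threshold.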
