On Tue, 13 Feb 2018 21:02:46 +0000
Horváth Szabolcs wrote:

> One more question: is there a recommended ham to spam ratio? 1:1? 

No, this is a myth.  Bayes computes each token's probability from its
frequencies in spam and in ham, each taken relative to the number of
messages trained for that class, so the ratio scales out. If you have
2000 ham and 200 spam, the problem is too few spam, not a bad ratio.
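
To make the scaling argument concrete, here is a rough sketch of the basic
token probability. It is a simplified, Graham-style calculation, not
SpamAssassin's actual code; the function name and the counts are made up
for illustration.

```python
def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham):
    """Simplified token probability (illustrative, not SpamAssassin's code).

    spam_hits / ham_hits: trained messages of each class containing the token.
    n_spam / n_ham: total trained messages of each class.
    """
    spam_freq = spam_hits / n_spam   # fraction of spam containing the token
    ham_freq = ham_hits / n_ham      # fraction of ham containing the token
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical token seen in 30% of spam and 2% of ham.
balanced = token_spam_prob(600, 40, n_spam=2000, n_ham=2000)  # 1:1 corpus
skewed = token_spam_prob(60, 40, n_spam=200, n_ham=2000)      # 10:1 ham to spam

# Both corpora give the same probability; only the per-class
# frequencies matter, not the ham:spam ratio.
print(balanced, skewed)
```

What does change with the smaller corpus is the amount of evidence: with
only 200 spam, a token present in 1% of spam is backed by just two messages.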


Theoretically there is a case for new training to match the ratio that's
already in the database, because then a new token will get a token
probability that reflects its frequencies in recent mail. But I wouldn't
worry about that: it's hard to stick to, and the effect is probably minor.
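
A small sketch of that theoretical case, under the same simplified
probability formula (again an assumption for illustration, not
SpamAssassin's implementation; all names and counts are hypothetical). A
token that first appears in a new training batch is divided by the
database-wide totals, so it only keeps its "recent mail" probability when
the batch has the same ham:spam ratio as the database.

```python
def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham):
    """Simplified token probability from per-class frequencies (illustrative)."""
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    return spam_freq / (spam_freq + ham_freq)


def prob_after_training(db_spam, db_ham, batch_spam, batch_ham,
                        spam_hits, ham_hits):
    """Probability of a token that is new in the batch, using database totals."""
    return token_spam_prob(spam_hits, ham_hits,
                           db_spam + batch_spam, db_ham + batch_ham)


# Existing database: 200 spam, 2000 ham (1:10).
# The new token appears in 10% of the batch's spam and 1% of its ham.

# Batch with the same 1:10 ratio: 100 spam, 1000 ham.
recent_matched = token_spam_prob(10, 10, 100, 1000)
db_matched = prob_after_training(200, 2000, 100, 1000, 10, 10)

# Batch with a 1:1 ratio: 100 spam, 100 ham.
recent_mismatched = token_spam_prob(10, 1, 100, 100)
db_mismatched = prob_after_training(200, 2000, 100, 100, 10, 1)

print(recent_matched, db_matched)        # agree
print(recent_mismatched, db_mismatched)  # differ
```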
