Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

RW Wed, 14 Feb 2018 07:09:58 -0800

On Tue, 13 Feb 2018 21:02:46 +0000
Horváth Szabolcs wrote:

> One more question: is there a recommended ham to spam ratio? 1:1?


No, this is a myth.  Bayes computes token probabilities from a token's
frequencies in spam and ham, so the overall ham:spam ratio scales out.
If you have 2000 ham and 200 spam, the problem is too few spams, not a
bad ratio.
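
To illustrate the scaling, here is a minimal sketch of a per-token
probability built from relative frequencies. It is a simplified
textbook form, not SpamAssassin's actual code, and the function name
and the counts are made up for the example.

```python
def token_spam_probability(spam_hits, spam_total, ham_hits, ham_total):
    """Simplified per-token spam probability (illustrative only).

    Uses the fraction of spam and of ham messages that contain the
    token, so the absolute corpus sizes cancel out.
    """
    spam_freq = spam_hits / spam_total
    ham_freq = ham_hits / ham_total
    return spam_freq / (spam_freq + ham_freq)


# Token present in 10% of spam and 1% of ham, corpus ratio 1:1.
balanced = token_spam_probability(20, 200, 2, 200)

# Same per-class frequencies with 2000 ham and 200 spam (ratio 10:1).
skewed = token_spam_probability(20, 200, 20, 2000)

print(balanced, skewed)
```

Both calls give the same probability because only the per-class
frequencies enter the formula. What 200 spams cannot give you is a
reliable estimate of those frequencies for rarer tokens.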


Theoretically there is a case for new training to match the ratio that's
already in the database, because then a new token will get a token
probability that reflects its frequencies in recent mail. But I wouldn't
worry about that: it's hard to stick to, and the effect is probably
minor.
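
A rough sketch of that theoretical effect, using the same simplified
frequency-based formula (again not SpamAssassin's actual code; the
function name and all counts are hypothetical). The database already
holds 2000 ham and 200 spam, and a brand-new token shows up in 10% of
the newly trained spam and 10% of the newly trained ham.

```python
def new_token_probability(spam_hits, spam_total, ham_hits, ham_total):
    """Simplified per-token spam probability (illustrative only)."""
    spam_freq = spam_hits / spam_total
    ham_freq = ham_hits / ham_total
    return spam_freq / (spam_freq + ham_freq)


OLD_HAM, OLD_SPAM = 2000, 200  # existing database, ratio 10:1

# New batch at 1:1 (100 ham, 100 spam); token in 10 of each.
mismatched = new_token_probability(10, OLD_SPAM + 100, 10, OLD_HAM + 100)

# New batch at 10:1 (1000 ham, 100 spam); token in 100 ham, 10 spam.
matched = new_token_probability(10, OLD_SPAM + 100, 100, OLD_HAM + 1000)

print(mismatched, matched)
```

With the matching batch the token lands at 0.5, which reflects that it
is equally common in recent ham and spam. With the 1:1 batch it comes
out around 0.875, because the old messages dilute the ham frequency
more than the spam frequency.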
