Re: Learning Bayes

mouss Sun, 14 Mar 2010 13:54:45 -0700

pm...@email.it wrote:
> Hi, on this page: http://wiki.apache.org/spamassassin/BayesInSpamAssassin
> I read:
>
> "*Do not* train Bayes on different mail streams or public spam corpora.
> These methods will mislead Bayes into believing certain tokens are
> spammy or hammy when they are not."
> 
> So I can't train on an external spam database with sa-learn? (for example:
> http://untroubled.org/spam/ )
>



You could, but experience suggests that you get better results by
training only on _your_ own mail, and only on mail that you have
verified manually. That said, it is unclear whether this experience
applies to everybody.

In any case, you should train on both spam and ham. I would conjecture
(but I have no proof) that the ratio of trained spam to trained ham
should more or less match the ratio of spam to ham in your mail.
Achieving this with external mail is hard.
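
To make the ratio idea concrete, here is a minimal Python sketch, not
anything that ships with SpamAssassin. It assumes you have counted the
verified spam and ham in your own mail and want to know how many
messages of each class to pick from the corpora you have available. The
function name and the example numbers are made up for illustration.

```python
def matched_sample_sizes(spam_available, ham_available,
                         observed_spam, observed_ham):
    """Return (n_spam, n_ham): the largest training sample whose
    spam:ham ratio matches observed_spam:observed_ham.

    All arguments are message counts. The observed_* values are
    assumed to come from your own, manually verified mail.
    """
    if observed_spam <= 0 or observed_ham <= 0:
        raise ValueError("observed counts must be positive")
    if spam_available < 0 or ham_available < 0:
        raise ValueError("available counts must not be negative")

    # Integer cross-multiplication finds the limiting class
    # without any floating-point rounding.
    if spam_available * observed_ham <= ham_available * observed_spam:
        # Spam is the scarce class: use all of it, scale ham down.
        n_spam = spam_available
        n_ham = spam_available * observed_ham // observed_spam
    else:
        # Ham is the scarce class: use all of it, scale spam down.
        n_ham = ham_available
        n_spam = ham_available * observed_spam // observed_ham
    return n_spam, n_ham


# Hypothetical numbers: the mailbox sees 800 spam per 200 ham, and
# 5000 spam and 300 ham messages are available for training.
print(matched_sample_sizes(5000, 300, 800, 200))
```

With these made-up numbers ham is the limiting class, so you would
train on all 300 ham and only 1200 of the 5000 spam. The selected
messages would then be passed to sa-learn with --spam and --ham as
usual.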

External spam and ham would be interesting if we had a second Bayes
engine that is applied only to "unsure" mail (to push the score away
from the threshold).
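
A rough Python sketch of what I mean by a second engine, purely
hypothetical: SpamAssassin has no such feature as far as this thread is
concerned, and the function name and the 0.4/0.6 "unsure" band are
assumptions chosen only for illustration.

```python
from typing import Callable


def two_stage_spam_probability(primary: float,
                               secondary: Callable[[], float],
                               unsure_low: float = 0.4,
                               unsure_high: float = 0.6) -> float:
    """Return the primary probability unless it is 'unsure'.

    primary   -- spam probability from the engine trained on your mail
    secondary -- callable returning the probability from a second
                 engine (e.g. one trained on external corpora); it is
                 only called when the primary result is in the band
    """
    if not 0.0 <= unsure_low <= unsure_high <= 1.0:
        raise ValueError("need 0 <= unsure_low <= unsure_high <= 1")
    if unsure_low <= primary <= unsure_high:
        # The first engine cannot decide: let the second one speak.
        return secondary()
    # Confident verdicts from the first engine are left untouched.
    return primary


# Hypothetical values: a confident result is kept, an unsure one
# is replaced by the second engine's answer.
print(two_stage_spam_probability(0.95, lambda: 0.10))
print(two_stage_spam_probability(0.50, lambda: 0.90))
```

The point of the design is that the externally trained engine can never
override a confident verdict from the engine trained on your own mail.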
