pm...@email.it wrote:
> Hi, on this page:
> http://wiki.apache.org/spamassassin/BayesInSpamAssassin
> I read:
>
> "*Do not* train Bayes on different mail streams or public spam corpora.
> These methods will mislead Bayes into believing certain tokens are
> spammy or hammy when they are not."
>
> So, I can't learn an external spam database with sa-learn? (for example:
> http://untroubled.org/spam/ )
>
You could, but experience suggests that you get better results by training only on _your_ own mail, and only on mail that you have verified manually. That said, it is unclear whether this experience applies to everybody.

In any case, you should train on both spam and ham. I would conjecture (but I have no proof) that the ratio between them should more or less match the ratio of spam to ham in your own mail. Achieving this with external mail is hard.

External spam and ham would be interesting if we had a second Bayes engine that was applied only to "unsure" mail (to push the score well away from the threshold).
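
For anyone who still wants to experiment with an external corpus, here is a minimal Python sketch of the ratio idea above. It only illustrates the conjecture and is not part of SpamAssassin: the function name `pick_training_set`, the file paths, and the 80% spam fraction are all hypothetical placeholders. It subsamples two lists of messages so that spam makes up roughly a chosen fraction of what gets trained.

```python
import random


def pick_training_set(spam, ham, spam_fraction, seed=0):
    """Subsample spam and ham so spam is roughly spam_fraction of the result.

    Hypothetical helper for illustration; spam and ham are lists of
    message paths (or any other items).
    """
    if not 0.0 < spam_fraction < 1.0:
        raise ValueError("spam_fraction must be strictly between 0 and 1")
    # Largest combined size that neither pool is too small to supply.
    total = min(len(spam) / spam_fraction, len(ham) / (1.0 - spam_fraction))
    # Round to whole messages and never ask for more than is available.
    n_spam = min(len(spam), round(spam_fraction * total))
    n_ham = min(len(ham), round((1.0 - spam_fraction) * total))
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    return rng.sample(spam, n_spam), rng.sample(ham, n_ham)


# Assumed numbers: 1000 external spam messages, 50 verified ham messages,
# and a mail stream that is about 80% spam.
external_spam = [f"spam/{i}.eml" for i in range(1000)]
own_ham = [f"ham/{i}.eml" for i in range(50)]
chosen_spam, chosen_ham = pick_training_set(external_spam, own_ham, spam_fraction=0.8)
print(len(chosen_spam), len(chosen_ham))
```

The selected messages could then be fed to `sa-learn --spam` and `sa-learn --ham` respectively. Note that the ham pool is the limiting factor here, which is one reason matching the ratio with external mail is hard: most of the external spam ends up unused.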