Jean-Sébastien Guay-Leroux writes:
> What is the reason for Bayes in SpamAssassin to use the 150 most significant
> tokens in an email if Paul Graham mentions that you should only use the
> fifteen most significant?

It got better results in empirical testing. Check back through the
SpamAssassin-devel archives (waaay back ;) for the details. I think there may
be comments in Bayes.pm giving the headers of the message that discussed it.
(In retrospect, we should have used a Bugzilla bug.)

All the Bayes tweaks we do are tested first against a corpus, using 10-fold
cross-validation (similar to the bogofilter and spambayes protocol). Mucking
about with Bayes without testing is just silly, since testing is so easy ;)

--j.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
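
For illustration only, here is a rough sketch of the kind of test described in the reply: 10-fold cross-validation comparing a 15-token cutoff with a 150-token cutoff. It is not SpamAssassin's code. Every name in it (toy_corpus, train, token_prob, classify, cross_validate) is hypothetical, the corpus is synthetic, and the scoring is a simplified Graham-style combination rather than whatever Bayes.pm actually does. On a toy corpus like this one the two cutoffs will probably score the same; a real comparison needs a real hand-classified corpus.

```python
import math
import random
from collections import Counter


def toy_corpus(n=200, seed=1):
    """Build a synthetic corpus of (tokens, is_spam) pairs. Purely illustrative."""
    rng = random.Random(seed)
    spam_words = ["viagra", "free", "winner", "click", "offer"]
    ham_words = ["meeting", "patch", "report", "lunch", "thanks"]
    shared = ["the", "and", "you", "mail", "list"]
    corpus = []
    for i in range(n):
        is_spam = i % 2 == 0
        topical = spam_words if is_spam else ham_words
        tokens = rng.choices(topical, k=3) + rng.choices(shared, k=5)
        corpus.append((tokens, is_spam))
    return corpus


def train(messages):
    """Count, per token, how many spam and ham messages contain it."""
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for tokens, is_spam in messages:
        if is_spam:
            n_spam += 1
            spam_counts.update(set(tokens))
        else:
            n_ham += 1
            ham_counts.update(set(tokens))
    return spam_counts, ham_counts, n_spam, n_ham


def token_prob(token, model):
    """Estimate P(spam | token), clamped to [0.01, 0.99]; 0.5 if unseen."""
    spam_counts, ham_counts, n_spam, n_ham = model
    s = spam_counts[token] / max(n_spam, 1)
    h = ham_counts[token] / max(n_ham, 1)
    if s + h == 0:
        return 0.5
    return min(0.99, max(0.01, s / (s + h)))


def classify(tokens, model, max_tokens):
    """Use only the max_tokens probabilities furthest from 0.5."""
    probs = sorted(
        (token_prob(t, model) for t in set(tokens)),
        key=lambda p: abs(p - 0.5),
        reverse=True,
    )[:max_tokens]
    # Compare the two products in log space to avoid underflow.
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    return log_spam > log_ham


def cross_validate(messages, max_tokens, folds=10, seed=0):
    """Return overall accuracy; each message is tested exactly once."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    correct = 0
    for i in range(folds):
        test_set = shuffled[i::folds]
        train_set = [m for j, m in enumerate(shuffled) if j % folds != i]
        model = train(train_set)
        correct += sum(
            classify(tokens, model, max_tokens) == is_spam
            for tokens, is_spam in test_set
        )
    return correct / len(shuffled)


if __name__ == "__main__":
    corpus = toy_corpus()
    for cutoff in (15, 150):
        print(cutoff, cross_validate(corpus, max_tokens=cutoff))
```

The point of the design is that the cutoff is just a parameter: change it, rerun the same folds with the same seed, and compare the numbers instead of arguing from first principles.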