Jean-Sébastien Guay-Leroux writes:
> What is the reason for Bayes in SpamAssassin to use the 150 most significant
> tokens in an email if Paul Graham mentions that you should only use the
> fifteen most significant?

It got better results in empirical testing.  Check back through the
SpamAssassin-devel archives (waaay back ;) for the details.

I think there may be comments in Bayes.pm giving the headers of the
message where this was discussed.  (We should have used a Bugzilla bug,
in retrospect.)

All the bayes tweaks we do are tested first against a corpus, using
10-fold cross validation (similar to the bogofilter and spambayes
protocol).  Mucking about with bayes without testing is just silly,
since testing is so easy ;)
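
As a rough idea of what a 10-fold cross-validation run looks like, here is a
minimal sketch in Python. It is not the SpamAssassin test harness: the
train_fn and classify_fn callables, the corpus layout (a list of
(message, is_spam) pairs) and the use of plain accuracy as the metric are
all assumptions made for illustration. The corpus is assumed to hold at
least k messages so that no fold is empty.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) pairs: each of the k folds is the test set
    exactly once, and the other k-1 folds form the training set."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for repeatable runs
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def cross_validate(corpus, train_fn, classify_fn, k=10):
    """Return mean accuracy over k folds.

    train_fn(train) -> model and classify_fn(model, message) -> bool
    are hypothetical hooks supplied by the caller."""
    accuracies = []
    for train, test in k_fold_splits(corpus, k):
        model = train_fn(train)
        correct = sum(classify_fn(model, msg) == label for msg, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Toy corpus and a trivial stand-in "classifier" to show the call shape.
corpus = [("buy spam pills %d" % i, True) for i in range(50)]
corpus += [("lunch meeting %d" % i, False) for i in range(50)]

mean_accuracy = cross_validate(
    corpus,
    train_fn=lambda train: None,                  # no real training here
    classify_fn=lambda model, msg: "spam" in msg,
)
```

To compare a tweak such as 15 versus 150 tokens, the same folds would be run
once per setting and the resulting error rates compared.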

--j.


_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
