-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Keith C. Ivey writes:
> Arlo Gilbert <[EMAIL PROTECTED]> wrote:
>
> > it would appear from the data im seeing that bayes is learning
> > the to and received headers on mails... obviously this seems a
> > bit redundant and will only add to the size of the bayes db,
> > without contributing anything (maybe even harming the learning?)
> > of the bayes engine.
>
> The Bayesian learning uses the last two "Received" lines and
> the "To" line, but I don't see why you think those are
> redundant.  For example, if no spam has ever been sent to a
> particular address before, then its presence in the "To" line
> is a pretty good indicator that the message is not spam,
> whereas the presence of the "info@" address for your domain (or
> some address that's been retired because it gets too much spam)
> may be a fair indicator that the message is spam.  Remember
> that the "To" line can contain multiple addresses.
>
> Also, the "To" line can contain comments, names, and completely
> bogus addresses that spammers put in, and tokens found there
> can be significant indicators of spam.

Yep.  One important point is that Received and To headers often
contain forged data inserted by spam tools, and that forged data often
yields very useful tokens.  For example, a Windows-style,
non-RFC-822-compliant date format with "AM"/"PM" tokens in a Received
header is a good spam sign.

> And the size of the Bayes DB is limited anyway, with the least
> significant tokens being purged periodically, so the added
> tokens aren't increasing the size.
>
> The people who developed the Bayes tokenizing for SA have done
> analysis on how effective various strategies are, and I'm
> inclined to trust their analysis unless you have some better
> analysis that refutes it.

Yeah -- every tweak to bayes gets a 10-fold cross-validation testing
run, to see if it helps or not.  Sometimes it does, sometimes it
doesn't -- which can be counter-intuitive until you examine the
results closely.

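To make that kind of test concrete, here is a minimal Python sketch of
comparing a tokenizer with and without header tokens under k-fold
cross-validation.  It is not SpamAssassin's code: the message layout
(a dict with 'to', 'received' and 'body'), the "To:"/"Received:" token
prefixes, the toy classifier and the made-up corpus are all assumptions
chosen for illustration.

```python
import math
import re
from collections import Counter

def tokenize(msg, use_headers=True):
    """Tokenize a message given as {'to': str, 'received': [str], 'body': str}
    (a hypothetical layout for this sketch)."""
    tokens = re.findall(r"\w+", msg["body"].lower())
    if use_headers:
        # Prefix header tokens so "pm" in a Received line is a different
        # token from "pm" in the body.
        tokens += ["To:" + t for t in re.findall(r"[\w.@-]+", msg["to"].lower())]
        for line in msg["received"][-2:]:  # last two Received lines only
            tokens += ["Received:" + t
                       for t in re.findall(r"[\w.@-]+", line.lower())]
    return tokens

def train(messages, labels, use_headers):
    """Count, per class, how many messages contain each token."""
    counts = {True: Counter(), False: Counter()}
    totals = Counter(labels)
    for msg, label in zip(messages, labels):
        counts[label].update(set(tokenize(msg, use_headers)))
    return counts, totals

def classify(model, msg, use_headers):
    """Toy naive-Bayes decision: sum of smoothed log-likelihood ratios."""
    counts, totals = model
    score = 0.0
    for tok in set(tokenize(msg, use_headers)):
        p_spam = (counts[True][tok] + 1) / (totals[True] + 2)
        p_ham = (counts[False][tok] + 1) / (totals[False] + 2)
        score += math.log(p_spam / p_ham)
    return score > 0  # True means "spam"

def cross_validate(messages, labels, use_headers, k=10):
    """k-fold cross-validation: each message is tested exactly once,
    by a model trained on the other folds.  Returns the accuracy."""
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, len(messages), k))
        train_msgs = [m for i, m in enumerate(messages) if i not in test_idx]
        train_lbls = [l for i, l in enumerate(labels) if i not in test_idx]
        model = train(train_msgs, train_lbls, use_headers)
        for i in test_idx:
            correct += classify(model, messages[i], use_headers) == labels[i]
    return correct / len(messages)

# Made-up corpus: identical bodies, so only the headers carry any signal.
spam = {"to": "info@example.com", "body": "hello",
        "received": ["from x by y; 10/21/2003 3:15:02 PM"]}
ham = {"to": "alice@example.com", "body": "hello",
       "received": ["from x by y; Tue, 21 Oct 2003 15:15:02 +0100"]}
corpus = [spam if i % 2 == 0 else ham for i in range(20)]
labels = [i % 2 == 0 for i in range(20)]

with_headers = cross_validate(corpus, labels, use_headers=True)
without_headers = cross_validate(corpus, labels, use_headers=False)
print("with headers:", with_headers, "without:", without_headers)
```

On real mail the difference is rarely this clear-cut, which is why each
tweak needs a run over a large hand-classified corpus rather than a
guess.
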
The reports on these runs can be found in the SpamAssassin-devel list
archives -- from months ago, unfortunately ;)

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
Comment: Exmh CVS

iD8DBQE/kxL3QTcbUG5Y7woRAiEiAJ0SgEobMrK4Or5YlME9u0z6J0sZRgCg4b4u
qv8g00D+C3oYPRwk7UKjaKE=
=ft6Q
-----END PGP SIGNATURE-----

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk