Arlo Gilbert <[EMAIL PROTECTED]> wrote:

> It would appear from the data I'm seeing that Bayes is learning
> the To and Received headers on mails. Obviously this seems a bit
> redundant and will only add to the size of the Bayes DB without
> contributing anything to (and maybe even harming) the learning of
> the Bayes engine.
The Bayesian learning uses the last two "Received" lines and the "To" line, but I don't see why you think those are redundant.

For example, if no spam has ever been sent to a particular address before, then its presence in the "To" line is a pretty good indicator that the message is not spam, whereas the presence of the "info@" address for your domain (or some address that's been retired because it gets too much spam) may be a fair indicator that the message is spam. Remember that the "To" line can contain multiple addresses, as well as comments, names, and completely bogus addresses that spammers put in; tokens found there can be significant indicators of spam.

And the size of the Bayes DB is limited anyway, with the least significant tokens being purged periodically, so these extra tokens aren't bloating the database.

The people who developed the Bayes tokenizing for SA have done analysis on how effective various strategies are, and I'm inclined to trust their analysis unless you have some better analysis that refutes it.

--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
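To make the point about "To"-header tokens concrete, here is a rough Python sketch of the general idea; it is not SpamAssassin's actual Perl tokenizer or Bayes store. The header tagging, the regex, and the TinyBayes class are all invented for illustration, and the per-token probability uses a Robinson-style smoothed ratio, so an address never seen in spam scores below 0.5 while one seen only in spam scores above it.

    # Illustrative sketch only -- NOT SpamAssassin's actual tokenizer or DB.
    # Shows how tokens pulled from a "To" header (addresses, display names,
    # bogus addresses) could feed per-token spam/ham counts in a tiny Bayes store.
    import re
    from collections import Counter

    TOKEN_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+|[A-Za-z0-9']{3,}")

    def tokenize_to_header(to_header):
        """Split a To: header into address and word tokens, tagged with 'To:'."""
        return ["To:" + t.lower() for t in TOKEN_RE.findall(to_header)]

    class TinyBayes:
        """Minimal per-token spam/ham counter with a smoothed spam probability."""
        def __init__(self):
            self.spam = Counter()
            self.ham = Counter()

        def learn(self, tokens, is_spam):
            (self.spam if is_spam else self.ham).update(tokens)

        def spam_prob(self, token, s=1.0, x=0.5):
            # Robinson-style smoothing: pull the raw spam ratio toward 0.5
            # so rarely seen tokens stay cautious instead of swinging to 0 or 1.
            ns, nh = self.spam[token], self.ham[token]
            n = ns + nh
            p = ns / n if n else x
            return (s * x + n * p) / (s + n)

    db = TinyBayes()
    db.learn(tokenize_to_header('"Keith Ivey" <keith@example.org>'), is_spam=False)
    db.learn(tokenize_to_header("info@example.org, bogus@spamtrap.example"), is_spam=True)

    for tok in tokenize_to_header("keith@example.org, info@example.org"):
        print(tok, round(db.spam_prob(tok), 3))

Run as written, the never-spammed address comes out around 0.25 and the "info@" address around 0.75, which is the effect described above: tokens from the "To" and "Received" headers carry real signal rather than just padding the database.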