-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Arpi writes:
>Hi,
>
>> On Mon, Jan 19, 2004 at 03:21:06PM -0500, Larry Gilson wrote:
>> > http://useast.spamassassin.org/doc/Mail_SpamAssassin_Conf.html#learning%20options
>> >
>> > bayes_ignore_header header_name
>>
>> ::bangs head on wall:: How did I miss *that*? Thanks for correcting
>> my careless reading.
>>
>> In a broader sense though, shouldn't fields like To: be excluded by
>> default? It seems like if I receive more than 50% spam, this is a
>> recipe for disaster. Of course, some spam won't have a valid To:
>> field, but it seems like constant things like this will be very bad
>> arbiters.
>
>Although I agree that this Bayes behaviour on To: is good, this thread
>raised an interesting question for me:
>Does the Bayes calculation take the spam:ham ratio into account?
>
>So, if I have a constant header line (word), present in every spam and every
>ham message, but I get 10 times more spam than ham (so the counters on this
>word are 10 times bigger in the spam column than in the ham column), will
>Bayes think this word means 10:1 spam probability? Which is bad, of course!!
>As it means nothing, it is just as likely to indicate ham as spam.
>
>And we all have some constant headers, just think of the Received:
>line including your mail server name/IP...
>
>I wonder if the Bayes DB normalizes the spam/ham counts by the total
>number of spam/ham messages? Then it would find that my word is
>present in 100% of all spam messages and 100% of all ham messages,
>so it means 50% spam probability (instead of 10:1, which means roughly 90%).

Hi Arpi -- yep, it does. That's why the db keeps a total count of the
messages learned in each category, in the nspam and nham counters.

- --j.
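To illustrate the normalization described in the reply, here is a minimal Python sketch. It is not SpamAssassin's actual Bayes code: the function names are hypothetical, the neutral 0.5 for a never-seen token is an assumption, and it leaves out any further smoothing or combining of token probabilities that the real filter performs. It only shows why dividing token counts by nspam and nham turns Arpi's "10:1" constant header into a neutral 50%.

```python
def raw_spam_prob(spam_count: int, ham_count: int) -> float:
    """Naive estimate from raw token counts; ignores how many messages were learned."""
    return spam_count / (spam_count + ham_count)


def normalized_spam_prob(spam_count: int, ham_count: int,
                         nspam: int, nham: int) -> float:
    """Per-token spam probability from counts normalized by the totals of
    learned spam (nspam) and ham (nham) messages. Hypothetical helper."""
    if nspam <= 0 or nham <= 0:
        raise ValueError("need at least one learned spam and one learned ham")
    spam_ratio = spam_count / nspam  # fraction of learned spam containing the token
    ham_ratio = ham_count / nham     # fraction of learned ham containing the token
    if spam_ratio + ham_ratio == 0:
        return 0.5  # assumption: a token never seen in either corpus is neutral
    return spam_ratio / (spam_ratio + ham_ratio)


if __name__ == "__main__":
    # Arpi's example: a header token present in every message,
    # with 10 times more spam than ham learned.
    nspam, nham = 1000, 100
    print(round(raw_spam_prob(1000, 100), 3))              # 0.909
    print(normalized_spam_prob(1000, 100, nspam, nham))    # 0.5
```

With raw counts the constant token looks about 91% spammy; once each count is divided by its category total, it appears in 100% of both spam and ham and scores exactly 0.5, which is the behaviour Arpi was hoping for.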
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFADbQkQTcbUG5Y7woRAla9AKDwS2rqoRk0q8/6jJYeC9ejA608AgCcDyOt
nNGtn2IUVmnDT+iKKV2wl00=
=aE6E
-----END PGP SIGNATURE-----

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk