On 5 Jul 2007, [EMAIL PROTECTED] stated:
> On 7/2/2007, "Nix" <[EMAIL PROTECTED]> wrote:
>
>>If you wanted to replace all other scoring mechanisms with the Bayes DB,
>>you'd need a second Bayes DB for this, anyway, or you'd need the tokens
>>corresponding to typically negative-scoring rules to have values which
>>cannot appear in the body of an email. Anything else would enable spammers
>>to force both FPs and FNs by customizing spam appropriately to include
>>suitable NO_FOO/YES_FOO values.
>
> That's why the data is being passed in as a second reference, nothing to
> do with the message. Seems to be working well, but there's some
> optimization to include.
It doesn't just need to be a second reference. The tokens also need to be independent of the message-derived tokens in the Bayes database itself: i.e., it must be impossible for spammers to generate tokens in the message body that can influence the scores of the Bayes DB tokens corresponding to the Bayes-scored rule hits.

(btw, Tom, what's wrong with your mailer? ^M characters (CRCRLF line terminators on the wire, perhaps?), a doubled-up Subject line, and two To: lines, one with fullnames and one without... I cleaned up the ^Ms in this response.)

-- 
`... in the sense that dragons logically follow evolution so they would be able to wield metal.'
 --- Kenneth Eng's colourless green ideas sleep furiously
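The independence requirement above (rule-hit tokens must be impossible to produce from message text) can be illustrated with a toy sketch. It assumes a hypothetical whitespace-splitting tokenizer and a hypothetical `"rule hit:"` prefix. This is not SpamAssassin's actual tokenizer or Bayes token format. Because the prefix contains a space and body tokens never do, nothing a spammer writes in a message can collide with a rule-hit token.

```python
# Toy sketch only: body_tokens, rule_tokens and RULE_PREFIX are hypothetical,
# not SpamAssassin's real tokenizer or Bayes DB token format.

def body_tokens(text: str) -> set[str]:
    # str.split() with no argument splits on any whitespace, so no
    # message-derived token can ever contain a space character.
    return set(text.split())

# Hypothetical namespace for rule-hit tokens. The embedded space guarantees
# no body token can ever equal a rule-hit token.
RULE_PREFIX = "rule hit:"

def rule_tokens(rule_hits: dict[str, bool]) -> set[str]:
    # Emit a YES_/NO_ token for every rule, so rules that did not fire
    # (e.g. typically negative-scoring ones) still get learnable tokens.
    return {
        f"{RULE_PREFIX}{'YES' if hit else 'NO'}_{name}"
        for name, hit in rule_hits.items()
    }

def all_tokens(text: str, rule_hits: dict[str, bool]) -> set[str]:
    # The two token sets share one Bayes DB but can never overlap.
    return body_tokens(text) | rule_tokens(rule_hits)
```

For example, a spam body containing the literal text `rule hit:NO_BAYES_99` tokenizes to `rule` and `hit:NO_BAYES_99`, neither of which matches the rule-hit token `rule hit:NO_BAYES_99`. A separate Bayes DB (the other option in the quoted text) achieves the same isolation at the cost of a second database.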