> On Tue, 25 Nov 2003, Robert Menschel wrote:
>
>> Hello Aaron,
>>
>> Tuesday, November 25, 2003, 8:58:58 AM, you wrote:
>>
>> AY> ... Recently I started getting a lot of false positives with SA
>> AY> 2.60. I noticed that all my mail was getting a bayesian score of
>> AY> 99 to 100%. ...My best guess is that since the bayes database only
>> AY> holds a limited number of tokens, my DB was filling up with spam
>> AY> tokens and not enough non-spam tokens. Maybe this happened because
>> AY> I only get about 10-20 legitimate emails a week versus about 100+
>> AY> spam emails a day.
>>
>> In November to date, I've trained my Bayes on 683 ham and 6816 spam.
>> Ratio therefore seems to be about the same as yours. I haven't seen any
>> evidence of the problem -- Bayes is working wonderfully here.
>>
>> Bob Menschel
>
> Having had an experience similar to Aaron's I can believe that he could
> be having problems with a poisoned Bayes. For example, suppose that you've
> received a large number of "Nigerian" spams that were learned as such.
> That would put spam scores on a large number of conversational words.
>
> In a fit of pique, I had tossed a whole bunch of "Nigerian" spams in
> my bayes. It got so bad that a test email that contained only one word
> ("Hi") got a Bayes 99% spam score. I had to trash the DB and start from
> scratch.
>
> So the quality of Bayes scoring does depend upon how it is trained.
> It is a tool not a magic bullet, and like any tool can be misused
> or abused. Spammers seem to be learning this, I'm seeing an increasing
> number of spams that contain "Bayes poison".
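The poisoning effect described in the quoted message can be illustrated
with a toy sketch. This is not SpamAssassin's actual Bayes code: the
function names (train, token_spam_prob), the sample messages, and the
simple relative-frequency formula are all assumptions made purely for
illustration. It only shows how a conversational word such as "hi" can
end up looking very spammy once a lot of conversational spam has been
learned and little ham contains the same word.

```python
from collections import Counter


def train(messages):
    """Count, for each token, how many messages contain it (hypothetical helper)."""
    counts = Counter()
    for text in messages:
        # Use a set so a token is counted once per message.
        counts.update(set(text.lower().split()))
    return counts


def token_spam_prob(token, spam_counts, ham_counts, n_spam, n_ham):
    """Toy per-token spam probability based on relative message frequencies."""
    spam_freq = spam_counts[token] / n_spam if n_spam else 0.0
    ham_freq = ham_counts[token] / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen: treat as neutral
    return spam_freq / (spam_freq + ham_freq)


# Made-up training data: lots of conversational "Nigerian"-style spam,
# and a small amount of ham in which "hi" appears only once.
spam = ["hi dear friend i need your help with a transfer"] * 200
ham = ["the quarterly report is attached"] * 19 + ["hi bob lunch tomorrow"]

spam_counts, ham_counts = train(spam), train(ham)
p_hi = token_spam_prob("hi", spam_counts, ham_counts, len(spam), len(ham))
print(f"P(spam | 'hi') = {p_hi:.2f}")
```

With this made-up data, a one-word message containing only "hi" has
nothing but a strongly spammy token to be judged on, which is the same
kind of outcome as the "Hi" test email above. A real Bayes
implementation combines many tokens and smooths rare ones, so actual
numbers will differ.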
IMHO Nigerian-style spams will always be "Bayes poison", simply because
their wording is so similar to normal conversational text, whereas
"ordinary" spam tends to use words that rarely appear in legitimate mail.

When 2.50 came out and introduced Bayes support, I was one of the first
people to comment on this list that Nigerian spams seemed to be the
Achilles heel of Bayes, and that still seems to be the case with 2.60.

The main problem I notice is that, despite repeated manual and automatic
training on them, Nigerian spams still frequently get either a neutral
Bayes score (giving 0 points) or, quite often, a very hammy Bayes score.
The hammy score gives them enough negative points to offset the positive
points from the Nigerian tests, so Bayes often *prevents* a Nigerian
spam from being detected.

I think at the time I suggested that negative Bayes scores be ignored
whenever the Nigerian tests fire, but the idea was probably considered
too much of a hack.

Regards,
Simon