Re: [SAtalk] Bayesian 100% on all my mail

Simon Byrnand Wed, 26 Nov 2003 01:51:11 -0800

> On Tue, 25 Nov 2003, Robert Menschel wrote:
>
>> Hello Aaron,
>>
>> Tuesday, November 25, 2003, 8:58:58 AM, you wrote:
>>
>> AY> ... Recently I started getting a lot of false positives with SA
>> AY> 2.60.
>> AY>  I noticed that all my mail was getting a bayesian score of 99 to
>> AY> 100%. ...My best guess is that since the bayes database only holds a
>> AY> limited number of tokens, my DB was filling up with spam tokens and
>> AY> not enough non-spam tokens.  Maybe this happened because I only get
>> AY> about 10-20 legitimate emails a week versus about 100+ spam emails a
>> AY> day.
>>
>> In November to date, I've trained my Bayes on 683 ham and 6816 spam.
>> Ratio therefore seems to be about the same as yours. I haven't seen any
>> evidence of the problem -- Bayes is working wonderfully here.
>>
>> Bob Menschel
>
> Having had an experience similar to Aaron's I can believe that he could
> be having problems with a poisoned Bayes. For example, suppose that you've
> received a large number of "Nigerian" spams that were learned as such.
> That would put spam scores on a large number of conversational words.
>
> In a fit of pique, I had tossed a whole bunch of "Nigerian" spams in
> my bayes. It got so bad that a test email that contained only one word
> ("Hi") got a Bayes 99% spam score. I had to trash the DB and start from
> scratch.
>
> So the quality of Bayes scoring does depend upon how it is trained.
> It is a tool not a magic bullet, and like any tool can be misused
> or abused. Spammers seem to be learning this; I'm seeing an increasing
> number of spams that contain "Bayes poison".


IMHO Nigerian-style spams will always be "Bayes poison", simply because
their wording is so close to normal conversational text, whereas
"ordinary" spam tends to use words that rarely turn up in legitimate
mail.
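
To illustrate the mechanism, here is a toy sketch. It uses a simplified
Graham-style per-token estimate, which is NOT the formula SpamAssassin
actually uses, and the messages, counts and function names are all made up
for the example. It only shows how a conversational word like "hi" can look
100% spammy when it has been learned from many Nigerian spams but has never
been seen in a small ham corpus.

```python
from collections import Counter


def train(messages):
    """Count, per token, how many messages contain it (case-insensitive)."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Toy per-token spam estimate; an assumption, not SpamAssassin's formula."""
    spam_freq = spam_counts[token] / n_spam if n_spam else 0.0
    ham_freq = ham_counts[token] / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen: treat as neutral
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical corpora: lots of conversational spam, very little ham.
spam = ["hi dear friend i need your urgent help with a transfer"] * 100
ham = ["meeting moved to friday", "patch attached please review"]

spam_counts, ham_counts = train(spam), train(ham)
p_before = token_spam_probability("hi", spam_counts, ham_counts, len(spam), len(ham))

# Learning a few ordinary messages that also say "hi" pulls the token back.
ham += ["hi are you around this afternoon"] * 3
ham_counts = train(ham)
p_after = token_spam_probability("hi", spam_counts, ham_counts, len(spam), len(ham))

print(f"before: {p_before:.3f}  after: {p_after:.3f}")
```

In this toy setup the only fix is more ham containing the same everyday
words, which is hard to come by if you get 10-20 legitimate mails a week.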

Right back when 2.50 came out and introduced Bayes support, I was one of
the first people to comment on this list that Nigerian spams seemed to
be the Achilles' heel of Bayes, and that still seems to be the case with
2.60.

The main problem I notice is that, despite repeated manual and automatic
training on them, Nigerian spams still frequently get a neutral Bayes
score (giving 0 points), and quite often a very hammy Bayes score that
gives them enough negative points to offset the positive points from the
Nigerian tests. The result is that Bayes often *prevents* a Nigerian
spam from being detected.

I think at the time I suggested that negative Bayes scores be ignored
whenever the Nigerian tests fire, but the idea was probably considered too
much of a hack.
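
As a rough sketch of what I mean (the rule scores, the Bayes score and the
function name below are invented for illustration and are not taken from
any SpamAssassin release; the threshold of 5.0 is assumed to match the
default required_hits):

```python
REQUIRED_HITS = 5.0  # assumed threshold; check your own configuration


def total_score(rule_scores, bayes_score, nigerian_hit, ignore_negative_bayes=False):
    """Sum the rule scores and the Bayes score.

    With ignore_negative_bayes=True, a negative (hammy) Bayes score is
    dropped to zero whenever a Nigerian test has fired. This is the
    hypothetical hack discussed above, not existing SpamAssassin behaviour.
    """
    if ignore_negative_bayes and nigerian_hit and bayes_score < 0:
        bayes_score = 0.0
    return sum(rule_scores) + bayes_score


# Hypothetical message: three Nigerian tests fire, Bayes thinks it is ham.
nigerian_rule_scores = [2.5, 2.0, 1.5]
hammy_bayes_score = -2.5

plain = total_score(nigerian_rule_scores, hammy_bayes_score, nigerian_hit=True)
hacked = total_score(nigerian_rule_scores, hammy_bayes_score, nigerian_hit=True,
                     ignore_negative_bayes=True)

print(f"plain:  {plain:.1f} spam={plain >= REQUIRED_HITS}")
print(f"hacked: {hacked:.1f} spam={hacked >= REQUIRED_HITS}")
```

With these made-up numbers the hammy Bayes score drags the message under
the threshold, while ignoring it lets the Nigerian tests carry the message
over on their own.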

Regards,
Simon



