Re: [SAtalk] Bayes mis-learning problem

Arpi Tue, 20 Jan 2004 14:50:53 -0800

Hi,

> On Mon, Jan 19, 2004 at 03:21:06PM -0500, Larry Gilson wrote:
> > http://useast.spamassassin.org/doc/Mail_SpamAssassin_Conf.html#learning%20options
> > 
> > bayes_ignore_header header_name
> 
> ::bangs head on wall::   How did I miss *that*?  Thanks for correcting
> my careless reading.
> 
> In a broader sense though, shouldn't fields like To: be excluded by
> default?  It seems like if I receive more than 50% spam, this is a
> recipe for disaster.  Of course, some spam won't have a valid To:
> field, but it seems like constant things like this will be very bad
> arbiters.


Although I agree that this Bayes behaviour on To: is good, this thread
raised an interesting question for me:
Does the Bayes calculation take the spam:ham ratio into account?

So, if I have a constant header line (word) that is present in every spam
and every ham message, but I get 10 times more spam than ham (so the
counters for this word are 10 times bigger in the spam column than in the
ham column), will Bayes think this word means a 10:1 spam probability?
That would be bad, of course! The word carries no information: it is just
as likely to indicate spam as ham.

And we all have some constant headers; just think of the Received:
line including your mail server name/IP...

I wonder if the Bayes DB normalizes the per-word spam/ham counts by the
total spam/ham message counts? Then it would find that my word is
present in 100% of all spam messages and 100% of all ham messages,
so it means a 50% spam probability (instead of 10:1, which means about 90%).
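
To make the arithmetic concrete, here is a minimal Python sketch of the two
ways of counting. It only illustrates the question; it is not taken from
the SpamAssassin source, and the function names and corpus sizes (1000 spam,
100 ham) are made up for the example.

```python
def raw_count_probability(spam_hits, ham_hits):
    """Naive estimate from raw token counts, ignoring corpus sizes."""
    return spam_hits / (spam_hits + ham_hits)


def normalized_probability(spam_hits, ham_hits, total_spam, total_ham):
    """Estimate from per-class frequencies (hits / messages learned)."""
    spam_freq = spam_hits / total_spam
    ham_freq = ham_hits / total_ham
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical corpus: 10 times more spam than ham, and a constant
# header token that appears in every single message of both kinds.
total_spam, total_ham = 1000, 100
spam_hits, ham_hits = 1000, 100

raw = raw_count_probability(spam_hits, ham_hits)
norm = normalized_probability(spam_hits, ham_hits, total_spam, total_ham)

print(f"raw counts: {raw:.3f}")   # 10:1 ratio, i.e. 10/11
print(f"normalized: {norm:.3f}")  # 100% vs 100%, i.e. neutral
```

With raw counts the constant token looks strongly spammy (10/11), while
dividing each count by the number of messages learned in its class makes
it neutral (0.5), which is the behaviour I would hope for.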


A'rpi / Astral & ESP-team

--
Developer of MPlayer G2, the Movie Framework for all - http://www.MPlayerHQ.hu

