Re: [SAtalk] Evaluation of 2.30 GA scores

Daniel Quinlan Sat, 15 Jun 2002 14:28:37 -0700

Craig R Hughes writes:

> So it's just because the GA could get away with setting it to 0.921
> -- in practice it's a clear sign of nonspam, and we should just fix
> it at -2.0, which I've done on both branches now.


Okay.  In HEAD, I made the rule harder to abuse, which is just as
well since we're hard-coding a negative score.

> [craig@belphegore masses]$ fgrep FROM_AND_TO_SAME freqs
>       3574 2877 697 FROM_AND_TO_SAME
> 
> So it's not a bad rule -- occurs 5 times as frequently in spam as nonspam.
> 
> FROM_AND_TO_SAME is triggered in 0.2% of the false positives in the corpus
> FROM_AND_TO_SAME is triggered in 3.1% of the false negatives in the corpus
> 
> which makes the score the GA calculated make perfect sense.

Okay, as you probably already noticed, I removed FROM_AND_TO_SAME
earlier this morning (bug 456).  Let me reopen that bug and I'll see
if I can find a way to improve the rule before I add it back.

For me, only 42% of the hits are spam, although I'll grant you that
it's not part of any false positives, so I'll add it back either way.
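
The two percentages being compared here can be reproduced from a freqs
line.  The following is a minimal sketch, not the actual masses tooling:
it assumes the three columns are total hits, spam hits, and nonspam hits
(consistent with 3574 = 2877 + 697 and with the remark that the rule fires
more often in spam).  The helper names are hypothetical.

```python
def parse_freqs_line(line):
    """Parse one line such as '3574 2877 697 FROM_AND_TO_SAME'.

    Assumed column order: total hits, spam hits, nonspam hits, rule name.
    """
    total, spam, nonspam, rule = line.split()
    return {
        "rule": rule,
        "total": int(total),
        "spam": int(spam),
        "nonspam": int(nonspam),
    }


def spam_share(hits):
    """Fraction of all hits for the rule that landed on spam."""
    return hits["spam"] / hits["total"]


def spam_to_nonspam_ratio(hits):
    """Raw hit-count ratio, not normalized by corpus sizes."""
    return hits["spam"] / hits["nonspam"]


hits = parse_freqs_line("      3574 2877 697 FROM_AND_TO_SAME")
print(f"{hits['rule']}: {spam_share(hits):.1%} of hits are spam, "
      f"raw spam:nonspam ratio {spam_to_nonspam_ratio(hits):.2f}")
# prints: FROM_AND_TO_SAME: 80.5% of hits are spam, raw spam:nonspam ratio 4.13
```

On the quoted counts this gives about 80% spam among hits, against 42% on
my mail.  The raw ratio is computed from counts alone; a figure normalized
by the number of spam and nonspam messages in the corpus would differ, and
those corpus sizes are not given in this thread.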

Dan


