RE: [SAtalk] Some spam values

Michael Moncur Fri, 27 Sep 2002 14:15:38 -0700

> >Some other "negative" scores that I find odd:
> >
> >SPAM: LOW_PRICE          (-1.2 points) BODY: Lowest Price
> >SPAM: HTML_FONT_COLOR_RED (-1.2 points) BODY: HTML font color is red
> >SPAM: BIG_FONT           (-0.4 points) BODY: FONT Size +2 and up
> or 3 and up


The 2.42 scores for these are:
score LOW_PRICE                      0.301
score HTML_FONT_COLOR_RED            0.319
score BIG_FONT                       0.315

> I understand that the scores are generated by a genetic algorithm that
> scans a test archive of spams and derives the scores -- but that doesn't
> mean that a little seat-of-the-pants intuition by the administrator can't
> come into play. :-) I grepped for all the negatives and overrode them with
> positive scores in my local.cf file. In each case I tried to pick a score
> that "made sense" although I freely admit that I could be off base in many
> cases. Truth is I'm still in the process of tuning them so that I don't
> get false positives.
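The "grep for all the negatives" step quoted above could be sketched in Python roughly as follows. This is a hypothetical helper, not a SpamAssassin tool. It assumes score lines of the simple form "score RULE_NAME value [value value value]" and checks only the first value. It ignores anything else a real .cf file may contain, such as conditionals.

```python
import re

# Hypothetical pattern: assumes "score RULE_NAME value [...]" lines.
# Commented-out lines ("# score ...") do not match because of the ^ anchor.
SCORE_LINE = re.compile(r"^\s*score\s+(\S+)\s+([-+]?\d*\.?\d+)")


def negative_scores(cf_text: str) -> dict[str, float]:
    """Return {rule: score} for every score line whose first value is negative."""
    found = {}
    for line in cf_text.splitlines():
        m = SCORE_LINE.match(line)
        if m and float(m.group(2)) < 0:
            found[m.group(1)] = float(m.group(2))
    return found


if __name__ == "__main__":
    # The three negative scores quoted at the top of this thread.
    sample = """\
score LOW_PRICE           -1.2
score HTML_FONT_COLOR_RED -1.2
score BIG_FONT            -0.4
"""
    for rule, value in negative_scores(sample).items():
        print(rule, value)
```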

I used to do the same thing with any really out-of-bounds scores. Keep in
mind, first of all, that many of the negative scores are intentional: those
rules are meant to be signs of nonspam. Second, here's something to think
about. After I made lots of local changes to the 2.41 scores, I ran a
mass-check on my own spam corpus to test them, and invariably I got *more*
false positives.

More importantly, the GA has been fixed and the 2.42 scores solve most of
the problems you're talking about.

I even have a script that analyzes the GA results and calculates a
reasonable score for each rule based on how much spam and nonspam it
matched, compares that with the GA-assigned score, and gives me a list of
new scores to override any suspicious scores.
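A script of the kind described above might look something like the following Python sketch. This is not the script from the message. The linear mapping from hit ratio to score, the `max_score` of 3.0, the `tolerance` of 1.0, and the names `RuleStats`, `heuristic_score`, and `suggest_overrides` are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: the formula, max_score, and tolerance are
# illustrative assumptions, not the script described in the message.
from dataclasses import dataclass


@dataclass
class RuleStats:
    name: str
    spam_pct: float  # percent of the spam corpus the rule matched
    ham_pct: float   # percent of the nonspam corpus the rule matched
    ga_score: float  # score assigned by the genetic algorithm


def heuristic_score(spam_pct: float, ham_pct: float, max_score: float = 3.0) -> float:
    """Map a rule's spam/nonspam hit ratio linearly onto [-max_score, +max_score]."""
    total = spam_pct + ham_pct
    if total == 0:
        return 0.0  # rule never fired: no evidence either way
    spam_ratio = spam_pct / total  # 1.0 = hits only spam, 0.0 = hits only nonspam
    return round(max_score * (2 * spam_ratio - 1), 3)


def suggest_overrides(rules: list[RuleStats], tolerance: float = 1.0) -> list[str]:
    """Return local.cf-style score lines for rules whose GA score looks suspicious."""
    lines = []
    for r in rules:
        guess = heuristic_score(r.spam_pct, r.ham_pct)
        if abs(guess - r.ga_score) > tolerance:
            lines.append(f"score {r.name:<28} {guess}")
    return lines


if __name__ == "__main__":
    # GA scores are values quoted in this thread; hit percentages are invented.
    rules = [
        RuleStats("LOW_PRICE", spam_pct=2.0, ham_pct=0.5, ga_score=-1.2),
        RuleStats("BIG_FONT", spam_pct=4.0, ham_pct=2.0, ga_score=0.315),
    ]
    for line in suggest_overrides(rules):
        print(line)
```

The tolerance is what keeps such a list short. Only rules whose GA score disagrees strongly with the simple hit-ratio estimate are reported, which matches the idea of overriding just the suspicious scores.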

When I ran this script on the 2.42 scores, there were virtually no results:
most of the scores are right where they belong. Additionally, they give me
better results on my test corpus than any manually corrected scores, so
I'm currently not correcting any of the GA scores.

> The point is, these are the ones that were negative in the stock
> scores that really seem like they should be positive.

These are virtually all fixed in the new scores, and using those will get
you much better results than making up your own scores. Wait for the 2.42
release in a few days and you'll be happy.

--
Michael Moncur  mgm at starlingtech.com  http://www.starlingtech.com/
"Fortune does not change men, it unmasks them." --Suzanne Necker



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
