> >Some other "negative" scores that I find odd:
> >
> >SPAM: LOW_PRICE (-1.2 points) BODY: Lowest Price
> >SPAM: HTML_FONT_COLOR_RED (-1.2 points) BODY: HTML font color is red
> >SPAM: BIG_FONT (-0.4 points) BODY: FONT Size +2 and up or 3 and up
The 2.42 scores for these are:

score LOW_PRICE 0.301
score HTML_FONT_COLOR_RED 0.319
score BIG_FONT 0.315

> I understand that the scores are generated by a genetic algorithm that scans a test
> archive of spams and derives the scores -- but that doesn't mean that a little
> seat-of-the-pants intuition by the administrator can't come into play. :-) I grepped
> for all the negatives and overrode them with positive scores in my local.cf file. In
> each case I tried to pick a score that "made sense" although I freely admit that I
> could be off base in many cases. Truth is I'm still in the process of tuning them so
> that I don't get false positives.

I used to do the same thing with any really out-of-bounds scores. Keep in mind, first of all, that many of the scores were meant to be negative: those rules are signs of non-spam.

Second, here's something to think about. After I made lots of local changes to the 2.41 scores, I ran a mass-check on my own spam corpus to test them, and invariably I got *more* false positives.

More importantly, the GA has been fixed, and the 2.42 scores solve most of the problems you're talking about. I even have a script that analyzes the GA results. It calculates a reasonable score for each rule based on how much spam and nonspam the rule matched, compares that with the GA-assigned score, and gives me a list of new scores to override any suspicious ones. When I ran this script on the 2.42 scores, there were virtually no results: most of the scores are right where they belong. Additionally, the 2.42 scores give me better results on my test corpus than any manually corrected scores. Thus I'm currently not correcting any of the GA scores.

> The point is, these are the ones that were negative in the stock
> scores that really seem like they should be positive.

These are virtually all fixed in the new scores, and using those will get you much better results than making up your own scores. Wait for the 2.42 release in a few days and you'll be happy.
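For anyone who wants to try a similar sanity check, here is a minimal sketch of the idea. It is not the script mentioned above: the heuristic formula, the `max_score` and `threshold` values, the function names, and the hit counts are all assumptions made up for illustration. Only the rule names and the two GA scores come from this thread.

```python
# Hypothetical sanity check for GA-assigned rule scores.
# NOT the script described in this message: the formula, threshold,
# and hit counts below are illustrative assumptions.

def heuristic_score(spam_hits, ham_hits, n_spam, n_ham, max_score=4.0):
    """Map a rule's relative hit rates onto [-max_score, +max_score]."""
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    total = spam_rate + ham_rate
    if total == 0:
        return 0.0  # rule never fired, so there is no evidence either way
    # +1 when the rule hits only spam, -1 when it hits only nonspam
    balance = (spam_rate - ham_rate) / total
    return round(max_score * balance, 3)


def suspicious_scores(rules, n_spam, n_ham, threshold=2.0):
    """Return {rule: suggested score} where GA and heuristic disagree badly."""
    overrides = {}
    for name, (ga_score, spam_hits, ham_hits) in rules.items():
        suggested = heuristic_score(spam_hits, ham_hits, n_spam, n_ham)
        wrong_sign = ga_score * suggested < 0
        if wrong_sign or abs(ga_score - suggested) > threshold:
            overrides[name] = suggested
    return overrides


if __name__ == "__main__":
    # name: (GA score, spam hits, nonspam hits); hit counts are invented
    rules = {
        "LOW_PRICE": (-1.2, 300, 20),   # old negative score quoted above
        "BIG_FONT": (0.315, 120, 100),  # 2.42 score quoted above
    }
    flagged = suspicious_scores(rules, n_spam=1000, n_ham=1000)
    for name, score in flagged.items():
        # Same "score NAME value" form used in local.cf
        print(f"score {name} {score}")
```

With these made-up counts, LOW_PRICE is flagged because its negative GA score has the opposite sign from the heuristic, while BIG_FONT passes because its GA score is close to the heuristic value. An empty result would correspond to the "virtually no results" outcome on the 2.42 scores.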
--
Michael Moncur  mgm at starlingtech.com  http://www.starlingtech.com/
"Fortune does not change men, it unmasks them." --Suzanne Necker