I just went through and took a close look at the scores. I think they turned
out very well - the GA and I agree on just about everything this time. Bart
already mentioned some of these, but here is a list of scores I find
questionable.

First, three definite problems:

>score PORN_8 -4.248
I think this rule has become nearly useless ("mp3z" and "videoz" and "warez"
are probably almost in common usage now) but it certainly isn't a non-spam
indicator of this magnitude. This one was -0.9 before. It should probably be
thrown out.
>score TRACKER_ID -4.215
I can't understand the regex here, but I think it's broken. If it really is
this strong a non-spam indicator, it must be detecting something other than
tracking IDs. (This one was already -3.3 in 2.1. Wasn't it intended as a
spam indicator in the first place?)
>score BUGZILLA_BUG 1.123
I know this one was intended to be negative. The regex is probably detecting
lots of things that aren't Bugzilla reports and likely needs fixing.

Second, a few scores that seem awfully high. I'm keeping these because my
threshold is 7.0, but I'd be nervous about them with a 5.0 threshold.

score DOMAIN_BODY 4.782
score EARN_PER_WEEK 4.667
score FRONTPAGE 4.775
score MANY_FROMS 4.409
score ONE_HUNDRED_PC_GUAR 4.399
score WE_HONOR_ALL 4.536

(My opinion: if the default threshold is 5.0, no single score should be
above about 3.3. With the current scores, as with 2.1, a threshold of 7.0
works quite well.)
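
For anyone who wants to try the same setup, the threshold goes in local.cf.
This is only a minimal sketch, assuming the 2.x option name required_hits;
check the configuration docs for your version before relying on it.

```
# local.cf: raise the spam threshold from the default 5.0 to 7.0
# (required_hits is the option name assumed here for the 2.x series)
required_hits 7.0
```

With a 7.0 threshold, a rule worth about 4.7 points still needs roughly 2.3
more points from other rules before a message is tagged, instead of only
about 0.3 at the default of 5.0.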

Third, a list of scores that should be positive but came out as low
negatives, which probably means the rules are broken or no longer useful. I
wouldn't really consider any of them good non-spam indicators. I'm setting
them all to zero in my local.cf file.

score ALL_CAPS_SUBJECT -0.274
score BE_AMAZED -0.260
score GAPPY_TEXT -1.237
score HTML_WITH_BGCOLOR -0.546
score JAVASCRIPT_URI -1.607
score LINES_OF_YELLING_3 -1.518
score NO_EXPERIENCE -1.063
score NO_QS_ASKED -0.773
score OPPORTUNITY -1.010
score RATWARE -0.703
score REAL_THING -0.148
score RELAYING_FRAME -0.584
score SLIGHTLY_UNSAFE_JAVASCRIPT -0.794
score SUPERLONG_LINE -0.374
score SUBJ_ENDS_IN_Q_MARK -0.050
score SUSPICIOUS_RECIPS -0.016
score TO_BE_REMOVED_REPLY -2.150
score TO_UNSUB_REPLY -1.996
score WEB_BUGS -0.823
score X_MSMAIL_PRIORITY_HIGH -1.356
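
In local.cf the overrides look roughly like this. It is a sketch of a local
workaround, not a proposed change to the distributed scores; the rule names
are the ones listed above, and a score of 0 tells SpamAssassin to skip the
rule entirely.

```
# local.cf: neutralize rules that came out as weak negatives
score ALL_CAPS_SUBJECT            0
score BE_AMAZED                   0
score GAPPY_TEXT                  0
score HTML_WITH_BGCOLOR           0
score JAVASCRIPT_URI              0
score LINES_OF_YELLING_3          0
score NO_EXPERIENCE               0
score NO_QS_ASKED                 0
score OPPORTUNITY                 0
score RATWARE                     0
score REAL_THING                  0
score RELAYING_FRAME              0
score SLIGHTLY_UNSAFE_JAVASCRIPT  0
score SUPERLONG_LINE              0
score SUBJ_ENDS_IN_Q_MARK         0
score SUSPICIOUS_RECIPS           0
score TO_BE_REMOVED_REPLY         0
score TO_UNSUB_REPLY              0
score WEB_BUGS                    0
score X_MSMAIL_PRIORITY_HIGH      0
```
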
--
michael moncur mgm at starlingtech.com http://www.starlingtech.com/
"Women who seek to be equal with men lack ambition." -- Timothy Leary