I was aware of the issues you're pointing out below. They are basically caused by using the new evolver to do the scoring. Previously, scores were limited to the range 0.01 to 5; now they are unlimited and allowed to go negative. A side effect of this is that rules which are really non-discriminators sometimes end up with odd-looking scores. For example, CYBER_FIRE_POWER is just not likely to be worth -4.020 when looked at in isolation, but it turns out that the 10 messages in the corpus which trigger that rule also trigger at least ten other rules each, mostly the same ones every time.
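To make the co-occurrence point concrete, here is a toy sketch. It assumes a simplified model (not the actual evolver) in which a message's total is just the sum of the scores of the rules it hits; the rule names, corpus, and scores are all hypothetical. When two rules always fire together, the corpus only pins down their sum, so very different individual scores give exactly the same totals.

```python
# Toy model (an assumption, not the real evolver): a message's total is the
# sum of the scores of the rules it hits.
from typing import Dict, List, Set


def total(hits: Set[str], scores: Dict[str, float]) -> float:
    """Sum the scores of every rule a message hits (unknown rules score 0)."""
    return sum(scores.get(rule, 0.0) for rule in hits)


# Hypothetical corpus: RULE_A and RULE_B only ever fire together.
corpus: List[Set[str]] = [
    {"RULE_A", "RULE_B", "RULE_C"},
    {"RULE_A", "RULE_B"},
    {"RULE_C"},
]

# Two very different-looking score sets...
scores_1 = {"RULE_A": 1.0, "RULE_B": 1.2, "RULE_C": 2.0}
scores_2 = {"RULE_A": 6.2, "RULE_B": -4.0, "RULE_C": 2.0}

# ...give identical totals on every message, because this corpus only
# constrains RULE_A + RULE_B (2.2 in both cases). Rounding hides float noise.
totals_1 = [round(total(m, scores_1), 6) for m in corpus]
totals_2 = [round(total(m, scores_2), 6) for m in corpus]
print(totals_1)  # [4.2, 2.2, 2.0]
print(totals_2)  # [4.2, 2.2, 2.0]
```

On a corpus like that, an optimizer has no reason to prefer the first score set over the second, which is why a single value such as -4.020 says little on its own about how spammy a rule is.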

Y 11 /home/craig/spams/ion.spam.2/1536 NO_REAL_NAME,INVALID_DATE_TZ_ABSURD,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING
Y 12 /home/craig/spams/ion.spam.2/1058 NO_REAL_NAME,INVALID_DATE,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 12 /home/craig/spams/ion.spam.2/659 NO_REAL_NAME,INVALID_DATE,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 12 /home/craig/spams/ion.spam.2/660 NO_REAL_NAME,INVALID_DATE,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 12 /home/craig/spams/ion.spam.2/661 NO_REAL_NAME,INVALID_DATE,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 12 /home/craig/spams/ion.spam.2/662 NO_REAL_NAME,INVALID_DATE,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 13 /home/craig/spams/ion.spam.2/1033 NO_REAL_NAME,INVALID_DATE,MSG_ID_ADDED_BY_MTA_2,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 13 /home/craig/spams/ion.spam.2/808 NO_REAL_NAME,INVALID_DATE,MSG_ID_ADDED_BY_MTA_2,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 14 /home/craig/spams/ion.spam.2/519 NO_REAL_NAME,ALL_CAPS_SUBJECT,INVALID_DATE,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 15 /home/craig/spams/ion.spam.2/372 NO_REAL_NAME,ALL_CAPS_SUBJECT,INVALID_DATE,MSG_ID_ADDED_BY_MTA_2,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME

The same applies to the other rules that look like that. This isn't really a problem. It can actually be helpful to let the GA do its own thing. Take the DEAR_SOMEBODY rule: you might think it was a sign of spam. Well, it turns out it occurs frequently in nonspam too: 2027 hits in spam vs. 2600 in nonspam, out of about 50,000 spams and 200,000 nonspams. So even though it's roughly 3 times more common in spam than in nonspam (about 4.1% of spams vs. 1.3% of nonspams), it turns out to be useful, with a negative score, as a discriminator for reducing false positives. Similarly with CASHCASHCASH. I would suggest letting go, Luke, and trusting the Force. Uh, I mean the GA. If you try it and it isn't working well, then by all means, revert. I think you'll probably end up with worse results, though.

C

Bart Schaefer wrote:
> Date: Wed, 27 Feb 2002 15:39:15 -0800 (PST)
> From: Bart Schaefer <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Subject: [SAtalk] Troubling new scores in 2.1 release
>
> I've diffed the r1.37 and r1.38 rules/50_scores.cf and some of the changes
> are so unbelievable that I've decided not to install the new scores file.
>
> Here's just a sampling:
>
>                               r1.37    r1.38
>                               -----  -------
> score 25FREEMEGS_URL           1.00   -4.606
> score A_HREF_TO_OPT_OUT        1.0     6.675
> score A_HREF_TO_REMOVE         1.82    6.546
> score A_HREF_TO_UNSUB          0.01    4.102
> score BE_AMAZED                1.03   -4.581
> score BILLION_DOLLARS          0.90    4.249
> score CASHCASHCASH             1.64   -3.700
> score CYBER_FIRE_POWER         1.21   -4.020
> score DEAR_SOMEBODY            1.0    -4.412
> score DIFFERENT_REPLY_TO       0.90    4.067
> score EARN_PER_WEEK            2.0     7.273
> score EXCUSE_4                 1.91    6.866
> score EXCUSE_5                 2.20   13.447
> score EXCUSE_7                 0.01    7.370
> score EXCUSE_13                0.01    7.484
> score EXCUSE_15                0.01    5.490
> score IN_REP_TO               -2.0   -13.472
> score LOTS_OF_CC_LINES         1.00   -1.648
> score MAILTO_WITH_SUBJ_REMOVE  0.01   -3.661
> score NONEXISTENT_CHARSET      1.31    7.198
> score ONCE_IN_LIFETIME         0.80   -4.604
> score THIS_AINT_SPAM           2.17   -2.449
> score TO_NO_USER               3.12   -0.032
> score TRACKER_ID               0.71   -4.899
> score UNSUB_PAGE               1.21   10.767
> score UNSUB_SCRIPT             2.17    9.686
> score X_PMFLAGS_PRESENT        1.00   10.248
> score TO_UNSUB_REPLY           1.81   -2.290
>
> Previously I don't think there was ever a GA-determined score above 5.0?
> Certainly there weren't any above 9.0 (or below -9.0).
>
> Not that I don't appreciate Craig's efforts with the new GA, but ... I'd
> go so far as to suggest that the distribution be reverted to rev 1.37 of
> the rules/50_scores.cf file until this can be straightened out.
>
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
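Bart's comparison can be approximated with a short script. This is a sketch under stated assumptions: each file is taken to contain lines of the form `score RULE_NAME value`, only the first value on each line is read, and every other line is ignored. The `parse_scores` and `flag_changes` helpers and the 5.0 limit (which echoes the old 0.01 to 5 range) are hypothetical choices for illustration, not part of SpamAssassin. The sample inputs are four rows from the table above.

```python
import re
from typing import Dict, List, Tuple

# Matches "score RULE_NAME value"; only the first numeric value is captured.
SCORE_LINE = re.compile(r"^\s*score\s+(\S+)\s+(-?\d+(?:\.\d+)?)")


def parse_scores(text: str) -> Dict[str, float]:
    """Map rule name -> first score value for each `score` line; skip the rest."""
    scores: Dict[str, float] = {}
    for line in text.splitlines():
        m = SCORE_LINE.match(line)
        if m:
            scores[m.group(1)] = float(m.group(2))
    return scores


def flag_changes(old: Dict[str, float], new: Dict[str, float],
                 limit: float = 5.0) -> List[Tuple[str, float, float, str]]:
    """Report rules present in both files whose score flipped sign or
    moved from within +/-limit to beyond it."""
    report: List[Tuple[str, float, float, str]] = []
    for rule in sorted(old.keys() & new.keys()):
        before, after = old[rule], new[rule]
        reasons = []
        if before * after < 0:
            reasons.append("sign flip")
        if abs(after) > limit >= abs(before):
            reasons.append(f"now beyond +/-{limit}")
        if reasons:
            report.append((rule, before, after, ", ".join(reasons)))
    return report


r137 = """score CYBER_FIRE_POWER 1.21
score EXCUSE_5 2.20
score IN_REP_TO -2.0
score TO_NO_USER 3.12"""

r138 = """score CYBER_FIRE_POWER -4.020
score EXCUSE_5 13.447
score IN_REP_TO -13.472
score TO_NO_USER -0.032"""

for rule, before, after, why in flag_changes(parse_scores(r137), parse_scores(r138)):
    print(f"{rule:20} {before:8.3f} {after:8.3f}  {why}")
```

On these four rows it reports CYBER_FIRE_POWER and TO_NO_USER as sign flips, and EXCUSE_5 and IN_REP_TO as having moved beyond 5.0. Given the co-occurrence effect Craig describes, a flagged rule is a prompt to look at which other rules it fires with, not proof that its new score is wrong.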