I was aware of the stuff you're pointing out below.  This is basically caused by 
using the new evolver to do the scoring.  Previously, scores were limited to the 
range 0.01-5; now they are unlimited, and allowed to go negative.  A side effect 
of this is that rules which are really non-discriminators sometimes end up with 
odd-looking scores.  For example, CYBER_FIRE_POWER is just not likely to really 
be worth -4.020 if looked at in isolation, but it turns out that the 10 messages 
in the corpus which trigger that rule also trigger about a billion other ones 
(well, 10 to 13 other rules each).
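A toy illustration of that effect, with made-up numbers (the actual hits are listed below): if a rule only ever fires alongside another rule in the training corpus, the evolver only "sees" their combined score, so a sensible-looking split and a bizarre one classify the corpus identically. The rule names RULE_B and RULE_C, every score, and the 5.0 threshold are hypothetical; in the real corpus the co-firing rules also hit other messages, so the effect is weaker than in this sketch.

```python
# Hypothetical toy corpus: each message is (set of rules it triggers, is_spam).
# RULE_B, RULE_C and all scores below are made up for illustration.
messages = [
    ({"CYBER_FIRE_POWER", "RULE_B", "RULE_C"}, True),
    ({"CYBER_FIRE_POWER", "RULE_B", "RULE_C"}, True),
    ({"RULE_C"}, False),
]

THRESHOLD = 5.0  # assumed spam threshold for this sketch


def totals(scores):
    """Return each message's total score under the given rule scores."""
    return [sum(scores[r] for r in rules) for rules, _ in messages]


def errors(scores):
    """Count misclassified messages (false positives plus false negatives)."""
    return sum((t >= THRESHOLD) != is_spam
               for t, (_, is_spam) in zip(totals(scores), messages))


# CYBER_FIRE_POWER and RULE_B only ever fire together, so only their sum
# (6.0 in both assignments) affects any message's total.
clamped = {"CYBER_FIRE_POWER": 3.0, "RULE_B": 3.0, "RULE_C": 1.0}
unbounded = {"CYBER_FIRE_POWER": -4.0, "RULE_B": 10.0, "RULE_C": 1.0}

print(totals(clamped), errors(clamped))      # prints: [7.0, 7.0, 1.0] 0
print(totals(unbounded), errors(unbounded))  # prints: [7.0, 7.0, 1.0] 0
```

Nothing in the fitness function prefers the first assignment over the second, which is why an unconstrained evolver can wander off to scores like -4.020.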

Y 11 /home/craig/spams/ion.spam.2/1536 
NO_REAL_NAME,INVALID_DATE_TZ_ABSURD,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING
Y 12 /home/craig/spams/ion.spam.2/1058 
NO_REAL_NAME,INVALID_DATE,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 12 /home/craig/spams/ion.spam.2/659 
NO_REAL_NAME,INVALID_DATE,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 12 /home/craig/spams/ion.spam.2/660 
NO_REAL_NAME,INVALID_DATE,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 12 /home/craig/spams/ion.spam.2/661 
NO_REAL_NAME,INVALID_DATE,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 12 /home/craig/spams/ion.spam.2/662 
NO_REAL_NAME,INVALID_DATE,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 13 /home/craig/spams/ion.spam.2/1033 
NO_REAL_NAME,INVALID_DATE,MSG_ID_ADDED_BY_MTA_2,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 13 /home/craig/spams/ion.spam.2/808 
NO_REAL_NAME,INVALID_DATE,MSG_ID_ADDED_BY_MTA_2,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 14 /home/craig/spams/ion.spam.2/519 
NO_REAL_NAME,ALL_CAPS_SUBJECT,INVALID_DATE,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 15 /home/craig/spams/ion.spam.2/372 
NO_REAL_NAME,ALL_CAPS_SUBJECT,INVALID_DATE,MSG_ID_ADDED_BY_MTA_2,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
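A quick way to check that claim against the hits above: a minimal Python sketch that treats each entry as four whitespace-separated fields (a flag, a number that appears to be the message's score, the path, and the comma-separated rule list) and counts which rules fire on every message that hits CYBER_FIRE_POWER. Only three of the ten entries are copied in to keep it short, and the field layout is inferred from the output above rather than from any documented format.

```python
from collections import Counter

# Three of the ten CYBER_FIRE_POWER hits quoted above (1536, 1058, 372).
# Assumed layout: flag, number (apparently the score), path, rule list,
# all whitespace-separated; how the lines are wrapped doesn't matter.
LOG = """
Y 11 /home/craig/spams/ion.spam.2/1536
NO_REAL_NAME,INVALID_DATE_TZ_ABSURD,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING
Y 12 /home/craig/spams/ion.spam.2/1058
NO_REAL_NAME,INVALID_DATE,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
Y 15 /home/craig/spams/ion.spam.2/372
NO_REAL_NAME,ALL_CAPS_SUBJECT,INVALID_DATE,MSG_ID_ADDED_BY_MTA_2,EXCUSE_3,REMOVE_SUBJ,REMOVE_IN_QUOTES,ADDRESSES_ON_CD,MONEY_MAKING,ASKS_BILLING_ADDRESS,WANTS_CREDIT_CARD,CYBER_FIRE_POWER,LINE_OF_YELLING,FROM_AND_TO_SAME
"""


def parse_hits(text):
    """Yield (flag, score, path, rules) for each four-field entry in text."""
    tokens = text.split()
    for i in range(0, len(tokens) - 3, 4):
        flag, score, path, rules = tokens[i:i + 4]
        yield flag, int(score), path, set(rules.split(","))


def cooccurrence(entries, target):
    """Return (messages hitting target, Counter of rules co-firing with it)."""
    hits = 0
    counts = Counter()
    for _flag, _score, _path, rules in entries:
        if target in rules:
            hits += 1
            counts.update(rules - {target})
    return hits, counts


entries = list(parse_hits(LOG))
hits, counts = cooccurrence(entries, "CYBER_FIRE_POWER")
# Rules that fired on every single message that hit CYBER_FIRE_POWER.
always = sorted(rule for rule, n in counts.items() if n == hits)
print(hits, len(always))  # prints: 3 9
print(always)
```

Pasting in all ten entries gives the same nine always-present rules; FROM_AND_TO_SAME and INVALID_DATE drop out only because entry 1536 lacks them.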

The same thing happens with the other rules that got scores like that.
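One rough way to pull the other rules "like that" out of Bart's diff quoted below is to list the ones whose score went from positive in r1.37 to negative in r1.38. A minimal sketch; the rows are copied from the diff, and the parser assumes each row has exactly the four whitespace-separated fields shown. A sign flip is only a proxy for "odd-looking": it catches CYBER_FIRE_POWER, DEAR_SOMEBODY and CASHCASHCASH, but not large positive jumps such as EXCUSE_5.

```python
# Rows from Bart's r1.37 vs r1.38 diff of rules/50_scores.cf:
#   score RULE_NAME old_score new_score
DIFF = """
score 25FREEMEGS_URL            1.00   -4.606
score A_HREF_TO_OPT_OUT         1.0     6.675
score A_HREF_TO_REMOVE          1.82    6.546
score A_HREF_TO_UNSUB           0.01    4.102
score BE_AMAZED                 1.03   -4.581
score BILLION_DOLLARS           0.90    4.249
score CASHCASHCASH              1.64   -3.700
score CYBER_FIRE_POWER          1.21   -4.020
score DEAR_SOMEBODY             1.0    -4.412
score DIFFERENT_REPLY_TO        0.90    4.067
score EARN_PER_WEEK             2.0     7.273
score EXCUSE_4                  1.91    6.866
score EXCUSE_5                  2.20   13.447
score EXCUSE_7                  0.01    7.370
score EXCUSE_13                 0.01    7.484
score EXCUSE_15                 0.01    5.490
score IN_REP_TO                -2.0   -13.472
score LOTS_OF_CC_LINES          1.00   -1.648
score MAILTO_WITH_SUBJ_REMOVE   0.01   -3.661
score NONEXISTENT_CHARSET       1.31    7.198
score ONCE_IN_LIFETIME          0.80   -4.604
score THIS_AINT_SPAM            2.17   -2.449
score TO_NO_USER                3.12   -0.032
score TRACKER_ID                0.71   -4.899
score UNSUB_PAGE                1.21   10.767
score UNSUB_SCRIPT              2.17    9.686
score X_PMFLAGS_PRESENT         1.00   10.248
score TO_UNSUB_REPLY            1.81   -2.290
"""


def parse_diff(text):
    """Return {rule: (old, new)} from 'score NAME old new' rows."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "score":
            table[parts[1]] = (float(parts[2]), float(parts[3]))
    return table


table = parse_diff(DIFF)
# Rules that were positive in r1.37 and became negative in r1.38.
flipped = sorted(rule for rule, (old, new) in table.items() if old > 0 > new)
print(len(flipped), flipped)  # 12 rules
```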

This isn't really a problem.  Letting the GA do its own thing can actually be 
helpful, too.  Take the DEAR_SOMEBODY rule: you might think it was a sign of 
spam.  Well, it turns out it occurs frequently in nonspam too (2027 hits in spam 
vs. 2600 in nonspam, out of about 50,000 spams and 200,000 nonspams).  But even 
though it's roughly 3 times more common in spam than in nonspam, it turns out to 
be useful as a discriminator for reducing false positives.  The same goes for 
CASHCASHCASH.
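Normalizing the DEAR_SOMEBODY counts by corpus size gives the per-message rates behind that comparison. A small sketch; the 50,000 spam figure is approximate, so the ratio is too.

```python
# DEAR_SOMEBODY hit counts and corpus sizes as quoted above
# (the spam corpus size is "about 50,000").
spam_hits, spam_total = 2027, 50_000
ham_hits, ham_total = 2600, 200_000

spam_rate = spam_hits / spam_total  # fraction of spam messages that hit the rule
ham_rate = ham_hits / ham_total     # fraction of nonspam messages that hit the rule
ratio = spam_rate / ham_rate

print(f"spam {spam_rate:.2%}, nonspam {ham_rate:.2%}, ratio {ratio:.1f}x")
# prints: spam 4.05%, nonspam 1.30%, ratio 3.1x
```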

I would suggest letting go, Luke, and trusting the Force.  Uh, I mean the GA.  
If you try it and it isn't working well, then by all means revert.  I think 
you'll probably end up with worse results, though.

C

Bart Schaefer wrote:

> Date: Wed, 27 Feb 2002 15:39:15 -0800 (PST)
> From: Bart Schaefer <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Subject: [SAtalk] Troubling new scores in 2.1 release
> 
> I've diffed the r1.37 and r1.38 rules/50_scores.cf and some of the changes 
> are so unbelievable that I've decided not to install the new scores file.
> Here's just a sampling:
>                                     r1.37      r1.38
>                                     -----    -------
> score 25FREEMEGS_URL                 1.00     -4.606
> score A_HREF_TO_OPT_OUT              1.0       6.675
> score A_HREF_TO_REMOVE               1.82      6.546
> score A_HREF_TO_UNSUB                0.01      4.102
> score BE_AMAZED                      1.03     -4.581
> score BILLION_DOLLARS                0.90      4.249
> score CASHCASHCASH                   1.64     -3.700
> score CYBER_FIRE_POWER               1.21     -4.020
> score DEAR_SOMEBODY                  1.0      -4.412
> score DIFFERENT_REPLY_TO             0.90      4.067
> score EARN_PER_WEEK                  2.0       7.273
> score EXCUSE_4                       1.91      6.866
> score EXCUSE_5                       2.20     13.447
> score EXCUSE_7                       0.01      7.370
> score EXCUSE_13                      0.01      7.484
> score EXCUSE_15                      0.01      5.490
> score IN_REP_TO                     -2.0     -13.472
> score LOTS_OF_CC_LINES               1.00     -1.648
> score MAILTO_WITH_SUBJ_REMOVE        0.01     -3.661
> score NONEXISTENT_CHARSET            1.31      7.198
> score ONCE_IN_LIFETIME               0.80     -4.604
> score THIS_AINT_SPAM                 2.17     -2.449
> score TO_NO_USER                     3.12     -0.032
> score TRACKER_ID                     0.71     -4.899
> score UNSUB_PAGE                     1.21     10.767
> score UNSUB_SCRIPT                   2.17      9.686
> score X_PMFLAGS_PRESENT              1.00     10.248
> score TO_UNSUB_REPLY                 1.81     -2.290
> 
> Previously I don't think there was ever a GA-determined score above 5.0?
> Certainly there weren't any above 9.0 (or below -9.0).
> 
> Not that I don't appreciate Craig's efforts with the new GA, but ... I'd 
> go so far as to suggest that the distribution be reverted to rev 1.37 of
> the rules/50_scores.cf file until this can be straightened out.  
> 
> 
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk