On Sat, 15 Jun 2002, Michael Moncur wrote:

> When a new release comes out I like to be anal-retentive and go through
> the GA second-guessing its scores. This is my report for 2.30.

In a similar vein, here are the significant score changes since the last
CVS version before the GA was re-run.  (I haven't de-duped against
Michael's list.)  My script defines "significant" as either a change of
sign or a change of more than 1.0 in absolute value.
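
A minimal Python sketch of that filter follows. It is not the script
that produced the lists below. The function names are hypothetical, and
it assumes both score sets are available as text made of
"score NAME VALUE" lines.

```python
import re

# Assumed input format: lines such as "score RULE_NAME 1.234".
SCORE_LINE = re.compile(r"^score\s+(\S+)\s+(-?\d+(?:\.\d+)?)")


def parse_scores(text):
    """Return {rule_name: score} for every 'score NAME VALUE' line."""
    scores = {}
    for line in text.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            scores[match.group(1)] = float(match.group(2))
    return scores


def is_significant(old, new, threshold=1.0):
    """True if the sign changed or the score moved by more than threshold.

    A score of exactly 0 is treated as non-negative (an assumption).
    """
    sign_changed = (old < 0) != (new < 0)
    return sign_changed or abs(new - old) > threshold


def significant_changes(old_scores, new_scores, threshold=1.0):
    """List report lines for new rules and for significant changes."""
    lines = []
    for name in sorted(new_scores):
        new = new_scores[name]
        if name not in old_scores:
            # Rule had no score in the old set at all.
            lines.append(f"score: {name} absent -> {new:.3f}")
        elif is_significant(old_scores[name], new, threshold):
            lines.append(f"score: {name} {old_scores[name]:.3f} -> {new:.3f}")
    return lines
```

Calling significant_changes(parse_scores(old_text), parse_scores(new_text))
returns lines in the same "score: NAME old -> new" shape used below.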

These look suspicious:

score: ASCII_FORM_ENTRY 0.036 -> -1.660
score: BUGZILLA_BUG -2.000 -> 0.921
score: DATE_MISSING 0.248 -> -2.140
score: EXCUSE_16 1.345 -> -0.721
score: FORGED_HOTMAIL_RCVD 0.530 -> -0.356
score: FROM_AND_TO_SAME 0.877 -> -2.071
score: FROM_NAME_NO_SPACES 0.500 -> -0.114
score: GREEN_EXCUSE_1 3.116 -> -2.019
score: INTL_EXEC_GUILD 0.781 -> -0.039
score: MONEY_BACK 1.489 -> -0.239
score: MONEY_MAKING 2.490 -> -0.687
score: MSGID_CHARS_WEIRD 1.500 -> -2.178
score: NO_REAL_NAME 0.632 -> -1.068
score: X_NOT_PRESENT 0.500 -> -1.920

(How does BUGZILLA_BUG keep creeping back into the GA?)

These changes are good, because I had to fix these scores last time.  (The
GAPPY rules are cumulative because the subject gets copied into the body,
though I still would have expected them to score higher.)

score: GAPPY_SUBJECT 2.670 -> 0.975
score: GAPPY_TEXT -1.237 -> 0.393
score: JAVASCRIPT_URI -1.607 -> 0.099

These are just interesting:

score: ASKS_BILLING_ADDRESS 2.627 -> -0.152
score: BE_AMAZED -0.260 -> 4.202
score: CTYPE_JUST_HTML 3.154 -> 1.665
score: LINES_OF_YELLING 0.453 -> -0.036
score: LINES_OF_YELLING_3 -1.518 -> 0.478
score: MAILTO_TO_REMOVE 1.341 -> -1.669
score: MAILTO_WITH_SUBJ -0.310 -> 1.900
score: MIME_NULL_BLOCK 0.157 -> -0.975
score: SLIGHTLY_UNSAFE_JAVASCRIPT -0.794 -> 0.693
score: SUBJ_ALL_CAPS 1.933 -> -0.054
score: SUPERLONG_LINE -0.374 -> 0.384
score: TO_BE_REMOVED_REPLY -2.150 -> 3.985
score: TO_UNSUB_REPLY -1.996 -> 3.366
score: TRACKER_ID -4.215 -> 4.332
score: X_ESMTP 1.000 -> -1.662

How did these get exactly 1.0?  Were they not represented in the corpus at
all?

score: FORGED_RCVD_TRAIL absent -> 1.000
score: FROM_ADDRESS_EQ_REAL absent -> 1.000
score: TO_ADDRESS_EQ_REAL absent -> 1.000

And here's the remainder:

score: ADVERT_CODE 1.060 -> 4.725
score: ALL_CAPS_HEADER -0.274 -> 1.624
score: ALL_NATURAL 1.024 -> 2.291
score: AMAZING 1.631 -> 2.764
score: AUTO_EMAIL_REMOVAL 3.044 -> 1.750
score: BILLION_DOLLARS 1.072 -> 3.060
score: CALL_NOW 2.196 -> 4.732
score: CASHCASHCASH 0.182 -> 2.394
score: CASINO 1.605 -> 0.566
score: CBYI 2.502 -> 4.336
score: CHARSET_FARAWAY_HEADERS 1.817 -> 3.119
score: CHECK_OR_MONEY_ORDER 3.358 -> 4.465
score: CLICK_TO_REMOVE_2 2.634 -> 0.394
score: COMMUNIGATE 1.573 -> 2.596
score: COPY_ACCURATELY 0.933 -> 3.410
score: DATE_IN_FUTURE_03_06 0.072 -> 3.416
score: DATE_IN_FUTURE_06_12 0.145 -> 2.385
score: DATE_IN_FUTURE_12_24 0.290 -> 3.308
score: DATE_IN_FUTURE_24_48 0.580 -> 3.657
score: DATE_IN_FUTURE_48_96 1.159 -> 2.887
score: DATE_IN_FUTURE_96_XX 2.318 -> 3.463
score: DATE_IN_PAST_06_12 0.145 -> 1.642
score: DATE_IN_PAST_12_24 0.290 -> 1.703
score: DATE_IN_PAST_24_48 0.580 -> 2.611
score: DATE_IN_PAST_48_96 1.159 -> 2.479
score: DATE_IN_PAST_96_XX 2.318 -> 0.938
score: DATE_WARNING 1.0 -> 3.088
score: DEAR_EMAIL 0.500 -> 2.350
score: DEAR_FRIEND 2.069 -> 3.092
score: DIRECT_EMAIL 2.283 -> 3.294
score: DOMAIN_BODY 4.782 -> 2.491
score: EXCUSE_11 0.501 -> 1.680
score: EXCUSE_13 1.841 -> 0.150
score: EXCUSE_17 2.608 -> 3.700
score: EXCUSE_2 1.340 -> 3.073
score: EXCUSE_3 2.747 -> 1.080
score: EXCUSE_4 0.864 -> 2.896
score: EXCUSE_6 -0.110 -> 2.180
score: FAKED_IP_IN_RCVD 1.101 -> 3.647
score: FORGED_EUDORAMAIL_RCVD 2.548 -> 4.585
score: FORGED_GW05_RCVD 2.857 -> 4.851
score: FORGED_JUNO_RCVD 2.027 -> 4.430
score: FOR_JUST_SOME_AMT 0.783 -> 1.916
score: FREE_PRIORITY_MAIL 2.749 -> 0.718
score: FROM_MALFORMED 2.221 -> 3.667
score: FROM_MISSING 1.298 -> 4.257
score: FROM_STARTS_WITH_NUMS 1.288 -> 4.388
score: FROM_TOPICA 1.0 -> 4.573
score: GENTLE_FEROCITY 2.458 -> 4.196
score: HR_3113 0.900 -> 2.695
score: HTML_WITH_BGCOLOR -0.546 -> 1.346
score: HTTP_CTRL_CHARS_HOST 1.4 -> 3.077
score: HTTP_NUMBER_WORD 0.825 -> 1.827
score: HTTP_USERNAME_USED 0.203 -> 1.512
score: HTTP_WITH_EMAIL_IN_URL 4.062 -> 2.648
score: INCREASE_SOMETHING 2.744 -> 0.399
score: INVALID_DATE 1.514 -> 0.475
score: INVALID_DATE_TZ_ABSURD 2.126 -> 4.395
score: INVESTOR_SPEC_SHEET 1.235 -> 3.046
score: IN_REP_TO -4.431 -> -3.382
score: KOREAN_UCE_SUBJECT 0.500 -> 3.401
score: LARGE_HEX 1.108 -> 2.500
score: LIMITED_TIME_ONLY 0.841 -> 4.303
score: MAILTO_WITH_SUBJ_REMOVE 1.864 -> 4.922
score: MAJORDOMO 1.296 -> 0.129
score: MANY_FROMS 4.409 -> 2.897
score: MIME_EXCESSIVE_QP 0.750 -> 2.430
score: MIME_MISSING_BOUNDARY 1.500 -> 3.887
score: MIME_ODD_CASE 1.500 -> 4.339
score: MONSTERHUT 2.657 -> 1.295
score: MSG_ID_ADDED_BY_MTA 0.982 -> 3.967
score: MYCASINOBUILDER 3.870 -> 2.686
score: NEW_DOMAIN_EXTENSIONS 2.527 -> 4.255
score: NIGERIAN_SCAM_5 1.0 -> 2.333
score: NIGERIAN_SCAM_6 1.000 -> 3.014
score: NIGERIAN_SCAM_7 1.000 -> 4.228
score: NONEXISTENT_CHARSET 1.968 -> 4.523
score: NO_EXPERIENCE -1.063 -> 4.239
score: NO_QS_ASKED -0.773 -> 2.413
score: NO_SELLING 3.816 -> 1.489
score: ONCE_IN_LIFETIME 0.562 -> 1.802
score: ONE_HUNDRED_PC_FREE 2.397 -> 3.422
score: ONE_HUNDRED_PC_GUAR 4.399 -> 0.587
score: OPPORTUNITY -1.010 -> 1.072
score: PGP_SIGNATURE -2.095 -> -3.135
score: PGP_SIGNATURE_2 -2.000 -> -3.135
score: POPLAUNCH 3.633 -> 2.069
score: PORN_1 1.053 -> 3.092
score: PORN_7 0.814 -> 2.631
score: PORN_8 -4.248 -> 0.547
score: PRINT_FORM_SIGNATURE 1.862 -> 4.558
score: PROFITS 0.460 -> 1.809
score: PURE_PROFIT 2.777 -> 4.129
score: RATWARE -0.703 -> 4.563
score: READ_TO_END 3.882 -> 1.998
score: REAL_THING -0.148 -> 2.074
score: RELAYING_FRAME -0.584 -> 0.031
score: REMOVE_PAGE 3.500 -> 2.198
score: REMOVE_SUBJ 2.345 -> 3.435
score: REPLY_TO_EMPTY 4.335 -> 2.595
score: RESISTANCE_IS_FUTILE 4.948 -> 3.515
score: SECTION_301 0.116 -> 2.823
score: SHORT_RECEIVED_LINE 1.805 -> 4.251
score: SOCIAL_SEC_NUMBER 2.305 -> 3.329
score: SPAM_FORM 3.243 -> 0.218
score: SPAM_PHRASES_020 1.0 -> 2.139
score: SPAM_PHRASES_040 1.0 -> 2.424
score: STOCK_PICK 2.671 -> 3.781
score: SUBJ_2_CREDIT 2.824 -> 4.198
score: SUBJ_FULL_OF_8BITS 3.136 -> 4.298
score: SUBJ_HAS_SPACES 2.741 -> 4.043
score: SUBJ_MISSING 2.428 -> 1.399
score: SUBJ_REMOVE 0.062 -> -0.823
score: TONER 3.014 -> 1.403
score: TO_EMPTY 2.541 -> 1.446
score: TO_NO_USER 1.928 -> 4.528
score: TRACE_BY_SSN 2.013 -> 0.545
score: UNDISC_RECIPS 1.164 -> 4.034
score: UNNEEDED_HTML_ENCODING 2.183 -> 3.196
score: UNSUB_SCRIPT 0.420 -> 2.018
score: UPPERCASE_25_50 0.400 -> 1.937
score: UPPERCASE_50_75 0.500 -> 2.972
score: UPPERCASE_75_100 0.500 -> 2.990
score: URI_IS_POUND 0.869 -> 2.026
score: US_DOLLARS 2.429 -> 4.475
score: VERY_SUSP_RECIPS 2.436 -> 1.024
score: VIAGRA 2.383 -> 4.666
score: VJESTIKA 2.807 -> 4.279
score: WANTS_CREDIT_CARD 1.532 -> 4.288
score: WORK_AT_HOME 0.365 -> 3.163
score: WWW_CLIK4YOU_COM 1.0 -> 2.093
score: X_ANTIABUSE 1.000 -> 2.370
score: X_ENC_PRESENT 1.000 -> 4.140
score: X_LIST_HOST 1.000 -> 2.079
score: X_LIST_UNSUBSCRIBE 1.000 -> 3.195
score: X_MAILER_GIBBERISH 0.905 -> 3.062
score: X_MAIL_ID_PRESENT 1.000 -> 2.192
score: X_PRECEDENCE_REF 1.000 -> 4.555
score: X_SERV_HOST_PRESENT 1.000 -> 3.759
score: X_SMTPEXP_VERSION 1.000 -> 3.243
score: X_STORMPOST_TO 1.000 -> 2.272
score: YELLOWSUN 1.783 -> 2.810
score: YOU_HAVE_BEEN_SELECTED 2.606 -> 4.221
score: X_OSIRU_DUL_FH absent -> -1.5
score: FUDGE_DUL_MAPS_OSIRU absent -> -2.0
score: FUDGE_RELAY_OSIRU absent -> -2.0
score: FUDGE_OSIRU_FH absent -> 1.5
score: FUDGE_MULTIHOP_RELAY absent -> -1.0
score: EXCUSE_ES_04 absent -> 1.0

Amusing anecdote in case you get this far:  I recently had to whitelist
several friends because they were discussing the minutes of the local
school board meeting.  The budget numbers triggered the Nigerian scam
rules and the sex-ed discussion set off the PORN rules.  There's a case
where rule intersection analysis might have been helpful -- there's
probably not much Nigerian porn priced at millions of dollars.

