On Sat, 15 Jun 2002, Michael Moncur wrote:

> When a new release comes out I like to be anal-retentive and go through
> the GA second-guessing its scores. This is my report for 2.30.

In a similar vein, here are the significant score changes since the last CVS version before the GA was re-run. (I haven't de-duped against Michael's list.) My script defines "significant" as having had either a change of sign or a change of more than 1.0 in absolute value.

These look suspicious:

score: ASCII_FORM_ENTRY 0.036 -> -1.660
score: BUGZILLA_BUG -2.000 -> 0.921
score: DATE_MISSING 0.248 -> -2.140
score: EXCUSE_16 1.345 -> -0.721
score: FORGED_HOTMAIL_RCVD 0.530 -> -0.356
score: FROM_AND_TO_SAME 0.877 -> -2.071
score: FROM_NAME_NO_SPACES 0.500 -> -0.114
score: GREEN_EXCUSE_1 3.116 -> -2.019
score: INTL_EXEC_GUILD 0.781 -> -0.039
score: MONEY_BACK 1.489 -> -0.239
score: MONEY_MAKING 2.490 -> -0.687
score: MSGID_CHARS_WEIRD 1.500 -> -2.178
score: NO_REAL_NAME 0.632 -> -1.068
score: X_NOT_PRESENT 0.500 -> -1.920

(How does BUGZILLA_BUG keep creeping back into the GA?)

These are good because I had to fix them last time. (The GAPPY rules are cumulative because the subject gets copied into the body, though I still would have expected them to score higher.)

score: GAPPY_SUBJECT 2.670 -> 0.975
score: GAPPY_TEXT -1.237 -> 0.393
score: JAVASCRIPT_URI -1.607 -> 0.099

These are just interesting:

score: ASKS_BILLING_ADDRESS 2.627 -> -0.152
score: BE_AMAZED -0.260 -> 4.202
score: CTYPE_JUST_HTML 3.154 -> 1.665
score: LINES_OF_YELLING 0.453 -> -0.036
score: LINES_OF_YELLING_3 -1.518 -> 0.478
score: MAILTO_TO_REMOVE 1.341 -> -1.669
score: MAILTO_WITH_SUBJ -0.310 -> 1.900
score: MIME_NULL_BLOCK 0.157 -> -0.975
score: SLIGHTLY_UNSAFE_JAVASCRIPT -0.794 -> 0.693
score: SUBJ_ALL_CAPS 1.933 -> -0.054
score: SUPERLONG_LINE -0.374 -> 0.384
score: TO_BE_REMOVED_REPLY -2.150 -> 3.985
score: TO_UNSUB_REPLY -1.996 -> 3.366
score: TRACKER_ID -4.215 -> 4.332
score: X_ESMTP 1.000 -> -1.662

How did these get exactly 1.0? Not represented in the corpus at all?

score: FORGED_RCVD_TRAIL absent -> 1.000
score: FROM_ADDRESS_EQ_REAL absent -> 1.000
score: TO_ADDRESS_EQ_REAL absent -> 1.000

And here's the remainder:

score: ADVERT_CODE 1.060 -> 4.725
score: ALL_CAPS_HEADER -0.274 -> 1.624
score: ALL_NATURAL 1.024 -> 2.291
score: AMAZING 1.631 -> 2.764
score: AUTO_EMAIL_REMOVAL 3.044 -> 1.750
score: BILLION_DOLLARS 1.072 -> 3.060
score: CALL_NOW 2.196 -> 4.732
score: CASHCASHCASH 0.182 -> 2.394
score: CASINO 1.605 -> 0.566
score: CBYI 2.502 -> 4.336
score: CHARSET_FARAWAY_HEADERS 1.817 -> 3.119
score: CHECK_OR_MONEY_ORDER 3.358 -> 4.465
score: CLICK_TO_REMOVE_2 2.634 -> 0.394
score: COMMUNIGATE 1.573 -> 2.596
score: COPY_ACCURATELY 0.933 -> 3.410
score: DATE_IN_FUTURE_03_06 0.072 -> 3.416
score: DATE_IN_FUTURE_06_12 0.145 -> 2.385
score: DATE_IN_FUTURE_12_24 0.290 -> 3.308
score: DATE_IN_FUTURE_24_48 0.580 -> 3.657
score: DATE_IN_FUTURE_48_96 1.159 -> 2.887
score: DATE_IN_FUTURE_96_XX 2.318 -> 3.463
score: DATE_IN_PAST_06_12 0.145 -> 1.642
score: DATE_IN_PAST_12_24 0.290 -> 1.703
score: DATE_IN_PAST_24_48 0.580 -> 2.611
score: DATE_IN_PAST_48_96 1.159 -> 2.479
score: DATE_IN_PAST_96_XX 2.318 -> 0.938
score: DATE_WARNING 1.0 -> 3.088
score: DEAR_EMAIL 0.500 -> 2.350
score: DEAR_FRIEND 2.069 -> 3.092
score: DIRECT_EMAIL 2.283 -> 3.294
score: DOMAIN_BODY 4.782 -> 2.491
score: EXCUSE_11 0.501 -> 1.680
score: EXCUSE_13 1.841 -> 0.150
score: EXCUSE_17 2.608 -> 3.700
score: EXCUSE_2 1.340 -> 3.073
score: EXCUSE_3 2.747 -> 1.080
score: EXCUSE_4 0.864 -> 2.896
score: EXCUSE_6 -0.110 -> 2.180
score: FAKED_IP_IN_RCVD 1.101 -> 3.647
score: FORGED_EUDORAMAIL_RCVD 2.548 -> 4.585
score: FORGED_GW05_RCVD 2.857 -> 4.851
score: FORGED_JUNO_RCVD 2.027 -> 4.430
score: FOR_JUST_SOME_AMT 0.783 -> 1.916
score: FREE_PRIORITY_MAIL 2.749 -> 0.718
score: FROM_MALFORMED 2.221 -> 3.667
score: FROM_MISSING 1.298 -> 4.257
score: FROM_STARTS_WITH_NUMS 1.288 -> 4.388
score: FROM_TOPICA 1.0 -> 4.573
score: GENTLE_FEROCITY 2.458 -> 4.196
score: HR_3113 0.900 -> 2.695
score: HTML_WITH_BGCOLOR -0.546 -> 1.346
score: HTTP_CTRL_CHARS_HOST 1.4 -> 3.077
score: HTTP_NUMBER_WORD 0.825 -> 1.827
score: HTTP_USERNAME_USED 0.203 -> 1.512
score: HTTP_WITH_EMAIL_IN_URL 4.062 -> 2.648
score: INCREASE_SOMETHING 2.744 -> 0.399
score: INVALID_DATE 1.514 -> 0.475
score: INVALID_DATE_TZ_ABSURD 2.126 -> 4.395
score: INVESTOR_SPEC_SHEET 1.235 -> 3.046
score: IN_REP_TO -4.431 -> -3.382
score: KOREAN_UCE_SUBJECT 0.500 -> 3.401
score: LARGE_HEX 1.108 -> 2.500
score: LIMITED_TIME_ONLY 0.841 -> 4.303
score: MAILTO_WITH_SUBJ_REMOVE 1.864 -> 4.922
score: MAJORDOMO 1.296 -> 0.129
score: MANY_FROMS 4.409 -> 2.897
score: MIME_EXCESSIVE_QP 0.750 -> 2.430
score: MIME_MISSING_BOUNDARY 1.500 -> 3.887
score: MIME_ODD_CASE 1.500 -> 4.339
score: MONSTERHUT 2.657 -> 1.295
score: MSG_ID_ADDED_BY_MTA 0.982 -> 3.967
score: MYCASINOBUILDER 3.870 -> 2.686
score: NEW_DOMAIN_EXTENSIONS 2.527 -> 4.255
score: NIGERIAN_SCAM_5 1.0 -> 2.333
score: NIGERIAN_SCAM_6 1.000 -> 3.014
score: NIGERIAN_SCAM_7 1.000 -> 4.228
score: NONEXISTENT_CHARSET 1.968 -> 4.523
score: NO_EXPERIENCE -1.063 -> 4.239
score: NO_QS_ASKED -0.773 -> 2.413
score: NO_SELLING 3.816 -> 1.489
score: ONCE_IN_LIFETIME 0.562 -> 1.802
score: ONE_HUNDRED_PC_FREE 2.397 -> 3.422
score: ONE_HUNDRED_PC_GUAR 4.399 -> 0.587
score: OPPORTUNITY -1.010 -> 1.072
score: PGP_SIGNATURE -2.095 -> -3.135
score: PGP_SIGNATURE_2 -2.000 -> -3.135
score: POPLAUNCH 3.633 -> 2.069
score: PORN_1 1.053 -> 3.092
score: PORN_7 0.814 -> 2.631
score: PORN_8 -4.248 -> 0.547
score: PRINT_FORM_SIGNATURE 1.862 -> 4.558
score: PROFITS 0.460 -> 1.809
score: PURE_PROFIT 2.777 -> 4.129
score: RATWARE -0.703 -> 4.563
score: READ_TO_END 3.882 -> 1.998
score: REAL_THING -0.148 -> 2.074
score: RELAYING_FRAME -0.584 -> 0.031
score: REMOVE_PAGE 3.500 -> 2.198
score: REMOVE_SUBJ 2.345 -> 3.435
score: REPLY_TO_EMPTY 4.335 -> 2.595
score: RESISTANCE_IS_FUTILE 4.948 -> 3.515
score: SECTION_301 0.116 -> 2.823
score: SHORT_RECEIVED_LINE 1.805 -> 4.251
score: SOCIAL_SEC_NUMBER 2.305 -> 3.329
score: SPAM_FORM 3.243 -> 0.218
score: SPAM_PHRASES_020 1.0 -> 2.139
score: SPAM_PHRASES_040 1.0 -> 2.424
score: STOCK_PICK 2.671 -> 3.781
score: SUBJ_2_CREDIT 2.824 -> 4.198
score: SUBJ_FULL_OF_8BITS 3.136 -> 4.298
score: SUBJ_HAS_SPACES 2.741 -> 4.043
score: SUBJ_MISSING 2.428 -> 1.399
score: SUBJ_REMOVE 0.062 -> -0.823
score: TONER 3.014 -> 1.403
score: TO_EMPTY 2.541 -> 1.446
score: TO_NO_USER 1.928 -> 4.528
score: TRACE_BY_SSN 2.013 -> 0.545
score: UNDISC_RECIPS 1.164 -> 4.034
score: UNNEEDED_HTML_ENCODING 2.183 -> 3.196
score: UNSUB_SCRIPT 0.420 -> 2.018
score: UPPERCASE_25_50 0.400 -> 1.937
score: UPPERCASE_50_75 0.500 -> 2.972
score: UPPERCASE_75_100 0.500 -> 2.990
score: URI_IS_POUND 0.869 -> 2.026
score: US_DOLLARS 2.429 -> 4.475
score: VERY_SUSP_RECIPS 2.436 -> 1.024
score: VIAGRA 2.383 -> 4.666
score: VJESTIKA 2.807 -> 4.279
score: WANTS_CREDIT_CARD 1.532 -> 4.288
score: WORK_AT_HOME 0.365 -> 3.163
score: WWW_CLIK4YOU_COM 1.0 -> 2.093
score: X_ANTIABUSE 1.000 -> 2.370
score: X_ENC_PRESENT 1.000 -> 4.140
score: X_LIST_HOST 1.000 -> 2.079
score: X_LIST_UNSUBSCRIBE 1.000 -> 3.195
score: X_MAILER_GIBBERISH 0.905 -> 3.062
score: X_MAIL_ID_PRESENT 1.000 -> 2.192
score: X_PRECEDENCE_REF 1.000 -> 4.555
score: X_SERV_HOST_PRESENT 1.000 -> 3.759
score: X_SMTPEXP_VERSION 1.000 -> 3.243
score: X_STORMPOST_TO 1.000 -> 2.272
score: YELLOWSUN 1.783 -> 2.810
score: YOU_HAVE_BEEN_SELECTED 2.606 -> 4.221
score: X_OSIRU_DUL_FH absent -> -1.5
score: FUDGE_DUL_MAPS_OSIRU absent -> -2.0
score: FUDGE_RELAY_OSIRU absent -> -2.0
score: FUDGE_OSIRU_FH absent -> 1.5
score: FUDGE_MULTIHOP_RELAY absent -> -1.0
score: EXCUSE_ES_04 absent -> 1.0

Amusing anecdote in case you get this far: I recently had to whitelist several friends because they were discussing the minutes of the local school board meeting. The budget numbers triggered the Nigerian scam rules and the sex-ed discussion set off the PORN rules.
There's a case where rule intersection analysis might have been helpful -- there's probably not much Nigerian porn priced at millions of dollars.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
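
For anyone who wants to reproduce the filter described at the top of this post (a change of sign, or a change of more than 1.0 in absolute value), here is a minimal Python sketch. It is not the script that produced the lists above. The function names, the dict-based inputs, and the choice to always report a rule that is "absent" on one side are assumptions made for illustration, and "change of more than 1.0" is read here as |new - old| > 1.0.

```python
from typing import Dict, List, Optional


def is_significant(old: Optional[float], new: Optional[float],
                   threshold: float = 1.0) -> bool:
    """Return True if a score change counts as "significant".

    Assumption: a rule present on only one side (None = absent) is
    always reported, matching the "absent -> 1.000" entries in the post.
    """
    if old is None or new is None:
        return (old is None) != (new is None)
    # Strict sign change: a score of exactly 0 never counts as a flip.
    sign_flip = old * new < 0
    return sign_flip or abs(new - old) > threshold


def significant_changes(old_scores: Dict[str, float],
                        new_scores: Dict[str, float],
                        threshold: float = 1.0) -> List[str]:
    """Format significant changes as "score: NAME old -> new" lines."""
    def fmt(value: Optional[float]) -> str:
        return "absent" if value is None else f"{value:.3f}"

    lines = []
    for name in sorted(set(old_scores) | set(new_scores)):
        old = old_scores.get(name)
        new = new_scores.get(name)
        if is_significant(old, new, threshold):
            lines.append(f"score: {name} {fmt(old)} -> {fmt(new)}")
    return lines


if __name__ == "__main__":
    # Values taken from the lists above, except EXAMPLE_STABLE, which is
    # a hypothetical rule added to show a change that is filtered out.
    old = {"TRACKER_ID": -4.215, "CASINO": 1.605, "SUBJ_REMOVE": 0.062,
           "EXAMPLE_STABLE": 1.000}
    new = {"TRACKER_ID": 4.332, "CASINO": 0.566, "SUBJ_REMOVE": -0.823,
           "EXAMPLE_STABLE": 1.500, "FORGED_RCVD_TRAIL": 1.000}
    for line in significant_changes(old, new):
        print(line)
```

Run as a script, this prints the four changed rules in alphabetical order and omits EXAMPLE_STABLE, whose score moves by only 0.5 without changing sign.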