Duncan Findlay wrote:
> Ummm... I'd be heavily inclined to set these spam scores to 0.01. It's not
> that I don't trust the GA, it's just that if these are the outputs, they
> aren't needed in the first place.

That's not necessarily the case. They might be needed to reduce false positives. As I posted a couple of messages ago, I plan to investigate the tests hit by the remaining false positives and false negatives, to see which rules are affecting those messages and get a little more information about what's going on.

For the curious, I've attached the frequency file that the GA used as part of the process of determining these scores. The format of the file is pretty straightforward.

C

OVERALL    SPAM NONSPAM  NAME
 167871   46818  121053  (all messages)
  49729   28038   21691  NO_REAL_NAME
  24250   23073    1177  CLICK_BELOW
  20890   20820      70  CTYPE_JUST_HTML
  20055   17175    2880  FROM_ENDS_IN_NUMS
  18296   14549    3747  PLING
  13463   13231     232  CLICK_HERE_LINK
  13732   12794     938  SUBJ_HAS_SPACES
  11015   10834     181  EXCUSE_3
   9467    9442      25  SUBJ_HAS_UNIQ_ID
   9335    9329       6  A_HREF_TO_REMOVE
  10167    8828    1339  LINE_OF_YELLING
   8483    8290     193  NORMAL_HTTP_TO_IP
   8164    7226     938  MAILTO_LINK
   8016    6664    1352  FORGED_YAHOO_RCVD
   6320    6257      63  INVALID_DATE_TZ_ABSURD
   6652    6193     459  MSG_ID_ADDED_BY_MTA_2
   5602    5315     287  MAILTO_WITH_SUBJ
   4946    4898      48  REMOVE_SUBJ
   4029    4009      20  FAKED_UNDISC_RECIPS
   7465    3884    3581  SUPERLONG_LINE
   3742    3727      15  MAILTO_WITH_SUBJ_REMOVE
   3616    3614       2  REMOVE_PAGE
   3606    3507      99  HTML_WITH_BGCOLOR
   4702    3449    1253  SUBJ_HAS_Q_MARK
   3814    3357     457  FROM_HAS_MIXED_NUMS
   3935    3246     689  SUBJ_ALL_CAPS
   3800    3145     655  MAILTO_TO_REMOVE
  12223    3056    9167  TO_MALFORMED
   4876    3040    1836  MAY_BE_FORGED
   3121    3003     118  MAILTO_TO_SPAM_ADDR
   2997    2970      27  EXCUSE_7
   2987    2953      34  GUARANTEE
   2893    2861      32  REMOVE_IN_QUOTES
   2810    2805       5  A_HREF_TO_UNSUB
   2794    2779      15  TO_EMPTY
   3111    2578     533  DATE_IN_FUTURE
   2712    2569     143  WEB_BUGS
   2623    2564      59  OPT_IN
   2862    2297     565  FORGED_HOTMAIL_RCVD
   2289    2241      48  SUBJ_REMOVE
   2109    2098      11  VERY_SUSP_CC_RECIPS
   2599    2032     567  EXCUSE_16
   4627    2027    2600  DEAR_SOMEBODY
   2001    1996       5  A_HREF_TO_OPT_OUT
   2525    1898     627  COPYRIGHT_CLAIMED
   1891    1878      13  SUSPICIOUS_CC_RECIPS
   1959    1875      84  FROM_NAME_EQ_FROM_ADDR
   2381    1825     556  PLING_PLING
   9913    1818    8095  SUBJ_ENDS_IN_Q_MARK
   2364    1818     546  KNOWN_BAD_DIALUPS
   1915    1806     109  INVALID_DATE_NO_TZ
   3469    1775    1694  FROM_AND_TO_SAME
   1741    1695      46  CALL_FREE
   1743    1656      87  INVALID_MSGID
   2623    1609    1014  EXCUSE_14
   1609    1551      58  BASE64_ENC_TEXT
   1515    1515       0  FAKED_IP_IN_RCVD
   1539    1451      88  JAVASCRIPT
   1479    1428      51  CASINO
   1402    1335      67  SLIGHTLY_UNSAFE_JAVASCRIPT
   1327    1327       0  FORGED_GW05_RCVD
   1511    1284     227  CASHCASHCASH
   1245    1225      20  VIAGRA
   1157    1135      22  EXCUSE_10
   1533    1133     400  SMTPD_IN_RCVD
   1297    1121     176  MISSING_HEADERS
   1045    1023      22  HTTP_ESCAPED_HOST
   1091     984     107  VERY_SUSP_RECIPS
    987     981       6  COPY_DVDS
   1007     956      51  HTTP_USERNAME_USED
    949     948       1  STRONG_BUY
   1049     947     102  OPPORTUNITY
    923     923       0  MORTGAGE_RATES
    897     897       0  REPLY_REMOVE_SUBJECT
    893     893       0  FREE_CONSULTATION
    907     885      22  NUMERIC_HTTP_ADDR
    884     884       0  WE_HATE_SPAM
   3056     878    2178  EXCUSE_6
    875     859      16  BULK_EMAIL
    846     846       0  RESISTANCE_IS_FUTILE
    855     844      11  ONE_HUNDRED_PC_GUAR
    831     828       3  REMOVAL_INSTRUCTIONS
    793     790       3  HTTP_CTRL_CHARS_HOST
    785     778       7  EMAIL_MARKETING
    780     775       5  SECTION_301
    757     754       3  CHECK_OR_MONEY_ORDER
    749     746       3  EXCUSE_4
    738     738       0  MSGID_SPAMSIGN_1
    753     737      16  PRINT_FORM_SIGNATURE
    740     733       7  FORGED_EUDORAMAIL_RCVD
    734     730       4  ONE_HUNDRED_PC_FREE
    868     716     152  SUSPICIOUS_RECIPS
    702     696       6  REALLY_UNSAFE_JAVASCRIPT
    727     687      40  ROUND_THE_WORLD
   1765     686    1079  HTTP_WITH_EMAIL_IN_URL
    693     686       7  THIS_AINT_SPAM
    787     680     107  FOR_FREE
    677     674       3  CBYI
    770     665     105  PORN_4
    762     641     121  PORN_3
    606     606       0  UNSUB_PAGE
    630     604      26  DEAR_FRIEND
    689     602      87  PENIS_ENLARGE2
    617     590      27  FOR_JUST_SOME_AMT
    579     578       1  FORM_W_MAILTO_ACTION
    553     553       0  NONEXISTENT_CHARSET
    523     516       7  EXCUSE_15
   1200     504     696  X_PRIORITY_HIGH
    543     494      49  INVALID_DATE_ODD_MONTH
    495     488       7  TO_BE_REMOVED_REPLY
    487     486       1  BILL_1618
    491     475      16  SOCIAL_SEC_NUMBER
    477     472       5  ADVERT_CODE
    556     465      91  ASCII_FORM_ENTRY
    468     461       7  YOUR_INCOME
    442     442       0  EXCUSE_13
    453     429      24  MSGID_HAS_NO_AT
    442     426      16  AMAZING
    417     409       8  BUGGY_CGI
   1030     408     622  X_MSMAIL_PRIORITY_HIGH
    409     398      11  WANTS_CREDIT_CARD
    399     393       6  CALL_NOW
    404     390      14  GREAT_OFFER
    368     367       1  YOU_HAVE_BEEN_SELECTED
    360     351       9  LIMITED_TIME_ONLY
    348     348       0  TAKE_ACTION_NOW
    344     342       2  SENT_IN_COMPLIANCE
    339     339       0  PRODUCED_AND_SENT_OUT
    327     327       0  PARA_A_2_C_OF_1618
    313     306       7  EXCUSE_12
    305     305       0  MICRO_CAP_WARNING
    302     296       6  ONE_TIME_MAILING
    300     295       5  INCREASE_TRAFFIC
    299     291       8  FOR_INSTANT_ACCESS
    277     272       5  HTTP_NUMBER_WORD
    298     271      27  THE_FOLLOWING_FORM
    258     258       0  CLICK_TO_REMOVE_2
    282     248      34  TO_NO_USER
    247     247       0  REMOVE_SCRIPT
    253     243      10  LOTS_OF_CC_LINES
    242     236       6  MONEY_MAKING
    240     234       6  ASKS_BILLING_ADDRESS
    267     232      35  BE_AMAZED
    225     204      21  DIFFERENT_REPLY_TO
    201     200       1  PREST_NON_ACCREDITED
    195     195       0  FROM_BTAMAIL
    191     191       0  SUBJ_FULL_OF_8BITS
    180     180       0  ADDRESSES_ON_CD
    177     177       0  TRACE_BY_SSN
    172     172       0  AUTO_EMAIL_REMOVAL
    175     168       7  INVALID_DATE
    176     163      13  COMMUNIGATE
    263     154     109  TO_UNSUB_REPLY
    155     149       6  FRIEND_AT_PUBLIC
    147     147       0  WE_HONOR_ALL
    168     145      23  REPLY_TO_EMPTY
    169     140      29  MSG_ID_ADDED_BY_MTA
    164     139      25  PROFITS
    134     134       0  WWW_REMOVEYOU_COM
    121     121       0  MYCASINOBUILDER
    119     119       0  EARN_PER_WEEK
    115     115       0  JODY
    261     112     149  GAPPY_TEXT
    125     111      14  DIRECT_EMAIL
    348     109     239  ALL_CAPS_SUBJECT
    302     106     196  FORGED_RCVD_FOUND
    198     105      93  CHARSET_FARAWAY_HEADERS
     99      99       0  NO_QS_ASKED
    181      98      83  RATWARE
    311      97     214  X_EM_VER_PRESENT
    101      96       5  TONER
    182      93      89  UNDISC_RECIPS
     90      89       1  EXCUSE_1
     87      87       0  SPAM_FORM_RETURN
     88      84       4  BILLION_DOLLARS
     86      83       3  WORK_AT_HOME
     88      83       5  ONCE_IN_LIFETIME
     81      81       0  S_1618
     81      81       0  GREEN_EXCUSE_2
     80      80       0  GREEN_EXCUSE_1
     83      72      11  INCREASE_SALES
     72      72       0  PENNIES_A_DAY
    137      68      69  MIME_NULL_BLOCK
     67      67       0  SPAM_FORM
     67      67       0  GENTLE_FEROCITY
     66      66       0  STOCK_ALERT
     65      65       0  X_PMFLAGS_PRESENT
     64      64       0  FULL_REFUND
     64      64       0  YELLOWSUN
     61      61       0  KIFF
     61      60       1  PORN_7
     59      59       0  FREEWEBHOSTINGCENTRAL
     59      58       1  X_MAILER_GIBBERISH
     56      56       0  POST_IN_RCVD
    155      53     102  AOL_USERS_LINK
     50      50       0  PORN_1
     52      48       4  MASS_EMAIL
     46      46       0  VJESTIKA
     45      45       0  HOME_EMPLOYMENT
     44      44       0  EXCUSE_2
     40      40       0  FREE_PRIORITY_MAIL
     40      40       0  MONSTERHUT
     39      39       0  HR_3113
     38      38       0  FILTERED_BY_WORLDREMOVE
    101      36      65  LARGE_HEX
     45      35      10  ORDER_STATUS
     34      33       1  INVESTOR_SPEC_SHEET
     47      29      18  FROM_MALFORMED
     36      29       7  EXCUSE_17
     27      27       0  IN_ACCORDANCE_WITH_LAWS
     23      23       0  WEB4PORNO_URL
     23      22       1  MDAEMON_2_7_4
     21      21       0  E_WEBHOSTCENTRAL_URL
     21      21       0  FREEWEBCO_NET_URL
     24      21       3  PENIS_ENLARGE
     20      19       1  FROM_MISSING
     19      19       0  YR_MEMBERSHIP_EXCH
     18      18       0  SUBJ_2_CREDIT
     18      18       0  UNSUB_SCRIPT
     18      18       0  INTL_EXEC_GUILD
     34      16      18  FROM_NO_USER
     15      15       0  UCE_MAIL_ACT
     34      14      20  DATE_MISSING
     14      13       1  MANY_FROMS
     14      12       2  SHORT_RECEIVED_LINE
     13      12       1  LONG_NUMERIC_HTTP_ADDR
     16      12       4  SAFEGUARD_NOTICE
     11      11       0  MAIL_IN_ORDER_FORM
     14      11       3  US_DOLLARS
     11      11       0  MURKOWSKI_CRUFT
     11      11       0  POPLAUNCH
     10      10       0  CYBER_FIRE_POWER
     10      10       0  EXCUSE_5
      9       9       0  NEW_DOMAIN_EXTENSIONS
      9       9       0  PURE_PROFIT
     18       9       9  TRACKER_ID
      9       9       0  CORRUPT_MSGID
     10       8       2  UNIVERSITY_DIPLOMAS
     10       8       2  BAD_HELO_WARNING
      8       8       0  STOCK_PICK
      7       6       1  RELAYING_FRAME
      6       6       0  SIRCAM_SIGNATURE
      6       6       0  EXCUSE_8
      9       6       3  GAPPY_SUBJECT
      6       6       0  NIGERIAN_SCAM_2
      6       6       0  25FREEMEGS_URL
      5       5       0  BUGGY_CGI_PT
      5       5       0  CLICKSFORMONEY_NET
      5       5       0  WWW_NETSITESFORFREE_NET
      5       5       0  ANOTHER_NET_AD
      8       5       3  PORN_6
      5       3       2  SHOES_GUY
      3       3       0  HUNZA_DIET_BREAD
      3       3       0  BADTRANS_WORM
     25       3      22  PORN_8
      4       2       2  EXCUSE_11
   8268       2    8266  IN_REP_TO
      2       2       0  UNNEEDED_HTML_ENCODING
      2       2       0  TO_INVESTORS
     31       2      29  YAHOO_MSGID_ADDED
      2       2       0  URGENT_BIZ
      2       2       0  ONLINE_BIZ_OPS
      1       1       0  EU_EMAIL_OPTOUT
      3       1       2  ITS_EFFECTIVE
      1       1       0  WWW_CLIK4YOU_COM
      1       1       0  WWW_AUTOREMOVE_COM
      1       1       0  BUGGY_CGI_DE
      1       1       0  NIGERIAN_SCAM
      1       1       0  WWW_TRAFFICWOW_NET
      0       0       0  X_OSIRU_SPAMWARE_SITE
      0       0       0  FROM_UGETMORE
      0       0       0  USER_IN_WHITELIST
      0       0       0  EMAIL_HARVEST
      1       0       1  BALANCE_FOR_LONG
      0       0       0  FREQ_SPAM_PHRASE
      0       0       0  CHARSET_FARAWAY
      0       0       0  EVITE
      0       0       0  USER_IN_MORE_SPAM_TO
      0       0       0  BUGGY_CGI_DE_2
      0       0       0  WWW_DIRECTFORCEMARKETING_COM
      0       0       0  USER_IN_ALL_SPAM_TO
      0       0       0  RCVD_IN_RSS
      0       0       0  EXCUSE_9
      0       0       0  NO_MX_FOR_FROM
      0       0       0  LASER_PRINTER
      0       0       0  FROM_FORGED_HOTMAIL
      0       0       0  RCVD_IN_VISI
      0       0       0  USER_IN_BLACKLIST
      0       0       0  NO_SELLING
      0       0       0  DIFF_C_PATCH
      0       0       0  JUST_MAILED_PAGE
      0       0       0  BUGGY_CGI_ES_2
      0       0       0  USER_IN_WHITELIST_TO
      0       0       0  RCVD_IN_DUL
      0       0       0  RAZOR_CHECK
      0       0       0  RCVD_IN_OSIRUSOFT_COM
      0       0       0  RCVD_IN_ORBZ
      0       0       0  RCVD_IN_RFCI
      1       0       1  UNIFIED_PATCH
      0       0       0  X_UIDL_SPAMSIGN
     11       0      11  PGP_SIGNATURE
      0       0       0  SPAM_PHRASES_030
      0       0       0  SPAM_PHRASES_100
      0       0       0  RCVD_IN_ORBS
      0       0       0  SEXY_PICS
      0       0       0  CHARSET_FARAWAY_BODY
    122       0     122  EGP_HTML_BANNER
      0       0       0  A_HREF_TO_IP
      0       0       0  X_OSIRU_SPAM_SRC
      0       0       0  EXCUSE_18
      0       0       0  RCVD_IN_RELAYS_ORDB_ORG
      0       0       0  Q_FOR_SELLER
      0       0       0  RCVD_IN_RBL
      0       0       0  SPAM_PHRASES_020
      0       0       0  INTERNET_TERROR_RANT
      0       0       0  BUGGY_CGI_DE_3
      0       0       0  PRINT_OUT_AND_FAX
      1       0       1  CLICK_TO_REMOVE_MAILTO
      0       0       0  EU_200_32_CE
      0       0       0  PORN_2
      0       0       0  MAILMAN_CONFIRM
      0       0       0  BRAND_NEW_PAGER
      0       0       0  BUGGY_CGI_ES
      0       0       0  RCVD_IN_BL_SPAMCOP_NET
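
For anyone who wants to poke at the frequency file themselves, here is a minimal Python sketch of one way to read it. It assumes each data row is three whitespace-separated counts (messages hit overall, in spam, in nonspam) followed by the rule name, which is what the header row suggests; the names `RuleFreq`, `parse_freqs`, and `spam_fraction` are made up for this illustration and are not part of SpamAssassin. The sample rows are copied from the file above.

```python
from typing import List, NamedTuple, Optional


class RuleFreq(NamedTuple):
    """One row of the frequency file (hypothetical helper type)."""
    name: str
    overall: int
    spam: int
    nonspam: int


def parse_freqs(text: str) -> List[RuleFreq]:
    """Parse rows of the form 'OVERALL SPAM NONSPAM NAME'.

    Lines whose first three fields are not all integers (the header,
    blank lines) are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4 or not all(p.isdigit() for p in parts[:3]):
            continue
        overall, spam, nonspam = (int(p) for p in parts[:3])
        # The name may contain spaces, e.g. "(all messages)".
        rows.append(RuleFreq(" ".join(parts[3:]), overall, spam, nonspam))
    return rows


def spam_fraction(row: RuleFreq) -> Optional[float]:
    """Fraction of a rule's hits that were spam, or None if it never hit."""
    return row.spam / row.overall if row.overall else None


SAMPLE = """\
OVERALL    SPAM NONSPAM  NAME
 167871   46818  121053  (all messages)
  24250   23073    1177  CLICK_BELOW
  12223    3056    9167  TO_MALFORMED
   8268       2    8266  IN_REP_TO
     11       0      11  PGP_SIGNATURE
      0       0       0  EVITE
"""

rows = parse_freqs(SAMPLE)

# Rules that hit nonspam more often than spam; the totals row is excluded.
mostly_nonspam = [
    r.name for r in rows
    if r.nonspam > r.spam and not r.name.startswith("(")
]

for r in rows:
    print(r.name, spam_fraction(r))
print("mostly nonspam:", mostly_nonspam)
```

Rules that land in `mostly_nonspam` are the sort that could plausibly matter for reducing false positives, and rules with no hits at all (where `spam_fraction` returns `None`) give the GA nothing to work with in this corpus.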