Duncan Findlay wrote:

> Ummm... I'd be heavily inclined to set these spam scores to 0.01. It's not
> that I don't trust the GA, it's just that if these are the outputs, they
> aren't needed in the first place.

That's not necessarily the case.  They might be needed to reduce false
positives.  As I posted a couple of messages ago, I plan to investigate the
tests hit by the remaining false positives and false negatives, to see which
rules are affecting those messages and to get a little more information about
what's going on.  For the curious, I've attached the frequency file that the
GA used when determining these scores.  The format of the file is pretty
straightforward: for each rule, the number of messages it hit overall,
followed by its hits on spam and on nonspam.

   OVERALL        SPAM     NONSPAM  NAME
    167871       46818      121053  (all messages)
     49729       28038       21691  NO_REAL_NAME
     24250       23073        1177  CLICK_BELOW
     20890       20820          70  CTYPE_JUST_HTML
     20055       17175        2880  FROM_ENDS_IN_NUMS
     18296       14549        3747  PLING
     13463       13231         232  CLICK_HERE_LINK
     13732       12794         938  SUBJ_HAS_SPACES
     11015       10834         181  EXCUSE_3
      9467        9442          25  SUBJ_HAS_UNIQ_ID
      9335        9329           6  A_HREF_TO_REMOVE
     10167        8828        1339  LINE_OF_YELLING
      8483        8290         193  NORMAL_HTTP_TO_IP
      8164        7226         938  MAILTO_LINK
      8016        6664        1352  FORGED_YAHOO_RCVD
      6320        6257          63  INVALID_DATE_TZ_ABSURD
      6652        6193         459  MSG_ID_ADDED_BY_MTA_2
      5602        5315         287  MAILTO_WITH_SUBJ
      4946        4898          48  REMOVE_SUBJ
      4029        4009          20  FAKED_UNDISC_RECIPS
      7465        3884        3581  SUPERLONG_LINE
      3742        3727          15  MAILTO_WITH_SUBJ_REMOVE
      3616        3614           2  REMOVE_PAGE
      3606        3507          99  HTML_WITH_BGCOLOR
      4702        3449        1253  SUBJ_HAS_Q_MARK
      3814        3357         457  FROM_HAS_MIXED_NUMS
      3935        3246         689  SUBJ_ALL_CAPS
      3800        3145         655  MAILTO_TO_REMOVE
     12223        3056        9167  TO_MALFORMED
      4876        3040        1836  MAY_BE_FORGED
      3121        3003         118  MAILTO_TO_SPAM_ADDR
      2997        2970          27  EXCUSE_7
      2987        2953          34  GUARANTEE
      2893        2861          32  REMOVE_IN_QUOTES
      2810        2805           5  A_HREF_TO_UNSUB
      2794        2779          15  TO_EMPTY
      3111        2578         533  DATE_IN_FUTURE
      2712        2569         143  WEB_BUGS
      2623        2564          59  OPT_IN
      2862        2297         565  FORGED_HOTMAIL_RCVD
      2289        2241          48  SUBJ_REMOVE
      2109        2098          11  VERY_SUSP_CC_RECIPS
      2599        2032         567  EXCUSE_16
      4627        2027        2600  DEAR_SOMEBODY
      2001        1996           5  A_HREF_TO_OPT_OUT
      2525        1898         627  COPYRIGHT_CLAIMED
      1891        1878          13  SUSPICIOUS_CC_RECIPS
      1959        1875          84  FROM_NAME_EQ_FROM_ADDR
      2381        1825         556  PLING_PLING
      9913        1818        8095  SUBJ_ENDS_IN_Q_MARK
      2364        1818         546  KNOWN_BAD_DIALUPS
      1915        1806         109  INVALID_DATE_NO_TZ
      3469        1775        1694  FROM_AND_TO_SAME
      1741        1695          46  CALL_FREE
      1743        1656          87  INVALID_MSGID
      2623        1609        1014  EXCUSE_14
      1609        1551          58  BASE64_ENC_TEXT
      1515        1515           0  FAKED_IP_IN_RCVD
      1539        1451          88  JAVASCRIPT
      1479        1428          51  CASINO
      1402        1335          67  SLIGHTLY_UNSAFE_JAVASCRIPT
      1327        1327           0  FORGED_GW05_RCVD
      1511        1284         227  CASHCASHCASH
      1245        1225          20  VIAGRA
      1157        1135          22  EXCUSE_10
      1533        1133         400  SMTPD_IN_RCVD
      1297        1121         176  MISSING_HEADERS
      1045        1023          22  HTTP_ESCAPED_HOST
      1091         984         107  VERY_SUSP_RECIPS
       987         981           6  COPY_DVDS
      1007         956          51  HTTP_USERNAME_USED
       949         948           1  STRONG_BUY
      1049         947         102  OPPORTUNITY
       923         923           0  MORTGAGE_RATES
       897         897           0  REPLY_REMOVE_SUBJECT
       893         893           0  FREE_CONSULTATION
       907         885          22  NUMERIC_HTTP_ADDR
       884         884           0  WE_HATE_SPAM
      3056         878        2178  EXCUSE_6
       875         859          16  BULK_EMAIL
       846         846           0  RESISTANCE_IS_FUTILE
       855         844          11  ONE_HUNDRED_PC_GUAR
       831         828           3  REMOVAL_INSTRUCTIONS
       793         790           3  HTTP_CTRL_CHARS_HOST
       785         778           7  EMAIL_MARKETING
       780         775           5  SECTION_301
       757         754           3  CHECK_OR_MONEY_ORDER
       749         746           3  EXCUSE_4
       738         738           0  MSGID_SPAMSIGN_1
       753         737          16  PRINT_FORM_SIGNATURE
       740         733           7  FORGED_EUDORAMAIL_RCVD
       734         730           4  ONE_HUNDRED_PC_FREE
       868         716         152  SUSPICIOUS_RECIPS
       702         696           6  REALLY_UNSAFE_JAVASCRIPT
       727         687          40  ROUND_THE_WORLD
      1765         686        1079  HTTP_WITH_EMAIL_IN_URL
       693         686           7  THIS_AINT_SPAM
       787         680         107  FOR_FREE
       677         674           3  CBYI
       770         665         105  PORN_4
       762         641         121  PORN_3
       606         606           0  UNSUB_PAGE
       630         604          26  DEAR_FRIEND
       689         602          87  PENIS_ENLARGE2
       617         590          27  FOR_JUST_SOME_AMT
       579         578           1  FORM_W_MAILTO_ACTION
       553         553           0  NONEXISTENT_CHARSET
       523         516           7  EXCUSE_15
      1200         504         696  X_PRIORITY_HIGH
       543         494          49  INVALID_DATE_ODD_MONTH
       495         488           7  TO_BE_REMOVED_REPLY
       487         486           1  BILL_1618
       491         475          16  SOCIAL_SEC_NUMBER
       477         472           5  ADVERT_CODE
       556         465          91  ASCII_FORM_ENTRY
       468         461           7  YOUR_INCOME
       442         442           0  EXCUSE_13
       453         429          24  MSGID_HAS_NO_AT
       442         426          16  AMAZING
       417         409           8  BUGGY_CGI
      1030         408         622  X_MSMAIL_PRIORITY_HIGH
       409         398          11  WANTS_CREDIT_CARD
       399         393           6  CALL_NOW
       404         390          14  GREAT_OFFER
       368         367           1  YOU_HAVE_BEEN_SELECTED
       360         351           9  LIMITED_TIME_ONLY
       348         348           0  TAKE_ACTION_NOW
       344         342           2  SENT_IN_COMPLIANCE
       339         339           0  PRODUCED_AND_SENT_OUT
       327         327           0  PARA_A_2_C_OF_1618
       313         306           7  EXCUSE_12
       305         305           0  MICRO_CAP_WARNING
       302         296           6  ONE_TIME_MAILING
       300         295           5  INCREASE_TRAFFIC
       299         291           8  FOR_INSTANT_ACCESS
       277         272           5  HTTP_NUMBER_WORD
       298         271          27  THE_FOLLOWING_FORM
       258         258           0  CLICK_TO_REMOVE_2
       282         248          34  TO_NO_USER
       247         247           0  REMOVE_SCRIPT
       253         243          10  LOTS_OF_CC_LINES
       242         236           6  MONEY_MAKING
       240         234           6  ASKS_BILLING_ADDRESS
       267         232          35  BE_AMAZED
       225         204          21  DIFFERENT_REPLY_TO
       201         200           1  PREST_NON_ACCREDITED
       195         195           0  FROM_BTAMAIL
       191         191           0  SUBJ_FULL_OF_8BITS
       180         180           0  ADDRESSES_ON_CD
       177         177           0  TRACE_BY_SSN
       172         172           0  AUTO_EMAIL_REMOVAL
       175         168           7  INVALID_DATE
       176         163          13  COMMUNIGATE
       263         154         109  TO_UNSUB_REPLY
       155         149           6  FRIEND_AT_PUBLIC
       147         147           0  WE_HONOR_ALL
       168         145          23  REPLY_TO_EMPTY
       169         140          29  MSG_ID_ADDED_BY_MTA
       164         139          25  PROFITS
       134         134           0  WWW_REMOVEYOU_COM
       121         121           0  MYCASINOBUILDER
       119         119           0  EARN_PER_WEEK
       115         115           0  JODY
       261         112         149  GAPPY_TEXT
       125         111          14  DIRECT_EMAIL
       348         109         239  ALL_CAPS_SUBJECT
       302         106         196  FORGED_RCVD_FOUND
       198         105          93  CHARSET_FARAWAY_HEADERS
        99          99           0  NO_QS_ASKED
       181          98          83  RATWARE
       311          97         214  X_EM_VER_PRESENT
       101          96           5  TONER
       182          93          89  UNDISC_RECIPS
        90          89           1  EXCUSE_1
        87          87           0  SPAM_FORM_RETURN
        88          84           4  BILLION_DOLLARS
        86          83           3  WORK_AT_HOME
        88          83           5  ONCE_IN_LIFETIME
        81          81           0  S_1618
        81          81           0  GREEN_EXCUSE_2
        80          80           0  GREEN_EXCUSE_1
        83          72          11  INCREASE_SALES
        72          72           0  PENNIES_A_DAY
       137          68          69  MIME_NULL_BLOCK
        67          67           0  SPAM_FORM
        67          67           0  GENTLE_FEROCITY
        66          66           0  STOCK_ALERT
        65          65           0  X_PMFLAGS_PRESENT
        64          64           0  FULL_REFUND
        64          64           0  YELLOWSUN
        61          61           0  KIFF
        61          60           1  PORN_7
        59          59           0  FREEWEBHOSTINGCENTRAL
        59          58           1  X_MAILER_GIBBERISH
        56          56           0  POST_IN_RCVD
       155          53         102  AOL_USERS_LINK
        50          50           0  PORN_1
        52          48           4  MASS_EMAIL
        46          46           0  VJESTIKA
        45          45           0  HOME_EMPLOYMENT
        44          44           0  EXCUSE_2
        40          40           0  FREE_PRIORITY_MAIL
        40          40           0  MONSTERHUT
        39          39           0  HR_3113
        38          38           0  FILTERED_BY_WORLDREMOVE
       101          36          65  LARGE_HEX
        45          35          10  ORDER_STATUS
        34          33           1  INVESTOR_SPEC_SHEET
        47          29          18  FROM_MALFORMED
        36          29           7  EXCUSE_17
        27          27           0  IN_ACCORDANCE_WITH_LAWS
        23          23           0  WEB4PORNO_URL
        23          22           1  MDAEMON_2_7_4
        21          21           0  E_WEBHOSTCENTRAL_URL
        21          21           0  FREEWEBCO_NET_URL
        24          21           3  PENIS_ENLARGE
        20          19           1  FROM_MISSING
        19          19           0  YR_MEMBERSHIP_EXCH
        18          18           0  SUBJ_2_CREDIT
        18          18           0  UNSUB_SCRIPT
        18          18           0  INTL_EXEC_GUILD
        34          16          18  FROM_NO_USER
        15          15           0  UCE_MAIL_ACT
        34          14          20  DATE_MISSING
        14          13           1  MANY_FROMS
        14          12           2  SHORT_RECEIVED_LINE
        13          12           1  LONG_NUMERIC_HTTP_ADDR
        16          12           4  SAFEGUARD_NOTICE
        11          11           0  MAIL_IN_ORDER_FORM
        14          11           3  US_DOLLARS
        11          11           0  MURKOWSKI_CRUFT
        11          11           0  POPLAUNCH
        10          10           0  CYBER_FIRE_POWER
        10          10           0  EXCUSE_5
         9           9           0  NEW_DOMAIN_EXTENSIONS
         9           9           0  PURE_PROFIT
        18           9           9  TRACKER_ID
         9           9           0  CORRUPT_MSGID
        10           8           2  UNIVERSITY_DIPLOMAS
        10           8           2  BAD_HELO_WARNING
         8           8           0  STOCK_PICK
         7           6           1  RELAYING_FRAME
         6           6           0  SIRCAM_SIGNATURE
         6           6           0  EXCUSE_8
         9           6           3  GAPPY_SUBJECT
         6           6           0  NIGERIAN_SCAM_2
         6           6           0  25FREEMEGS_URL
         5           5           0  BUGGY_CGI_PT
         5           5           0  CLICKSFORMONEY_NET
         5           5           0  WWW_NETSITESFORFREE_NET
         5           5           0  ANOTHER_NET_AD
         8           5           3  PORN_6
         5           3           2  SHOES_GUY
         3           3           0  HUNZA_DIET_BREAD
         3           3           0  BADTRANS_WORM
        25           3          22  PORN_8
         4           2           2  EXCUSE_11
      8268           2        8266  IN_REP_TO
         2           2           0  UNNEEDED_HTML_ENCODING
         2           2           0  TO_INVESTORS
        31           2          29  YAHOO_MSGID_ADDED
         2           2           0  URGENT_BIZ
         2           2           0  ONLINE_BIZ_OPS
         1           1           0  EU_EMAIL_OPTOUT
         3           1           2  ITS_EFFECTIVE
         1           1           0  WWW_CLIK4YOU_COM
         1           1           0  WWW_AUTOREMOVE_COM
         1           1           0  BUGGY_CGI_DE
         1           1           0  NIGERIAN_SCAM
         1           1           0  WWW_TRAFFICWOW_NET
         0           0           0  X_OSIRU_SPAMWARE_SITE
         0           0           0  FROM_UGETMORE
         0           0           0  USER_IN_WHITELIST
         0           0           0  EMAIL_HARVEST
         1           0           1  BALANCE_FOR_LONG
         0           0           0  FREQ_SPAM_PHRASE
         0           0           0  CHARSET_FARAWAY
         0           0           0  EVITE
         0           0           0  USER_IN_MORE_SPAM_TO
         0           0           0  BUGGY_CGI_DE_2
         0           0           0  WWW_DIRECTFORCEMARKETING_COM
         0           0           0  USER_IN_ALL_SPAM_TO
         0           0           0  RCVD_IN_RSS
         0           0           0  EXCUSE_9
         0           0           0  NO_MX_FOR_FROM
         0           0           0  LASER_PRINTER
         0           0           0  FROM_FORGED_HOTMAIL
         0           0           0  RCVD_IN_VISI
         0           0           0  USER_IN_BLACKLIST
         0           0           0  NO_SELLING
         0           0           0  DIFF_C_PATCH
         0           0           0  JUST_MAILED_PAGE
         0           0           0  BUGGY_CGI_ES_2
         0           0           0  USER_IN_WHITELIST_TO
         0           0           0  RCVD_IN_DUL
         0           0           0  RAZOR_CHECK
         0           0           0  RCVD_IN_OSIRUSOFT_COM
         0           0           0  RCVD_IN_ORBZ
         0           0           0  RCVD_IN_RFCI
         1           0           1  UNIFIED_PATCH
         0           0           0  X_UIDL_SPAMSIGN
        11           0          11  PGP_SIGNATURE
         0           0           0  SPAM_PHRASES_030
         0           0           0  SPAM_PHRASES_100
         0           0           0  RCVD_IN_ORBS
         0           0           0  SEXY_PICS
         0           0           0  CHARSET_FARAWAY_BODY
       122           0         122  EGP_HTML_BANNER
         0           0           0  A_HREF_TO_IP
         0           0           0  X_OSIRU_SPAM_SRC
         0           0           0  EXCUSE_18
         0           0           0  RCVD_IN_RELAYS_ORDB_ORG
         0           0           0  Q_FOR_SELLER
         0           0           0  RCVD_IN_RBL
         0           0           0  SPAM_PHRASES_020
         0           0           0  INTERNET_TERROR_RANT
         0           0           0  BUGGY_CGI_DE_3
         0           0           0  PRINT_OUT_AND_FAX
         1           0           1  CLICK_TO_REMOVE_MAILTO
         0           0           0  EU_200_32_CE
         0           0           0  PORN_2
         0           0           0  MAILMAN_CONFIRM
         0           0           0  BRAND_NEW_PAGER
         0           0           0  BUGGY_CGI_ES
         0           0           0  RCVD_IN_BL_SPAMCOP_NET
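
If anyone wants to poke at the frequency file themselves, here is a rough
sketch of reading it in Python.  It assumes the whitespace-separated
OVERALL / SPAM / NONSPAM / NAME layout shown above; the function names
(parse_freq_line, spam_fraction) are made up for illustration and are not
part of SpamAssassin or the GA.  It only computes what fraction of a rule's
hits landed on spam, which is not how the GA itself derives scores.

```python
def parse_freq_line(line):
    """Parse one 'OVERALL SPAM NONSPAM NAME' row.

    Returns None for the header row, blank lines, or anything else that
    does not start with a count. (Hypothetical helper, not SpamAssassin code.)
    """
    parts = line.split(None, 3)
    if len(parts) != 4 or not parts[0].isdigit():
        return None
    overall, spam, nonspam = (int(p) for p in parts[:3])
    return {"name": parts[3].strip(), "overall": overall,
            "spam": spam, "nonspam": nonspam}


def spam_fraction(row):
    """Fraction of a rule's hits that were on spam; None if it never hit."""
    if row["overall"] == 0:
        return None
    return row["spam"] / row["overall"]


# A few rows copied from the frequency file above.
sample = """\
   OVERALL        SPAM     NONSPAM  NAME
    167871       46818      121053  (all messages)
     24250       23073        1177  CLICK_BELOW
     12223        3056        9167  TO_MALFORMED
      8268           2        8266  IN_REP_TO
         0           0           0  RAZOR_CHECK
"""

rows = [r for r in map(parse_freq_line, sample.splitlines()) if r]
for row in rows:
    frac = spam_fraction(row)
    label = "n/a" if frac is None else f"{frac:.3f}"
    print(f"{row['name']:<16} {label}")
```

A rule like IN_REP_TO, which hits almost exclusively nonspam, is the sort of
thing that could still be useful for reducing false positives even if its
score looks unimportant.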
