Per the FAQ, when someone has rules that could be added to the
SA distribution:
> Next you should ... post the test rules to [the] list, and ask others
> to check against their corpora. List members may also suggest
> improvements to the tests.

I've begun to make some progress in comparing my personal rules to 2.60,
and am submitting what I consider the most promising for consideration
for inclusion in the distribution rule set.

I've gotten through my "to", "from", and "subject" rules so far. Others
will follow.

Please let me know if you have problems or suggestions for these rules.
If you are able to run these against your own corpus, please do and let
me know what results you get.
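
For anyone who wants to try this without a full mass-check setup, here is a minimal sketch in Python of counting spam and ham hits for one header rule. The function name count_hits and the sample messages are hypothetical, and Python's re module is assumed to behave like Perl for these simple patterns; it is not how SpamAssassin itself evaluates rules.

```python
import re
from email.message import EmailMessage


def count_hits(pattern, header, messages, flags=re.IGNORECASE):
    """Count messages in which any occurrence of `header` matches `pattern`."""
    rx = re.compile(pattern, flags)
    hits = 0
    for msg in messages:
        # get_all returns every occurrence of the header, or [] if absent
        values = msg.get_all(header, [])
        if any(rx.search(str(v)) for v in values):
            hits += 1
    return hits


def make_msg(subject):
    """Build a tiny in-memory message (illustrative sample data only)."""
    m = EmailMessage()
    m["Subject"] = subject
    return m


spam = [make_msg("Find your dream home"), make_msg("Cheap toner")]
ham = [make_msg("Meeting notes")]

spam_hits = count_hits(r"find your", "Subject", spam)
ham_hits = count_hits(r"find your", "Subject", ham)
print(f"{spam_hits}s/{ham_hits}h of {len(spam) + len(ham)} corpus")
# prints "1s/0h of 3 corpus"
```

To run it against real mail, the two lists could be replaced by mailbox.mbox(path) objects from the Python standard library, which iterate over the messages in an mbox file.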

(My scores are set for a 9.0 spam threshold -- YMMV.)
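
If your threshold differs, one rough way to adapt the scores is to scale them in proportion. The sketch below assumes simple linear scaling, which is only an approximation, and uses 5.0 (the SpamAssassin default) as the example target; the function name rescale is hypothetical.

```python
def rescale(score, from_threshold=9.0, to_threshold=5.0):
    """Scale a score linearly from one spam threshold to another.

    Linear scaling is an assumption made for illustration; scores tuned
    against a corpus may not transfer this cleanly.
    """
    return round(score * to_threshold / from_threshold, 3)


for original in (1.18, 3.0, 9.1):
    print(original, "->", rescale(original))
```

A rule scored 3.0 here would come out at about 1.667 for a 5.0 threshold.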

After about a week of feedback, I'll submit those that still seem
worthwhile for formal consideration by the development team via bugzilla,
as instructed by the FAQ.

Many thanks.

Bob Menschel

header   RM_tz_Insurance   ToCc =~ /Insurance/i
describe RM_tz_Insurance   Addressed to "Insurance" department or person
score    RM_tz_Insurance   1.18   # 29s/1h of 53752 corpus

header   RM_fa_BaliHotels  From =~ /\bbali.{0,30}\.com/i
describe RM_fa_BaliHotels  From a probable spammer
score    RM_fa_BaliHotels  7.00  # 19s/0h of 58857 corpus
# spamming travel agent uses multiple www.bali-hotel.com and related
# domain names in From and in URIs

header   RM_fw_Amazing   From =~ /amazing/i
describe RM_fw_Amazing   Sender name includes the word: Amazing
score    RM_fw_Amazing   1.34    # 34 spam, 0 ham, Sep 12 2003; 12s/0h of 58857 corpus

header   RM_fw_Phrase1   From =~ /\w+_\w+_\w+_\w+_/i
describe RM_fw_Phrase1   Sender name appears to be phrase rather than name
score    RM_fw_Phrase1   1.29    # 29s/0h of 58857 corpus

header   RM_fw_LeadingPrep       From =~ 
describe RM_fw_LeadingPrep       From begins with preposition or similar word, a, all, any, free, get
score    RM_fw_LeadingPrep       2.00    # 637 spam, 1 ham, Aug 3 2003; 273s/2h of 58857 corpus

# Grunged to get past list filter -- replace "Vword" with the original V-word before using.
header   RM_fw_Vword    From =~ /Vword/i
describe RM_fw_Vword    Sender name contains a known spam word
score    RM_fw_Vword    1.11    # 11 spam, 0 ham, Oct 1 2003; 7s/0h of 58857 corpus

header   RM_fl_ConsWord9 From =~ /\b[bcghjklmnpqrstvwxz]{9,20}\b/
describe RM_fl_ConsWord9 From contains word consisting of consecutive consonants
score    RM_fl_ConsWord9 1.740  # 74 spam, 1 ham, Oct 25 2003; 41s/0h of 58857 corpus
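
A quick way to see what this rule does and does not hit is to try the pattern on a few made-up From values. This is a sketch in Python, assuming its re module matches Perl's behaviour for this pattern; the sample addresses are invented. Note that the class as posted omits "d" and "f" and the rule has no /i flag; whether that is intentional is not stated.

```python
import re

# Same pattern as RM_fl_ConsWord9, with no case-insensitive flag
cons9 = re.compile(r"\b[bcghjklmnpqrstvwxz]{9,20}\b")


def hits(from_header):
    """Return True if the From value contains a 9-20 letter consonant run."""
    return cons9.search(from_header) is not None


samples = [
    "xkcvbnmqwrt <a@example.com>",  # lowercase consonant run: hit
    "XKCVBNMQWRT <a@example.com>",  # uppercase: no hit without /i
    "dkcvbnmqwrt <a@example.com>",  # starts with d, not in the class: no hit
]
for s in samples:
    print(hits(s), s)
```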

header   RM_ft_Noname    From =~ /"" \</i
describe RM_ft_Noname    Sender has blanked out name
score    RM_ft_Noname    3.00   # 913s/3h of 58857 corpus

header    CMO_RM_sp_AdultMovie    Subject =~ /[EMAIL PROTECTED]|\xA3][\W_]?t 
describe  CMO_RM_sp_AdultMovie    Subject mentions adult movie(s)
score     CMO_RM_sp_AdultMovie    1.670  # 65s/0h of 58857 corpus
# simpler regex which hits the same 65/0: /[EMAIL PROTECTED] m[o0]vie/i

header    CMO_RM_sp_BannedCD      Subject =~ /[EMAIL PROTECTED]/i
describe  CMO_RM_sp_BannedCD      Subject mentions the supposedly banned CD
score     CMO_RM_sp_BannedCD      1.540  # 54s/0h of 58857 corpus
# All 54 of this spam is caught by the much simpler /b\s?a\s?n\s?n\s?e\s?d\s?c\s?d/i
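
To illustrate what the simpler pattern covers, here is a small Python sketch run against a few invented subjects. It assumes Python's re module behaves like Perl for this pattern. Each \s? allows at most one whitespace character between letters, so punctuation or doubled spaces are not matched.

```python
import re

# The simpler Banned CD pattern quoted in the comment above
simple = re.compile(r"b\s?a\s?n\s?n\s?e\s?d\s?c\s?d", re.IGNORECASE)

samples = {
    "Banned CD": True,        # one space between the words
    "B a n n e d C D": True,  # single spaces between all letters
    "BANNEDCD": True,         # no spaces, upper case
    "B.a.n.n.e.d CD": False,  # punctuation is not covered by \s?
    "Banned  CD": False,      # two spaces exceed the single optional \s
}
for subject, expected in samples.items():
    matched = simple.search(subject) is not None
    print(f"{subject!r}: {matched}")
```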

header    RM_sp_CopyDVD           Subject =~ /(?:c[o0]py dvd|dvd.{1,15}c[o0]py|dvd 
describe  RM_sp_CopyDVD           Subject mentions copying DVDs
score     RM_sp_CopyDVD           2.340  # 134s/0h of 58857 corpus

header    RM_sp_FindYour          Subject =~ /find your/i
describe  RM_sp_FindYour          Subject suggests you find something
score     RM_sp_FindYour          1.160  # 16s/0h of 58857 corpus

header    RM_sp_FreePPV           Subject =~ /free [EMAIL PROTECTED] -]?per[ -]?view/i
describe  RM_sp_FreePPV           Subject mentions free pay-per-view 
score     RM_sp_FreePPV           2.430  # 143s/0h of 58857 corpus

header    CMO_RM_sp_GiftCard      Subject =~ /[g6][\W_]?[il1:\|\*\xCC-\xCF\xEC-\xEF][\W_]?f[\W_]?t [EMAIL PROTECTED]/i
describe  CMO_RM_sp_GiftCard      Subject mentions a gift card
score     CMO_RM_sp_GiftCard      1.690  # 69 spam, 0 ham, Aug 9 2003; 20s/0h of 58857 

header    RM_sp_LookingFor        Subject =~ /(?:(?:We are|we're) looking for|looking 
describe  RM_sp_LookingFor        Subject mentions looking for something
score     RM_sp_LookingFor        1.380  # 38 spam, 0 ham, Aug 6 2003; 14s/0h of 58857 

header    RM_sp_RuinAnyone        Subject =~ /ruin anyone anywhere/i
describe  RM_sp_RuinAnyone        Subject suggests you can ruin anyone anywhere
score     RM_sp_RuinAnyone        1.240  # 24 spam, 0 ham, Aug 31 2003; 23s/0h of 58857 corpus

header    RM_sp_TONER             Subject =~ /\b(?:printer[-\s]*)?(?:[EMAIL PROTECTED])?(?:t[o0]ner|ink(?:[-\s]*jet)?|[EMAIL PROTECTED]|copier)[-\s]+(?:[EMAIL 
describe  RM_sp_TONER             Subject contains Toner or Ink Cartridge
score     RM_sp_TONER             1.710  # 72 spam, 0 ham, Aug 20 2003; 61s/0h of 58857 corpus
# Many people have contributed to this regex over the months...

header    RE_RM_sp_TooHigh        Subject =~ /too high/i 
describe  RE_RM_sp_TooHigh        Too high in subject
score     RE_RM_sp_TooHigh        1.580  # 58s/0h of 58857 corpus

header    RM_sp_WillAstonish      Subject =~ /Will Astonish You/i
describe  RM_sp_WillAstonish      Subject says this will astonish you.
score     RM_sp_WillAstonish      1.170  # 17 spam, 0 ham, Aug 19 2003; 6s/0h of 58857 

header    CMO_RM_spd_GetPaid      Subject =~ 
/[g6][\W_]?[e3\*\xC8-\xCB\xE8-\xEB][\W_]?t [EMAIL 
describe  CMO_RM_spd_GetPaid      Subject mentions getting paid for something
score     CMO_RM_spd_GetPaid      1.640  # 64s/0h of 58857 corpus

header    RM_spd_Money            Subject =~ /(?:(?:save|make)[ -].{0,15}money[ -](?:in|on)|(?:free|grant|saving|with our|(?:claim|keep) your) money|money machine)/i
describe  RM_spd_Money            Subject mentions money in phrase that implies spam
score     RM_spd_Money            1.380  # 76s/1h of 58857 corpus
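
The three alternatives in the money rule can be checked against a few invented subjects. This Python sketch assumes the pattern, which wraps across two lines in the post, rejoins at the wrap point with a single space inside the character class, and that Python's re module matches Perl's behaviour here.

```python
import re

# RM_spd_Money pattern, rejoined onto one line (see assumption above)
money = re.compile(
    r"(?:(?:save|make)[ -].{0,15}money[ -](?:in|on)"
    r"|(?:free|grant|saving|with our|(?:claim|keep) your) money"
    r"|money machine)",
    re.IGNORECASE,
)

samples = {
    "Make more money on eBay": True,   # save/make ... money in/on
    "Free money for college": True,    # "free money"
    "Your own money machine": True,    # "money machine"
    "Save money": False,               # needs "in" or "on" after "money"
    "Money market update": False,      # plain mention of money
}
for subject, expected in samples.items():
    print(f"{subject!r}: {money.search(subject) is not None}")
```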

header    RM_spd_StockMarket      Subject =~ /STOCK MARKET/i
describe  RM_spd_StockMarket      Subject mentions a/the STOCK MARKET 
score     RM_spd_StockMarket      3.000  # 210s/0h of 58857 corpus

header    RM_spd_WorthCash        Subject =~ 
describe  RM_spd_WorthCash        Subject mentions something is worth cash
score     RM_spd_WorthCash        3.000  # 209s/0h of 58857 corpus

header    CMO_RM_spe_BiggerMember Subject =~ 
describe  CMO_RM_spe_BiggerMember Subject mentions bigger body part         
score     CMO_RM_spe_BiggerMember 1.140  # 14s/0h of 58857 corpus
# Unlike the Banned CD rule, where my simpler regex caught all the spam that Chris'
# CMO rule catches, this CMO rule catches twice as much spam as my original rule.

header    RE_RM_spm_ImproveYour   Subject =~ /improve your/i
describe  RE_RM_spm_ImproveYour   Subject suggests you improve something 
score     RE_RM_spm_ImproveYour   1.570  # 57s/0h of 58857 corpus

header    CMO_RM_sw_boost         Subject =~ 
describe  CMO_RM_sw_boost         boost in Subject
score     CMO_RM_sw_boost         2.980  # 198s/1h of 58857 corpus

header    CMO_RM_sw_Forever       Subject =~ 
describe  CMO_RM_sw_Forever       Forever in Subject
score     CMO_RM_sw_Forever       1.760   # 76s/0h of 58857 corpus

header    RM_sw_ForWomen          Subject =~ /Women:/i
describe  RM_sw_ForWomen          Subject appears to be for women as a class, therefore possible spam
score     RM_sw_ForWomen          1.33   # 33 spam, 0 ham, Aug 19 2003; 16s/0h of 58857 corpus

header    RM_sw_MBA               Subject =~ /\bMBA\b/i
describe  RM_sw_MBA               Subject mentions an MBA
score     RM_sw_MBA               1.15   # 17s/0h of 58857 corpus
# Interestingly, this MBA rule, when run through the obfuscation system,
# matches no additional spam in my corpus, but does match two ham.

header    RM_sw_Partnership       Subject =~ /Partnership/i
describe  RM_sw_Partnership       Subject mentions Partnership
score     RM_sw_Partnership       1.20   # 23s/0h of 58857 corpus

header    CMO_RM_sw_Proven        Subject =~ 
describe  CMO_RM_sw_Proven        Proven in Subject
score     CMO_RM_sw_Proven        1.910  # 91s/0h of 58857 corpus

header    RM_sw_SpecialBang       Subject =~ /Special\!/i
describe  RM_sw_SpecialBang       Subject mentions a special!
score     RM_sw_SpecialBang       1.185  # 37s/1h of 58857 corpus

header    CMO_RM_sw_Timeshare     Subject =~ /t[\W_]?[il1:\|\*\xCC-\xCF\xEC-\xEF][\W_]?(?:\/\\\/\\|\/V\\|rn|[m])[EMAIL PROTECTED]/i
describe  CMO_RM_sw_Timeshare     Subject mentions timeshare(s)
score     CMO_RM_sw_Timeshare     1.200  # 20s/0h of 58857 corpus

header    CMO_RM_swd_debt         Subject =~ 
describe  CMO_RM_swd_debt         Subject mentions debt
score     CMO_RM_swd_debt         3.000  # 541s/0h of 58857 corpus

header    RM_swd_Foreclosure      Subject =~ /Foreclosure/i
describe  RM_swd_Foreclosure      Subject mentions foreclosure(s)
score     RM_swd_Foreclosure      1.19   # 20s/0h of 58857 corpus

header    RM_swd_investors         Subject =~ /investors/i
describe  RM_swd_investors         Subject mentions investors
score     RM_swd_investors         2.116  # 116s/0h of 58857 corpus

header    RM_swd_Paying            Subject =~ /Paying/i
describe  RM_swd_Paying            Subject mentions Paying for something
score     RM_swd_Paying            1.292  # 146s/4h of 58857 corpus

header    CMO_RM_swd_MoneyBang     Subject =~ 
describe  CMO_RM_swd_MoneyBang     Subject mentions money with exclamation mark
score     CMO_RM_swd_MoneyBang     1.200  # 20s/0h of 58857 corpus

header    CMO_RM_swm_DrugsV       Subject =~ /[vu][\W_]?[il1:\|[EMAIL 
describe  CMO_RM_swm_DrugsV       Subject mentions known spam subject
score     CMO_RM_swm_DrugsV       20.00  # 1977s/0h of 58857 corpus

header    CMO_RM_swm_Medication   Subject =~ 
describe  CMO_RM_swm_Medication   Subject mentions medication
score     CMO_RM_swm_Medication   3.000   # 597s/1h of 58857 corpus

header    RE_RM_swm_Meds          Subject =~ /m(e|3)ds/i  
describe  RE_RM_swm_Meds          Meds in subject
score     RE_RM_swm_Meds          3.00   # 665s/0h of 58857 corpus
# This rule also hits some ham when extensively obfuscated, without greatly improving
# the spam hits.

header    CMO_RM_swm_Younger       Subject =~ 
describe  CMO_RM_swm_Younger       Younger in Subject
score     CMO_RM_swm_Younger       1.41   # 41s/0h of 58857 corpus

header    RM_swp_porn1             Subject =~ /\bporn/i
describe  RM_swp_porn1             Subject seems to be about porn
score     RM_swp_porn1             1.460  # 46s/0h of 58857 corpus

header    CMO_RM_swp_porn1         Subject =~ 
describe  CMO_RM_swp_porn1         Subject seems to be about porn
score     CMO_RM_swp_porn1         1.280  # 28s/0h of 58857 corpus

header    RM_swp_porn2             Subject =~ /\bfuck/i
describe  RM_swp_porn2             Subject seems to be about porn
score     RM_swp_porn2             0.900  # 18s/1h of 58857 corpus

header    CMO_RM_swp_porn2         Subject =~ 
describe  CMO_RM_swp_porn2         Subject seems to be about porn
score     CMO_RM_swp_porn2         0.800  # 8s/0h of 58857 corpus

header    RM_swt_ConsWord6        Subject =~ /\b[bcghjklmnpqrstvwxz]{6,20}\b/
describe  RM_swt_ConsWord6        subject contains word consisting of consecutive 
score     RM_swt_ConsWord6        3.000  # 550s/0h of 58857 corpus

header    CMO_RM_swt_Masked02      Subject =~ 
describe  CMO_RM_swt_Masked02      masked spam word(s) in subject
score     CMO_RM_swt_Masked02      4.000  # 20s/0h of 58857 corpus

header    CMO_RM_swt_Masked05      Subject =~ 
describe  CMO_RM_swt_Masked05      masked spam word(s) in subject
score     CMO_RM_swt_Masked05      2.000  # 400s/3h of 58857 corpus -- ham: typo(2), 

header    CMO_RM_swt_Masked06      Subject =~ /(?!\bcheap(er)?)[EMAIL 
describe  CMO_RM_swt_Masked06      masked spam word(s) in subject
score     CMO_RM_swt_Masked06      4.000  # 86s/0h of 58857 corpus

header    CMO_RM_swt_Masked07      Subject =~ 
describe  CMO_RM_swt_Masked07      masked spam word(s) in subject
score     CMO_RM_swt_Masked07      4.000  # 53s/0h of 58857 corpus

header    CMO_RM_swt_Masked14      Subject =~ 
describe  CMO_RM_swt_Masked14      masked spam word(s) in subject
score     CMO_RM_swt_Masked14      4.000  # 27s/0h of 58857 corpus

header    RM_swt_Masked19          Subject =~ /\bpenis\b/i
describe  RM_swt_Masked19          masked spam word(s) in subject
score     RM_swt_Masked19          3.000  # 344s/0h of 58857 corpus

header    CMO_RM_swt_Masked19      Subject =~ 
describe  CMO_RM_swt_Masked19      masked spam word(s) in subject
score     CMO_RM_swt_Masked19      4.000  # 111s/0h of 58857 corpus

header    RM_sl_LettersNums       Subject =~ 
describe  RM_sl_LettersNums       Subject contains multiple mixed letters and numbers in one "word"
score     RM_sl_LettersNums       2.930  # 135 spam, 2 ham (Dell), Sep 12 2003; 193s/0h of 58857 corpus

header    RM_sl_RandomLetters2a   Subject =~ /\b[cjnqstuvwxz][bgjqu]\b/i
describe  RM_sl_RandomLetters2a   Subject contains random-text spamsign
score     RM_sl_RandomLetters2a   2.163  # 465s/3h of 58857 corpus

header    RM_sl_RandomLetters2b   Subject =~ /\be[bfjkopqv]\b/i
describe  RM_sl_RandomLetters2b   Subject contains random-text spamsign
score     RM_sl_RandomLetters2b   1.71   # 71s/0h of 58857 corpus

header    RM_sl_RandomLetters3a   Subject =~ /\b[abehikmpqrsvwxyz]a[bjkquvz]\b/i
describe  RM_sl_RandomLetters3a   Subject contains random-text spamsign
score     RM_sl_RandomLetters3a   1.375  # 75s/1h of 58857 corpus; ham: special 

header    RM_sl_RandomLetters3b   Subject =~ /\bx[bfghjklnpqrstwz][bfghjklnpqrstwz]\b/i
describe  RM_sl_RandomLetters3b   Subject contains random-text spamsign
score     RM_sl_RandomLetters3b   1.580  # 58s/0h of 58857 corpus

header    RM_sl_RandomLetters3c   Subject =~ / [fghjklnqrtz]{3} /i
describe  RM_sl_RandomLetters3c   Subject contains random-text spamsign
score     RM_sl_RandomLetters3c   3.000  # 212s/0h of 58857 corpus
                                         # avoid 

header    RM_sl_RandomLetters4a   Subject =~ /\b[eiou][bfghjklnpqrtwz]{3}\b/i
describe  RM_sl_RandomLetters4a   Subject contains random-text spamsign
score     RM_sl_RandomLetters4a   2.140  # 114s/0h of 58857 corpus
                                         # lots of ham with leading A

header    RM_sl_RandomLetters5a   Subject =~ /\b[bcdfghjklnpqrvwz]{5}\b/i
describe  RM_sl_RandomLetters5a   Subject contains random-text spamsign
score     RM_sl_RandomLetters5a   3.000  # 245s/0h of 58857 corpus

header    RM_sl_RandomCons6a      Subject =~ /\b[bcdghjklmnpqrstvwxz]{6}\b/i
describe  RM_sl_RandomCons6a      Subject contains random-text spamsign
score     RM_sl_RandomCons6a      3.000  # 325s/0h of 58857 corpus

header    RM_sl_RandomCons7a      Subject =~ /\b[bcdfghjklmnpqrstvwxz]{7}\b/i
describe  RM_sl_RandomCons7a      Subject contains random-text spamsign
score     RM_sl_RandomCons7a      2.097  # 329s/2h of 58857 corpus
                                         # ham: JDBGMGR hoax and response

header    RM_st_LongSubject       Subject =~ /.{170,}/
describe  RM_st_LongSubject       Subject is excessively long -- more than 169 chars
score     RM_st_LongSubject       9.100  # 139s/0h of 58857 corpus

header    RM_st_RandomText        Subject =~ /\%RANDOM_TEXT|\%RANDOM_WORD/i
describe  RM_st_RandomText        Subject contains random-text spamsign
score     RM_st_RandomText        9.1    # 8 spam, 0 ham, Sep 5 2003; 3s/0h of 58857 

header    RM_st_USAscii           Subject:raw =~ /us-ascii/i
describe  RM_st_USAscii           Subject specifies display in US-ascii, unnecessary unless spam hides subject
score     RM_st_USAscii           0.900  # 27s/2h of 58857 corpus, ham = MS 

Spamassassin-talk mailing list
