Last month I offered some header rules for possible inclusion in a future
distribution. Those that passed muster have been formally submitted via
bugzilla.

I've now completed my review of my "body phrase" rule set, and feel the
rules are ready for similar review.

Please look over and test the following rules, and let me know if they
work for you.

Use your own scoring -- my scores tend to be high, since I use a 9.0 spam
threshold. With a 5.0 threshold you may want to cut my scores in half.
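
If you'd rather not edit each score by hand, something like the following
could do it. This is only a rough Python sketch, not part of SpamAssassin:
`rescale_scores` and its `factor` parameter are names made up for
illustration, and a flat 0.5 factor is just the "cut in half" suggestion
above, so tune the result against your own mail.

```python
import re

# A SpamAssassin "score" line: keyword, rule name, numeric score, optional trailing comment.
SCORE_LINE = re.compile(r"^(score\s+)(\S+)(\s+)(\d+(?:\.\d+)?)(.*)$")

def rescale_scores(text, factor=0.5):
    """Multiply the score on every 'score' line by factor; leave other lines alone."""
    out = []
    for line in text.splitlines():
        m = SCORE_LINE.match(line)
        if m:
            new_score = float(m.group(4)) * factor
            line = f"{m.group(1)}{m.group(2)}{m.group(3)}{new_score:.3f}{m.group(5)}"
        out.append(line)
    return "\n".join(out)

rules = """body      RM_bpf_DebtGetOutOf    /out of debt/i
score     RM_bpf_DebtElim        3.000  # 382s/0h of 74869 corpus"""
print(rescale_scores(rules))
```

Only lines that start with `score` are rewritten; `body` and `describe`
lines, and the comment after the score, pass through unchanged.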

Note to those who wonder: this is only a small extract of the rules I
use, which I have posted at http://www.exit0.us/index.php/RM_PhraseRules.
Per discussion with the developers, I'm submitting here only those rules
that hit at least 1% of all emails, and where at least 97% of the hits
are spam.
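
For anyone who wants to check the 97% figure against the corpus notes in
the rule comments below, here is a rough Python sketch. It assumes the
"NNNs/NNh of NNNNN corpus" notation used in those comments, and where a
comment lists several corpora it takes the last one, which in this post is
the newest. `spam_share` is a made-up helper name. The post doesn't say
exactly what the 1% figure is measured against, so the sketch checks only
the 97% criterion.

```python
import re

# Corpus notes in the rule comments look like "520s/7h of 74869 corpus":
# spam hits, ham hits, then the corpus size.
STATS = re.compile(r"(\d+)s/(\d+)h of (\d+) corpus")

def spam_share(comment):
    """Return (spam_hits, ham_hits, fraction of hits that are spam) for the
    last corpus note in comment, or None if there is no usable note."""
    notes = STATS.findall(comment)
    if not notes:
        return None
    spam, ham, _corpus = (int(n) for n in notes[-1])
    if spam + ham == 0:
        return None
    return spam, ham, spam / (spam + ham)

spam, ham, share = spam_share("# 927s/14h of 74869 corpus")
print(f"{spam} spam, {ham} ham, {share:.1%} of hits are spam, passes 97%: {share >= 0.97}")
```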

# !! RM_BPC -- Commerce and Marketing-related Spam Phrasing

body      RE_bpc_BestOnline      /best online/i
describe  RE_bpc_BestOnline      Found Phrase 'best online'
score     RE_bpc_BestOnline      1.990  # 99s/0h of 74869 corpus
                                        # "best online mortgage" would also hit
                                        # MORTGAGE_BEST

body      RM_bpc_LowCost         /low cost/i
describe  RM_bpc_LowCost         mentions low cost
score     RM_bpc_LowCost         1.650  # 520s/7h of 74869 corpus
                                        # ham: autoweb.com May 2001, resume, valid
                                        # emails (2), valid email newsletter (3)

body      RM_bpc_OrderToday      /order today/i
describe  RM_bpc_OrderToday      says you should order today
score     RM_bpc_OrderToday      1.832  # 749s/8h of 74869 corpus
                                        # ham: valid marketing newsletters

body      RE_bpc_WithYourPurch   /with your purchase/i
describe  RE_bpc_WithYourPurch   Contains phrase With Your Purchase
score     RE_bpc_WithYourPurch   1.595  # 119s/1h of 74869 corpus; ham: drugstore.com

# !! RM_BPE -- Education-related Spam Phrasing

body      RM_bpe_HighSchlDip     /high school diploma/i
describe  RM_bpe_HighSchlDip     mentions a high school diploma
score     RM_bpe_HighSchlDip     1.347  # 104s/2h of 74869 corpus; ham: resumes

# !! RM_BPF -- Finance and Money-related Spam Phrasing

body      RM_bpf_DebtElim        /D.?e.?b.?t\WE.?l.?i.?m.?i.?n.?a.?t.?i.?o.?n/i
describe  RM_bpf_DebtElim        Debt Elimination
score     RM_bpf_DebtElim        3.000  # 382s/0h of 74869 corpus; may also match
                                        # distrib CONSOLIDATE_DEBT

body      RE_bpf_DebtGetOutOf    /out of debt/i    
describe  RE_bpf_DebtGetOutOf    Get out of debt!
score     RE_bpf_DebtGetOutOf    1.770  # 77s/0h of 74869 corpus

body      RM_bpf_FwdLkngStmts    /forward[ -]looking statements/i 
describe  RM_bpf_FwdLkngStmts    Contains phrasing used by stock market spammers
score     RM_bpf_FwdLkngStmts    6.000  # 424s/0h of 74869 corpus

body      RM_bpf_LotsOfLenders   /hundreds of [LI]enders/i 
describe  RM_bpf_LotsOfLenders   mentions lots of (mortgage) lenders
score     RM_bpf_LotsOfLenders   3.000  # 272s/0h of 39989 corpus; 161s/0h of 74869
                                        # corpus

body      RM_bpf_MillBucks2      /Million (?:USD|United States Dollars)/i
describe  RM_bpf_MillBucks2      mentions several million dollars
score     RM_bpf_MillBucks2      3.000  # 204s/0h of 74869 corpus

body      RM_bpf_MillBucks3      /Million.{1,30}Dollars/i
describe  RM_bpf_MillBucks3      mentions several million dollars
score     RM_bpf_MillBucks3      1.618  # 927s/14h of 74869 corpus
                                        # ham: valid emails & news reports

body      RM_bpf_NoTurnDown      /no one is turned down/i
describe  RM_bpf_NoTurnDown      Mortgage, Loan, or Insurance qualification
score     RM_bpf_NoTurnDown      3.000  # 334s/0h of 74869 corpus


# !! RM_BPI -- Insurance, Warranty, and similar Spam Phrasing

body      RM_bpi_LifeInsur       /\blife\W{0,3}ins/i
describe  RM_bpi_LifeInsur       mentions life insurance
score     RM_bpi_LifeInsur       2.013  # 304s/2h of 74869 corpus; ham: valid emails

body      RM_bpi_LowestRates     /lowest rates/i 
describe  RM_bpi_LowestRates     Contains spammer phrasing - insurance or mortgage
score     RM_bpi_LowestRates     3.000  # 654s/1h of 74869 corpus; ham: Marriott
                                        # Rewards

body      RM_bpi_LowestRatesa    /lowest rates available/i
describe  RM_bpi_LowestRatesa    Contains spammer phrasing - insurance or mortgage
score     RM_bpi_LowestRatesa    1.110  # 110s/0h of 74869 corpus; add to LowestRates

body      RM_bpi_NoObligQuotei   /n.?o.?o.?b.?l.?i.?g.?a.?t.?i.?o.?n.?q.?u.?o.?t.?e/i
describe  RM_bpi_NoObligQuotei   Offers a no-obligation quote
score     RM_bpi_NoObligQuotei   3.000  # 216s/0h of 74869 corpus

body      RM_bpi_ProtectFam      /Protect your family/i
describe  RM_bpi_ProtectFam      Spammer phrasing -- insurance
score     RM_bpi_ProtectFam      2.050  # 105s/0h of 74869 corpus

# !! RM_BPM -- Medical or Biological Spam Phrasing

body      RM_bpm_FreeMedConsult  /Free medical consultation/i
describe  RM_bpm_FreeMedConsult  offers a free medical consultation.
score     RM_bpm_FreeMedConsult  1.900  # 90s/0h of 74869 corpus

body      RM_bpm_MagicLubricant  /"Magic Lubricant"/i
describe  RM_bpm_MagicLubricant  Spammer phrasing in body of email
score     RM_bpm_MagicLubricant  7.000  # 198s/0h of 74869 corpus

body      RM_bpm_MoreEnergy      /More energy/i
describe  RM_bpm_MoreEnergy      talks about having or generating more energy
score     RM_bpm_MoreEnergy      2.610  # 161s/0h of 74869 corpus

body      RM_bpm_MultipleOrgasms /multiple orgasms/i 
describe  RM_bpm_MultipleOrgasms Spammer phrasing in body of email
score     RM_bpm_MultipleOrgasms 3.000  # 224s/0h of 74869 corpus

body      RM_bpm_NoEmbarrassing  /no embarrassing/i    # From Emporium
describe  RM_bpm_NoEmbarrassing  Wow, I won't be embarrassed anymore!
score     RM_bpm_NoEmbarrassing  3.000  # 227s/0h of 68055 corpus; 215s/0h of 74869
                                        # corpus

body      RM_bpm_PowerBottle     /"Power Bottle"/i
describe  RM_bpm_PowerBottle     Spammer phrasing in body of email
score     RM_bpm_PowerBottle     7.000  # 198s/0h of 74869 corpus

body      RM_bpm_PrescrMeds      /Prescription Medications/i
describe  RM_bpm_PrescrMeds      seems to discuss prescription medications
score     RM_bpm_PrescrMeds      3.000  # FP: 1293s/2h of 74869 corpus; ham: email to
                                        # employer health insur, drugstore.com

body      RM_bpm_SideEffects     /side effects/i    # From Emporium
describe  RM_bpm_SideEffects     Has Side Effects
score     RM_bpm_SideEffects     1.984  # 984s/9h of 74869 corpus; ham: valid emails,
                                        # drugstore.com, howstuffworks.com

body      RM_bpm_USDoctors       /[Uü].?S.? (?:Licensed)?.?(?:Doctors?|Physicians?|Pharmac(?:y|ies))/i
describe  RM_bpm_USDoctors       mentions U.S. doctor(s) or pharmacy(s)
score     RM_bpm_USDoctors       3.000  # 2824s/3h of 74869 corpus; ham: valid emails
                                        # and newsletters

# !! RM_BPN -- Nigerian (and other) Scam-related Spam Phrasing

body      RM_bpn_AsAForeigner    /\b(?:who was a|as a|an? honest|you being a|to any) foreigner/i
describe  RM_bpn_AsAForeigner    contains apparent spammer reference to a foreigner
score     RM_bpn_AsAForeigner    1.790  # 79s/0h of 74869 corpus

body      RM_bpn_Confidential    /(?:total(?:ly)?|VERY|strictly|high(?:est|ly)?|utmost) CONFIDEN(?:ce|T(?:AI|IA)L)/i
describe  RM_bpn_Confidential    says this is very confidential
score     RM_bpn_Confidential    1.616  # 431s/6h of 74869 corpus; ham: membership
                                        # list, survey confidentiality,

body      RM_bpn_ForeignAcct     /foreign (?:offshore )?(?:bank|account)/i
describe  RM_bpn_ForeignAcct     mentions a foreign account
score     RM_bpn_ForeignAcct     2.800  # 180s/0h of 74869 corpus

body      RM_bpn_FreeCableTV     /Free Cable.{0,4}TV/i
describe  RM_bpn_FreeCableTV     Spammer phrasing or subject found in body of email
score     RM_bpn_FreeCableTV     3.000  # 360s/0h of 74869 corpus

body      RM_bpn_PercentageSpam  /(?:(?:negotiate|reasonable|acc?or?ding|certain|agg?ree).{1,10}percentage|percentage.{1,10}(?:indicat|previous|involved)|your percentage will)/i
describe  RM_bpn_PercentageSpam  mentions percentage(s) in a spam-like way
score     RM_bpn_PercentageSpam  1.257  # 145 spam, 0 ham, Sep 1 2003; 94s/0h of 39989
                                        # corpus; 84s/1h of 63143 corpus; 77s/2h of 74869 corpus

body      RM_bpn_SecurityComp    /security (?:company|storage house)/i
describe  RM_bpn_SecurityComp    mentions a security company 
score     RM_bpn_SecurityComp    1.850  # 255s/2h of 74869 corpus; ham: valid emails

body      RM_bpn_TotalSum        /The total sum/i
describe  RM_bpn_TotalSum        mentions some total sum
score     RM_bpn_TotalSum        2.520  # 152s/0h of 74869 corpus

body      RM_bpn_UrgentReply     /(?:urgent reply|reply urgent)/i
describe  RM_bpn_UrgentReply     requests an "urgent" reply
score     RM_bpn_UrgentReply     2.210  # 121s/0h of 74869 corpus

# !! RM_BPP -- Porn-related and Adult-related Spam Phrasing

body      RM_bpp_AdultMovie      /[EMAIL PROTECTED] ?m[o0]vie/i
describe  RM_bpp_AdultMovie      mentions adult movie(s)
score     RM_bpp_AdultMovie      6.000  # 1841s/0h of 74869 corpus

# !! RM_BPQ -- Privacy, Identity Theft, Copyright, Online Security-related Spam Phrasing

body      RM_bpq_BannedCD        /b\s?a\s?n\s?n\s?e\s?d\s?c\s?d/i
describe  RM_bpq_BannedCD        mentions the supposedly banned CD
score     RM_bpq_BannedCD        3.000  # 910s/0h of 74869 corpus

body      RM_bpq_CopyDVD         /c[o0]py dvd/i
describe  RM_bpq_CopyDVD         seems to mention copying DVDs
score     RM_bpq_CopyDVD         3.000  # 216s/0h of 74869 corpus

body      RM_bpq_DVDPro          /\bDVD\W{0,3}pro\b/i
describe  RM_bpq_DVDPro          mentions DVD Pro -- DVD copying software
score     RM_bpq_DVDPro          1.760  # 69 spam, 0 ham, Aug 9 2003; 120s/0h of 39989
                                        # corpus; 98s/0h of 63143 corpus; 76s/0h of 74869 corpus

body      RM_bpq_SpyOnAnyone     /Spy on Anyone/i
describe  RM_bpq_SpyOnAnyone     suggests you can spy on anyone using spam product
score     RM_bpq_SpyOnAnyone     2.020  # 102s/0h of 74869 corpus

body      RM_bpq_SpySoftware     /Spy Software/
describe  RM_bpq_SpySoftware     mentions known spam product
score     RM_bpq_SpySoftware     1.990  # 99s/0h of 74869 corpus

# !! RM_BPS -- Spam-related Spam Phrasing

body      RM_bps_RemoveMail       /Remove mail/i
describe  RM_bps_RemoveMail       seems to offer a "Remove mail" link
score     RM_bps_RemoveMail       1.750  # 79s/0h of 63143 corpus; 75s/0h of 74869
                                         # corpus

body      RM_bps_SpamRemedy       /Spam.?Remedy/
describe  RM_bps_SpamRemedy       mentions known spam product
score     RM_bps_SpamRemedy       9.000  # 194s/0h of 74869 corpus

body      RM_bps_WeHonor          /we honor/i          # From Emporium
describe  RM_bps_WeHonor          It says they honor.
score     RM_bps_WeHonor          3.000  # 967s/3h of 74869 corpus

# !! RM_BP -- Phrases used within body of spam (Miscellaneous)

body      RM_bp_CheckOnYour      /Check up on your/i
describe  RM_bp_CheckOnYour      suggests you Check up on your something
score     RM_bp_CheckOnYour      1.960  # 96s/0h of 74869 corpus

body      RM_bp_RiskFree         /\%?100\%? risk free/i
describe  RM_bp_RiskFree         suggests that something is 100% risk free
score     RM_bp_RiskFree         2.250  # 125s/0h of 74869 corpus

body      RM_bp_SelfEsteem       /self esteem/i     # From Emporium
describe  RM_bp_SelfEsteem       Talks about self esteem
score     RM_bp_SelfEsteem       1.800  # 280s/2h of 39989 corpus; 269s/2h of 63143
                                        # corpus; 240s/2h of 74869 corpus





