I got 179 Nigerian scam message bodies (though not headers) from http://www.quatloos.com/cm-niger/cm-niger.htm and used them to test how SA handles them. Testing with the default 2.2 setup (taken from CVS today), 72 out of 179 were correctly tagged as spam. Overriding some of the weird negative scores the genetic algorithm (GA) assigned, especially DEAR_SOMEBODY, filtered out 19 of the remaining 108 messages. Then I made some modifications and additions that filtered out 74 of the remaining 89.
First I made some mods to the rules in 20_body_tests.cf that deal with Nigerian scams. Here's a diff:

<<<<<
Index: 20_body_tests.cf
===================================================================
RCS file: /cvsroot/spamassassin/spamassassin/rules/20_body_tests.cf,v
retrieving revision 1.50
diff -u -3 -p -r1.50 20_body_tests.cf
--- 20_body_tests.cf 24 Feb 2002 11:03:23 -0000 1.50
+++ 20_body_tests.cf 1 Mar 2002 06:03:54 -0000
@@ -522,10 +522,10 @@ body NIGERIAN_SCAM /BASED ON INFORMATIO
 describe NIGERIAN_SCAM Nigerian scam, cf http://www.snopes2.com/inboxer/scams/nigeria.htm
 
 # (contrib: skod)
-body NIGERIAN_SCAM_2 /(?:Government of Nigeria|NIGERIAN? NATIONAL|Nigerian? Government)/
+body NIGERIAN_SCAM_2 /(?:Government of Nigeria|NIGERIAN? NATIONAL|Nigerian? Government|Federal Republic of Nigeria)/
 describe NIGERIAN_SCAM_2 Mutated Nigerian scams
 
-body US_DOLLARS /Million\b.{0,40}\b(?:United States Dollars|USD)/i
+body US_DOLLARS /Million\b.{0,40}\b(?:United States Dollars|USD|U\. ?S\. Dollars)/i
 describe US_DOLLARS Nigerian scam key phrase
 
 rawbody UNNEEDED_HTML_ENCODING /font=3E/i
>>>>>>>>>

I added "Federal Republic of Nigeria" to NIGERIAN_SCAM_2, and "U\. ?S\. Dollars" (which matches "U.S. Dollars" or "U. S. Dollars") to US_DOLLARS. Then I created these new rules:

<<<<<<<<
body NIGERIAN_SCAM_3 /(?:Bank of Nigeria|Nigerian? National Petroleum)/i
describe NIGERIAN_SCAM_3 Nigerian Bank or Petroleum scam

body NIGERIAN_SCAM_4 /\b(?:closed?|freeze|frozen?)\b.{0,10}bank account/i
describe NIGERIAN_SCAM_4 Some poor Nigerian got his bank account frozen

body NIGERIAN_SCAM_5 /\b(wife|widow|son|husband)\b.{0,60}Sann?i Abacha/
describe NIGERIAN_SCAM_5 Nigerian widow needs your help...

body NIGERIAN_SCAM_6 /Sann?i Abacha.{0,60}\b(wife|widow|son|husband)\b/
describe NIGERIAN_SCAM_6 Nigerian widow needs your help...
body US_DOLLARS_2 /(?:\$|usd)\d{2,3}(?:\.\d)?m\b/i
describe US_DOLLARS_2 Nigerian scam key phrase ($NN.Nm)

body US_DOLLARS_3 /(?:\$|usd ?)\d{1,3},\d{3},\d{3}(?:\.\d\d)?/i
describe US_DOLLARS_3 Nigerian scam key phrase ($NN,NNN,NNN.NN)

score NIGERIAN_SCAM_3 7.050
score NIGERIAN_SCAM_4 7.050
score NIGERIAN_SCAM_5 7.050
score NIGERIAN_SCAM_6 7.050
score US_DOLLARS_2 3.339
score US_DOLLARS_3 3.339
>>>>>>>>>>>>>

Anything matching "Bank of Nigeria" or "Nigerian National Petroleum" is one of the scams. Also, anything talking about closed or frozen bank accounts is probably a scam, though I suppose those phrases are more likely to show up in legitimate mail. The next two rules look for people claiming to be a relative (wife, widow, son, or husband) of the late General Sani Abacha. Finally, there are two rules that look for amounts in the millions of dollars written in the form "$12.3m" (two or three digits, an optional decimal, then "m") or "$1,234,567.89".

If anyone wants the entire archive of bodies I have, or just the ones that are missed, please mail me.

-- 
http://dmoz.org | The world's largest human-edited web directory
Give a man a match, and he'll be warm for a minute, but light him on fire, and he'll be warm for the rest of his life.
ICQ: 132152059

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
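One rough way to sanity-check what each of these patterns catches is a small Python sketch like the following. It is an approximation under stated assumptions: SpamAssassin evaluates the rules as Perl regexes against the decoded message body, whereas this port just runs Python's `re.search` over a single string (so, as in Perl without `/s`, "." does not cross a newline). The names `RULES`, `SCORES`, `matching_rules` and `known_score` are hypothetical helpers, not SpamAssassin APIs. `SCORES` holds only the values from the score lines in the post; NIGERIAN_SCAM_2 and US_DOLLARS are left out of it because no score is given for them here.

```python
import re

# Hypothetical Python port of the body rules from the post. SpamAssassin runs
# them as Perl regexes; these patterns only use syntax Python's re module reads
# the same way. No re.I flag means case-sensitive, matching the .cf rules that
# lack the /i modifier (NIGERIAN_SCAM_2, _5 and _6).
RULES = {
    "NIGERIAN_SCAM_2": re.compile(
        r"(?:Government of Nigeria|NIGERIAN? NATIONAL|Nigerian? Government"
        r"|Federal Republic of Nigeria)"
    ),
    "NIGERIAN_SCAM_3": re.compile(
        r"(?:Bank of Nigeria|Nigerian? National Petroleum)", re.I
    ),
    "NIGERIAN_SCAM_4": re.compile(
        r"\b(?:closed?|freeze|frozen?)\b.{0,10}bank account", re.I
    ),
    "NIGERIAN_SCAM_5": re.compile(
        r"\b(wife|widow|son|husband)\b.{0,60}Sann?i Abacha"
    ),
    "NIGERIAN_SCAM_6": re.compile(
        r"Sann?i Abacha.{0,60}\b(wife|widow|son|husband)\b"
    ),
    "US_DOLLARS": re.compile(
        r"Million\b.{0,40}\b(?:United States Dollars|USD|U\. ?S\. Dollars)", re.I
    ),
    "US_DOLLARS_2": re.compile(r"(?:\$|usd)\d{2,3}(?:\.\d)?m\b", re.I),
    "US_DOLLARS_3": re.compile(r"(?:\$|usd ?)\d{1,3},\d{3},\d{3}(?:\.\d\d)?", re.I),
}

# Scores copied from the post's score lines; rules without a listed score
# are simply absent and contribute nothing to known_score().
SCORES = {
    "NIGERIAN_SCAM_3": 7.050,
    "NIGERIAN_SCAM_4": 7.050,
    "NIGERIAN_SCAM_5": 7.050,
    "NIGERIAN_SCAM_6": 7.050,
    "US_DOLLARS_2": 3.339,
    "US_DOLLARS_3": 3.339,
}


def matching_rules(body: str) -> list[str]:
    """Return the sorted names of rules whose pattern occurs anywhere in body."""
    return sorted(name for name, rx in RULES.items() if rx.search(body))


def known_score(body: str) -> float:
    """Sum the post's scores for the matching rules that have one listed."""
    return round(sum(SCORES.get(name, 0.0) for name in matching_rules(body)), 3)


if __name__ == "__main__":
    samples = [
        "I am the widow of the late General Sani Abacha",
        "The sum of $25.5m is lodged with the Central Bank of Nigeria",
        "Transfer of USD 1,234,567.89 from the Federal Republic of Nigeria",
        "Please close your bank account by Friday.",
        "Ten Million U.S. Dollars",
        "a mere $5m",
    ]
    for text in samples:
        print(matching_rules(text), known_score(text), "|", text)
```

With these samples, the widow phrase trips NIGERIAN_SCAM_5, and the "$25.5m ... Central Bank of Nigeria" line trips NIGERIAN_SCAM_3 and US_DOLLARS_2. The harmless "Please close your bank account by Friday." trips NIGERIAN_SCAM_4, which is the false-positive risk mentioned above. "a mere $5m" matches nothing, because US_DOLLARS_2 requires two or three digits before the "m".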