Hi Jeremy, interesting article. I think you're wrong that this is the last stand for spamfilters (you should read up on Boosting as a method of chaining multiple filters), but the Bayes-attack tool is an interesting approach. I plan to blog about it on http://taint.org/ when I get a chance.
In the meantime, you might like to see what SpamAssassin 2.50 made of it with my personal training set; a square-ish 0.49 with chi2. (I had to make a few mods to your message; namely, I cut and pasted the headers of another 419 spam, and changed the From addr to match the name, in traditional 419 spam style. SpamAssassin doesn't filter messages without headers.)

The bayes tokens used in the calculation can be seen in the debug lines like this:

    debug: bayes token 'captain' => 0.00453686200378072

Based on Bayes alone, this would have been an 'unsure'; with the forged set of headers, SpamAssassin would have caught it with its own rules. (we have a good set of rules that catch most 419s now, since they mostly seem to use one particular spamware tool.)

BTW maybe this spammer -- http://taint.org/2003/02/07/141629a.html -- has already read your article? Who'd have thought 'concupiscent' would get a high spamprob for me ;)

--j.
Delivered-To: [EMAIL PROTECTED]
Received: from localhost (jalapeno [127.0.0.1])
    by jmason.org (Postfix) with ESMTP id 002B016F16
    for <jm@localhost>; Fri, 7 Feb 2003 15:58:47 +0000 (GMT)
Received: from jalapeno [127.0.0.1]
    by localhost with IMAP (fetchmail-5.9.0)
    for jm@localhost (single-drop); Fri, 07 Feb 2003 15:58:47 +0000 (GMT)
Received: from mail.hivelocity.net (mail.hivelocity.net [65.59.189.58])
    by dogma.slashnull.org (8.11.6/8.11.6) with SMTP id h0E4ERv29190
    for <[EMAIL PROTECTED]>; Tue, 14 Jan 2003 04:14:31 GMT
From: "Mrs. SANDRA MANI" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
X-Mailer: Microsoft Outlook Express 5.00.2919.6900 DM
MIME-Version: 1.0
Message-Id: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="us-ascii"
Subject: CONFIDENTIAL
Date: Fri, 7 Feb 2003 15:23:30 +0000

I mean to be writing you this sensitive message believing that you won't violate the trust I'm to about impose on you. By way of introduction, I am Sandra Mani the wife of Hanis Mani Mani previous chief of defense-staff of (Republic of Guinea Bissau). I heard about you through a good businessman who told me I can freely deal with you that you are truthful and also your good ability of dealing with this. He made me to surmise that you must be such an erudite businessman of ingenuity and much compassion. My loving husband was killed not long ago in an attack last December for his role as a brave subversive rebel captain against the previous evil totalitarian government of guinea Bissau. His sad absence has affected me more then I can express. Subsequent to this political crisis, I was forced to flee to the good land of Cote d'Ivoire for my very life. In Abidjan he stored ONE METALLIC CRATE in a safe storage location. He marked it as an African Artworks as belonging to his American friend who would come with the keys for the claim of the consignment. He did not disclose to the storage people the real contents of the crate. The crate contain almost $ 18,000.000.00. To be truthful with you, this is the only legacy that my husband left for me. I have the proof of ownership and other requisite proofs for that deposit, but I am not an American, which is what they are expecting. I'd like you to behave as the true claiment of the container and claim it for wiring to your Checking for use in your hometown. I've decided to render to you the contribution of 5% of the final quantity and 2% for other mecellaneous expenditures you may cause while you do that. Should you elect to help me, I'll tell you the procedure we should use to make certain the liberation of the cache isn't difficult. It should just be a matter of meeting the formalities. Hopefully.

Sandra Mani

NB: see that you call me when you get this message for more briefing.
debug: Score set 0 chosen.
debug: using "/home/jm/.spamassassin" for user state dir
debug: bayes: tie-ing to DB file R/O /home/jm/.spamassassin/bayes_toks
debug: bayes: tie-ing to DB file R/O /home/jm/.spamassassin/bayes_seen
debug: Score set 2 chosen.
debug: using "./rules" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
debug: using "/home/jm/.spamassassin" for user state dir
debug: using "/home/jm/.spamassassin/user_prefs" for user prefs file
debug: Initialising learner
debug: running header regexp tests; score so far=0
debug: running body-text per-line regexp tests; score so far=3.5
debug: bayes corpus size: nspam = 3941, nham = 19145
debug: tokenize: header tokens for *F = ""Mrs. SANDRA MANI" <[EMAIL PROTECTED]>"
debug: tokenize: header tokens for To = "[EMAIL PROTECTED]"
debug: tokenize: header tokens for *x = "Microsoft Outlook Express 5.00.2919.6900 DM"
debug: tokenize: header tokens for MIME-Version = ""
debug: tokenize: header tokens for *m = " 200301140414 h0E4ERv29190 dogma slashnull org "
debug: tokenize: header tokens for *c = "/plain; charset="us-ascii""
debug: tokenize: header tokens for *r = " mail.hivelocity.net (mail.hivelocity.net [65.59.189]) by dogma.slashnull.org (8.11.6/8.11.6) <[EMAIL PROTECTED]>; "
debug: tokenize: header tokens for *r = " mail.hivelocity.net (mail.hivelocity.net [65.59.189]) by dogma.slashnull.org (8.11.6/8.11.6) <[EMAIL PROTECTED]>; jalapeno [127.0.0] by localhost IMAP (fetchmail-5.9.0) jm@localhost (single-drop); "
debug: bayes token 'H*x:5.00.2919.6900' => 0.999
debug: bayes token 'behave' => 0.00222428174235403
debug: bayes token 'captain' => 0.00453686200378072
debug: bayes token 'totalitarian' => 0.00594059405940594
debug: bayes token 'Abidjan' => 0.993013100436681
debug: bayes token 'hometown' => 0.0094488188976378
debug: bayes token 'd'Ivoire' => 0.987596899224806
debug: bayes token 'H*F:Mrs' => 0.987596899224806
debug: bayes token 'formalities' => 0.987596899224806
debug: bayes token 'liberation' => 0.0155844155844156
debug: bayes token 'requisite' => 0.0186046511627907
debug: bayes token 'crate' => 0.0230769230769231
debug: bayes token 'evil' => 0.0289795078191229
debug: bayes token 'Bissau' => 0.97037037037037
debug: bayes token 'Cote' => 0.964149922197416
debug: bayes token 'CONFIDENTIAL' => 0.959005630527377
debug: bayes token 'surmise' => 0.0444444444444444
debug: bayes token 'consignment' => 0.945057374972883
debug: bayes token 'ONE' => 0.943633128358777
debug: bayes token 'i'd' => 0.0595303472728287
debug: bayes: score = 0.496731078390794
debug: using "/home/jm/.spamassassin" for user state dir
debug: bayes: untie-ing
debug: bayes: untie-ing db_toks
debug: bayes: untie-ing db_seen
debug: running raw-body-text per-line regexp tests; score so far=4.6
debug: running uri tests; score so far=4.6
debug: uri tests: Done uriRE
debug: running full-text regexp tests; score so far=4.6
debug: local tests only, ignoring Pyzor
debug: all '*To' addrs: [EMAIL PROTECTED] [EMAIL PROTECTED]
debug: all '*From' addrs: [EMAIL PROTECTED]
debug: is DNS available? 0
debug: forged_rcvd_trail: entry 0: by=jmason.org from=(undef) mismatches=0
debug: forged_rcvd_trail: entry 1: by=(undef) from=(undef) mismatches=0
debug: running meta tests; score so far=8.3
debug: auto-learn? safety=+/-4, body-hits=1.1, head-hits=7.2
debug: auto-learn: recomputing score based on scoreset 0
debug: Score set 0 chosen.
debug: auto-learn: original score: 12.2, recomputed score: 12.174
debug: Score set 2 chosen.
debug: auto-learn? no: inside auto-learn thresholds or safety zone around required_hits
debug: is spam? score=12.2 required=5 tests=BAYES_44,DATE_IN_FUTURE_96_XX,FORGED_MUA_OUTLOOK,FROM_ENDS_IN_NUMS,RATWARE_OE_MALFORMED,SEMIFORGED_HOTMAIL_RCVD,US_DOLLARS_3

Received: from localhost [127.0.0.1] by jalapeno
    with SpamAssassin (2.50-cvs 1.167-2003-02-03-exp);
    Fri, 07 Feb 2003 19:24:12 +0000
From: "Mrs. SANDRA MANI" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: CONFIDENTIAL
Date: Fri, 7 Feb 2003 15:23:30 +0000
Message-Id: <[EMAIL PROTECTED]>
X-Spam-Flag: YES
X-Spam-Status: Yes, hits=12.2 required=5.0
    tests=BAYES_44,DATE_IN_FUTURE_96_XX,FORGED_MUA_OUTLOOK,
    FROM_ENDS_IN_NUMS,RATWARE_OE_MALFORMED,
    SEMIFORGED_HOTMAIL_RCVD,US_DOLLARS_3
    version=2.50-cvs
X-Spam-Level: ************
X-Spam-Checker-Version: SpamAssassin 2.50-cvs 1.167-2003-02-03-exp
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----------=_3E4407DC.C8909ED8"

This is a multi-part message in MIME format.

------------=_3E4407DC.C8909ED8
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

This mail is probably spam. The original message has been attached along with this report, so you can recognize or block similar unwanted mail in future. See http://spamassassin.org/tag/ for more details.

Content preview: I mean to be writing you this sensitive message believing that you won't violate the trust I'm to about impose on you. By way of introduction, I am Sandra Mani the wife of Hanis Mani Mani previous chief of defense-staff of (Republic of Guinea Bissau). [...]

Content analysis details: (12.20 points, 5 required)
FROM_ENDS_IN_NUMS (0.6 points) From: ends in numbers
RATWARE_OE_MALFORMED (2.9 points) X-Mailer contains malformed Outlook Express version
US_DOLLARS_3 (1.1 points) BODY: Nigerian scam key phrase ($NN,NNN,NNN.NN)
BAYES_44 (0.0 points) BODY: Bayesian classifier says spam probability is 44 to 50% [score: 0.4967]
DATE_IN_FUTURE_96_XX (1.6 points) Date: is 96 hours or more after Received: date
SEMIFORGED_HOTMAIL_RCVD (2.1 points) hotmail.com 'From' address, but no 'Received:'
FORGED_MUA_OUTLOOK (3.9 points) Forged mail pretending to be from MS Outlook

------------=_3E4407DC.C8909ED8
Content-Type: message/rfc822
Content-Description: original message before SpamAssassin
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Delivered-To: [EMAIL PROTECTED]
Received: from localhost (jalapeno [127.0.0.1])
    by jmason.org (Postfix) with ESMTP id 002B016F16
    for <jm@localhost>; Fri, 7 Feb 2003 15:58:47 +0000 (GMT)
Received: from jalapeno [127.0.0.1]
    by localhost with IMAP (fetchmail-5.9.0)
    for jm@localhost (single-drop); Fri, 07 Feb 2003 15:58:47 +0000 (GMT)
Received: from mail.hivelocity.net (mail.hivelocity.net [65.59.189.58])
    by dogma.slashnull.org (8.11.6/8.11.6) with SMTP id h0E4ERv29190
    for <[EMAIL PROTECTED]>; Tue, 14 Jan 2003 04:14:31 GMT
From: "Mrs. SANDRA MANI" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
X-Mailer: Microsoft Outlook Express 5.00.2919.6900 DM
MIME-Version: 1.0
Message-Id: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="us-ascii"
Subject: CONFIDENTIAL
Date: Fri, 7 Feb 2003 15:23:30 +0000

I mean to be writing you this sensitive message believing that you won't violate the trust I'm to about impose on you. By way of introduction, I am Sandra Mani the wife of Hanis Mani Mani previous chief of defense-staff of (Republic of Guinea Bissau). I heard about you through a good businessman who told me I can freely deal with you that you are truthful and also your good ability of dealing with this. He made me to surmise that you must be such an erudite businessman of ingenuity and much compassion. My loving husband was killed not long ago in an attack last December for his role as a brave subversive rebel captain against the previous evil totalitarian government of guinea Bissau. His sad absence has affected me more then I can express. Subsequent to this political crisis, I was forced to flee to the good land of Cote d'Ivoire for my very life. In Abidjan he stored ONE METALLIC CRATE in a safe storage location. He marked it as an African Artworks as belonging to his American friend who would come with the keys for the claim of the consignment. He did not disclose to the storage people the real contents of the crate. The crate contain almost $ 18,000.000.00. To be truthful with you, this is the only legacy that my husband left for me. I have the proof of ownership and other requisite proofs for that deposit, but I am not an American, which is what they are expecting. I'd like you to behave as the true claiment of the container and claim it for wiring to your Checking for use in your hometown. I've decided to render to you the contribution of 5% of the final quantity and 2% for other mecellaneous expenditures you may cause while you do that. Should you elect to help me, I'll tell you the procedure we should use to make certain the liberation of the cache isn't difficult. It should just be a matter of meeting the formalities. Hopefully.

Sandra Mani

NB: see that you call me when you get this message for more briefing.

------------=_3E4407DC.C8909ED8--
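
To see why the Bayes result ends up as an 'unsure', here is a rough Python sketch of Fisher-style chi-squared combining, fed with the token probabilities from the "debug: bayes token" lines above. The combining formula is the commonly described chi2 scheme; that SpamAssassin 2.50 computes exactly this is an assumption, and the names `chi2q`, `chi2_combine` and `TOKEN_PROBS` are made up for illustration. It should not be expected to reproduce the reported 0.4967 digit for digit, since SpamAssassin's own token selection and constants may differ.

```python
import math

# Token probabilities copied from the "debug: bayes token" lines in the
# SpamAssassin output above (token => spam probability).
TOKEN_PROBS = {
    "H*x:5.00.2919.6900": 0.999,
    "behave": 0.00222428174235403,
    "captain": 0.00453686200378072,
    "totalitarian": 0.00594059405940594,
    "Abidjan": 0.993013100436681,
    "hometown": 0.0094488188976378,
    "d'Ivoire": 0.987596899224806,
    "H*F:Mrs": 0.987596899224806,
    "formalities": 0.987596899224806,
    "liberation": 0.0155844155844156,
    "requisite": 0.0186046511627907,
    "crate": 0.0230769230769231,
    "evil": 0.0289795078191229,
    "Bissau": 0.97037037037037,
    "Cote": 0.964149922197416,
    "CONFIDENTIAL": 0.959005630527377,
    "surmise": 0.0444444444444444,
    "consignment": 0.945057374972883,
    "ONE": 0.943633128358777,
    "i'd": 0.0595303472728287,
}


def chi2q(x2, dof):
    """Upper-tail probability of a chi-squared variable, even dof only."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1].

    Hypothetical helper: s measures the evidence for spam, h the evidence
    for ham; the result is 0.5 when the two cancel out.
    """
    n = len(probs)
    ln_not_p = sum(math.log(1.0 - p) for p in probs)
    ln_p = sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(-2.0 * ln_not_p, 2 * n)
    h = 1.0 - chi2q(-2.0 * ln_p, 2 * n)
    return (s - h + 1.0) / 2.0


score = chi2_combine(list(TOKEN_PROBS.values()))
print(f"combined score: {score:.4f}")
```

Ten of the twenty tokens are strongly hammy and ten strongly spammy, so both the spam and the ham evidence come out near 1 and the combined score sits in the middle. That is the behaviour the attack message is designed to provoke, and it is why the header rules, not Bayes, do the catching here.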