"Fred I-IS.COM" <[EMAIL PROTECTED]> writes: > I created a list which might be helpful, using a dictionary I searched for > letter pairs which did not exist. I created the following meta rule to > search for these non-existant pairs, it might do just what you are looking > for.
Your meta rule seems to work pretty well.  Some issues that might need
to be worked out:

 - getting it to work in an internationalized fashion: we could just
   write a rule to be used when the message specifies that it is
   English, when "ok_languages en" is set, or something like that, but
   that is non-optimal
 - false positives are still a bit high:
    - PGP signatures
    - some "legitimate" URLs (Network Solutions unsubscribe URL for
      renewal notices)

Another thing that might work well is instead using an eval test that
counts non-existent pairs.  There are also the triplets and N-gram
files used by the language testing in TextCat.pm -- we could test
N-gram frequency and, if the text is well off the language model for
the advertised language, score a hit.

Some quick results:

OVERALL%    SPAM%     HAM%     S/O    RANK   SCORE  NAME
   9810      4814     4996    0.491   0.00    0.00  (all messages)
100.000   49.0724  50.9276    0.491   0.00    0.00  (all messages as %)
  5.902   11.8612   0.1601    0.987   0.90    1.00  T_FVGT_M_MULTI_ODD_3
  9.521   19.0278   0.3603    0.981   0.89    1.00  T_FVGT_M_MULTI_ODD_2
 15.821   30.1413   2.0216    0.937   0.80    1.00  T_FVGT_M_MULTI_ODD_1

slightly revised rule definitions:

------- start of cut text --------------
# Frederic Tarasevicius
# Internet Information Services, Inc.
# From: "Fred I-IS.COM" <[EMAIL PROTECTED]>
# Message-ID: <[EMAIL PROTECTED]>
# Subject: Re: [SAtalk] Consonant and Vowel Pairs or Sequences
# To: <[EMAIL PROTECTED]>
# Date: Mon, 13 Oct 2003 17:13:31 -0400

body __OBFU_J		/j[bcfgw]/i
body __OBFU_OTHER	/(?:vj|vk|xj|xk|yy|zf|zj)/i
body __OBFU_Q0		/[jkpqtvwz]q/i
body __OBFU_Q1		/q[afhjkmnsy]/i
body __OBFU_V		/[fgqw]v/i
body __OBFU_X		/[cgjkqsvz]x/i
body __OBFU_Z		/[fjkpqx]z/i

meta T_FVGT_M_MULTI_ODD_1	((__OBFU_J + __OBFU_OTHER + __OBFU_Q0 + __OBFU_Q1 + __OBFU_V + __OBFU_X + __OBFU_Z) > 1)
meta T_FVGT_M_MULTI_ODD_2	((__OBFU_J + __OBFU_OTHER + __OBFU_Q0 + __OBFU_Q1 + __OBFU_V + __OBFU_X + __OBFU_Z) > 2)
meta T_FVGT_M_MULTI_ODD_3	((__OBFU_J + __OBFU_OTHER + __OBFU_Q0 + __OBFU_Q1 + __OBFU_V + __OBFU_X + __OBFU_Z) > 3)
------- end ----------------------------
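To make the eval-test idea a bit more concrete: as I read the meta rules,
each __OBFU_* sub-rule adds at most 1 to the sum, so the metas count how many
pattern families fired rather than how many odd pairs the message contains.
A test that counts the pairs themselves might look like the Python sketch
below.  This is an illustration only; a real SpamAssassin eval test would be
written in Perl, and the names count_odd_pairs and exceeds are made up here.
The patterns are copied from the sub-rules.

```python
import re

# Pair patterns copied from the __OBFU_* sub-rules.
ODD_PAIR_PATTERNS = [
    r"j[bcfgw]",               # __OBFU_J
    r"vj|vk|xj|xk|yy|zf|zj",   # __OBFU_OTHER
    r"[jkpqtvwz]q",            # __OBFU_Q0
    r"q[afhjkmnsy]",           # __OBFU_Q1
    r"[fgqw]v",                # __OBFU_V
    r"[cgjkqsvz]x",            # __OBFU_X
    r"[fjkpqx]z",              # __OBFU_Z
]

# Wrapping the alternation in a lookahead makes each match zero-width, so
# overlapping pairs (the "xj" and "jq" in "xjq") are counted separately.
ODD_PAIR_RE = re.compile("(?=(?:" + "|".join(ODD_PAIR_PATTERNS) + "))", re.IGNORECASE)


def count_odd_pairs(text):
    """Count the positions in text where one of the odd letter pairs starts."""
    return sum(1 for _ in ODD_PAIR_RE.finditer(text))


def exceeds(text, threshold):
    """Hypothetical eval-style check: true if more than threshold pairs occur."""
    return count_odd_pairs(text) > threshold


if __name__ == "__main__":
    for sample in ("hello world", "vxqz", "xjqk zfyy"):
        print(repr(sample), count_odd_pairs(sample))
```

Counting every occurrence would probably make the PGP signature false
positives worse, not better, since one base64 block can contain dozens of
odd pairs; any threshold would need to be re-tuned against the corpus.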