"Fred I-IS.COM" <[EMAIL PROTECTED]> writes:

> I created a list which might be helpful: using a dictionary, I searched for
> letter pairs which did not exist.  I created the following meta rule to
> search for these non-existent pairs; it might do just what you are looking
> for.
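The dictionary search Fred describes can be sketched as follows, assuming a plain word list as input. This is not his actual procedure: the function name `missing_pairs` is hypothetical, and only pairs of lowercase ASCII letters that appear inside a single word are considered.

```python
import string
from itertools import product


def missing_pairs(words):
    """Return the lowercase ASCII letter pairs that never occur inside any word."""
    seen = set()
    for word in words:
        w = word.lower()
        # Record every adjacent two-character sequence in this word.
        for a, b in zip(w, w[1:]):
            seen.add(a + b)
    # All 26 * 26 = 676 possible letter pairs; anything never seen is "absent".
    every_pair = {a + b for a, b in product(string.ascii_lowercase, repeat=2)}
    return every_pair - seen


# Toy input; a real run would read a full dictionary (one word per line).
absent = missing_pairs(["quick", "brown", "fox", "jumps"])
print(len(absent), sorted(absent)[:5])
```

With a full dictionary, the resulting set is what the character classes in the `__OBFU_*` rules encode; a small word list like the toy one above reports far too many pairs as absent.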

Your meta rule seems to work pretty well.

Some issues that might need to be worked out:

 - getting it to work in an internationalized fashion; we could just
   write a rule that is only used when the message specifies that it
   is English, or when "ok_languages en" is set, or something like
   that, but that is non-optimal

 - false positives are still a bit high:
   - PGP signatures
   - some "legitimate" URLs (Network Solutions unsubscribe URL for
     renewal notices)

Another approach that might work well is an eval test that counts
non-existent pairs instead.  There are also the triplets and N-gram
files used by the language detection in TextCat.pm; we could test
N-gram frequency, and if the text is well off the language model for
its advertised language, then score a hit.
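A counting test like that could look roughly like this. It is a sketch, not a SpamAssassin eval plugin: the pair list is copied from the `__OBFU_*` rules below, and the names `count_odd_pairs` and `odd_pair_density`, as well as the idea of normalizing by letter count, are hypothetical.

```python
import re

# Odd letter pairs copied from the __OBFU_* body rules in this thread.
# The zero-width lookahead lets overlapping pairs count separately,
# so "qxz" yields both "qx" and "xz".
ODD_PAIR = re.compile(
    r"(?=(?:j[bcfgw]|vj|vk|xj|xk|yy|zf|zj|[jkpqtvwz]q|q[afhjkmnsy]"
    r"|[fgqw]v|[cgjkqsvz]x|[fjkpqx]z))",
    re.IGNORECASE,
)


def count_odd_pairs(text):
    """Count positions in text where a supposedly non-existent pair starts."""
    return sum(1 for _ in ODD_PAIR.finditer(text))


def odd_pair_density(text):
    """Odd pairs per letter (hypothetical normalization for message length)."""
    letters = sum(ch.isalpha() for ch in text)
    return count_odd_pairs(text) / letters if letters else 0.0


print(count_odd_pairs("vjagra qxz"), count_odd_pairs("Qatar"))
```

Unlike the boolean body rules, this counts every occurrence, so a threshold could be tuned on the count or the density. Note that "Qatar" already scores one hit, the same kind of proper-noun false positive as the URLs mentioned above.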

Some quick results:

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
   9810     4814     4996    0.491   0.00    0.00  (all messages)
100.000  49.0724  50.9276    0.491   0.00    0.00  (all messages as %)
  5.902  11.8612   0.1601    0.987   0.90    1.00  T_FVGT_M_MULTI_ODD_3
  9.521  19.0278   0.3603    0.981   0.89    1.00  T_FVGT_M_MULTI_ODD_2
 15.821  30.1413   2.0216    0.937   0.80    1.00  T_FVGT_M_MULTI_ODD_1
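The OVERALL% and S/O columns can be reproduced from the SPAM% and HAM% columns. The formulas below are inferred from the numbers rather than taken from the hit-frequencies tool's source, so treat them as an assumption; they do match all three rows.

```python
# Message counts from the "(all messages)" row of the table.
N_SPAM, N_HAM = 4814, 4996


def s_over_o(spam_pct, ham_pct):
    """Share of the rule's hit rate that comes from spam."""
    return spam_pct / (spam_pct + ham_pct)


def overall_pct(spam_pct, ham_pct):
    """Hit rate over the whole corpus, weighting each class by its size."""
    return (spam_pct * N_SPAM + ham_pct * N_HAM) / (N_SPAM + N_HAM)


# SPAM% and HAM% columns copied from the table.
RESULTS = {
    "T_FVGT_M_MULTI_ODD_3": (11.8612, 0.1601),
    "T_FVGT_M_MULTI_ODD_2": (19.0278, 0.3603),
    "T_FVGT_M_MULTI_ODD_1": (30.1413, 2.0216),
}

for name, (spam, ham) in RESULTS.items():
    print(f"{overall_pct(spam, ham):8.3f} {s_over_o(spam, ham):6.3f}  {name}")
```

Read this way, raising the threshold from `> 1` to `> 3` gives up more than half of the spam hits (30.1% down to 11.9%) to cut ham hits from about 2% to 0.16%.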

Slightly revised rule definitions:

------- start of cut text --------------
# Frederic Tarasevicius
# Internet Information Services, Inc.
# From: "Fred   I-IS.COM" <[EMAIL PROTECTED]>
# Message-ID: <[EMAIL PROTECTED]>
# Subject: Re: [SAtalk] Consonant and Vowel Pairs or Sequences
# To: <[EMAIL PROTECTED]>
# Date: Mon, 13 Oct 2003 17:13:31 -0400

body  __OBFU_J  /j[bcfgw]/i
body  __OBFU_OTHER /(?:vj|vk|xj|xk|yy|zf|zj)/i
body  __OBFU_Q0 /[jkpqtvwz]q/i
body  __OBFU_Q1 /q[afhjkmnsy]/i
body  __OBFU_V  /[fgqw]v/i
body  __OBFU_X  /[cgjkqsvz]x/i
body  __OBFU_Z  /[fjkpqx]z/i
meta  T_FVGT_M_MULTI_ODD_1 ((__OBFU_J + __OBFU_OTHER + __OBFU_Q0 + __OBFU_Q1 + __OBFU_V + __OBFU_X + __OBFU_Z) > 1)
meta  T_FVGT_M_MULTI_ODD_2 ((__OBFU_J + __OBFU_OTHER + __OBFU_Q0 + __OBFU_Q1 + __OBFU_V + __OBFU_X + __OBFU_Z) > 2)
meta  T_FVGT_M_MULTI_ODD_3 ((__OBFU_J + __OBFU_OTHER + __OBFU_Q0 + __OBFU_Q1 + __OBFU_V + __OBFU_X + __OBFU_Z) > 3)
------- end ----------------------------
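To see what the meta rules actually measure, here is a rough emulation. It assumes the message body is a single plain string; SpamAssassin's own body preparation (MIME decoding, HTML rendering) is not modelled, and the function names are hypothetical.

```python
import re

# Patterns copied from the __OBFU_* body rules (all marked /i, case-insensitive).
SUBRULES = {
    "__OBFU_J": r"j[bcfgw]",
    "__OBFU_OTHER": r"(?:vj|vk|xj|xk|yy|zf|zj)",
    "__OBFU_Q0": r"[jkpqtvwz]q",
    "__OBFU_Q1": r"q[afhjkmnsy]",
    "__OBFU_V": r"[fgqw]v",
    "__OBFU_X": r"[cgjkqsvz]x",
    "__OBFU_Z": r"[fjkpqx]z",
}


def subrule_hits(body):
    """Each sub-rule is 1 if its pattern matches anywhere, else 0."""
    return {
        name: int(re.search(pattern, body, re.IGNORECASE) is not None)
        for name, pattern in SUBRULES.items()
    }


def multi_odd(body):
    """Evaluate the three meta rules from the number of distinct sub-rules hit."""
    total = sum(subrule_hits(body).values())
    return {f"T_FVGT_M_MULTI_ODD_{n}": total > n for n in (1, 2, 3)}


print(multi_odd("vjagra qxz"))
```

Because each sub-rule contributes at most 1, the metas count distinct categories of odd pairs, not occurrences: a body with "vj" repeated a hundred times still sums to 1, and T_FVGT_M_MULTI_ODD_3 needs at least four of the seven categories.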


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
