On Mon, 2003-12-08 at 18:14, [EMAIL PROTECTED] wrote:
> There are a lot of possibilities:
> /V.?i.?a.?g.?r.?a/i will catch things like viagrra
> 
> /(V|\\/)(i|1|l)(a|\@)gr(a|\@)/i will catch leet-isms like \/[EMAIL PROTECTED]@
> (off-hand I don't know the leet-ish for "g" or "r")
> 
> When these start to get really broad, though, there is the potential for
> false positives.
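
For anyone who wants to try the two quoted patterns outside SpamAssassin,
here is a minimal Python sketch. It is only a translation of the regexes as
they appear above: the trailing /i becomes re.IGNORECASE, and I am assuming
that "\\/" is meant to match a backslash followed by a slash. The names
LOOSE, LEET and matches() are made up for the example.

```python
import re

# First quoted pattern: at most one arbitrary character between letters.
LOOSE = re.compile(r"V.?i.?a.?g.?r.?a", re.IGNORECASE)

# Second quoted pattern: character substitutions ("leet-isms").
# Assumption: "\\/" should match the two characters backslash + slash.
LEET = re.compile(r"(V|\\/)(i|1|l)(a|@)gr(a|@)", re.IGNORECASE)


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    samples = ["viagrra", "v-i-a-g-r-a", r"\/1@gr@", "delivered via gravel roads"]
    for sample in samples:
        print(sample, matches(LOOSE, sample), matches(LEET, sample))
```

Note that the first pattern also fires on "delivered via gravel roads",
because ".?" happily swallows the space in "via gra". That is the kind of
false positive the quoted message is worried about.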

Perhaps not as many false positives as you may think.

CMOScript rules are about as broad as they get.  Here's how the rules from
http://sandgnat.com/cmos/cmos.jsp?matchobfuonly=false&words=vi%61gra
(I had to munge the URL for the list) scored on Bob's corpus as of
Nov 28th 2003:

LOCAL_OBFU_ONLY_VGR -- 1623s/0h of 58856 corpus
LOCAL_OBFU_ONLY_VGR_SUBJ -- 598s/0h of 58856 corpus
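
To put those two lines in proportion, here is a small Python sketch that
parses them. It assumes "s" means spam hits, "h" means ham hits, and that
58856 is the total number of messages in the corpus; parse_result() and the
dictionary keys are names invented for this example.

```python
import re

# Matches lines such as "RULE_NAME -- 1623s/0h of 58856 corpus".
RESULT_LINE = re.compile(
    r"^(?P<rule>\S+) -- (?P<spam>\d+)s/(?P<ham>\d+)h of (?P<total>\d+) corpus$"
)


def parse_result(line):
    """Split one result line into counts (assumed: s = spam, h = ham)."""
    m = RESULT_LINE.match(line.strip())
    if m is None:
        raise ValueError("unrecognised result line: %r" % line)
    spam, ham, total = int(m["spam"]), int(m["ham"]), int(m["total"])
    return {
        "rule": m["rule"],
        "spam_hits": spam,
        "ham_hits": ham,
        # Share of the whole corpus hit by the rule (assumed meaning of total).
        "share_of_corpus": (spam + ham) / total,
    }


if __name__ == "__main__":
    for line in (
        "LOCAL_OBFU_ONLY_VGR -- 1623s/0h of 58856 corpus",
        "LOCAL_OBFU_ONLY_VGR_SUBJ -- 598s/0h of 58856 corpus",
    ):
        r = parse_result(line)
        print(r["rule"], r["spam_hits"], r["ham_hits"],
              "%.2f%%" % (100 * r["share_of_corpus"]))
```

Under those assumptions the body rule hits a little under 3% of the corpus
and the subject rule about 1%, with no ham hits for either.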

(Methinks Bob's corpus doesn't contain any legit mail discussing the
V-bomb)

I've found that very lenient obfu detection rules tend to generate false
positives on shorter words ("gave a $5 donation" ==>A $5<==), on words
that are commonly hyphenated (... go on-line to see ... ==>ON-LINE<==)
or split in two (... took for ever for it to ... ==>FOR EVER<==), or on
words that start or end with the tail or beginning of other words (I
click her e-mail link often ==>CLICK HER E<==).
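
The hyphenated, split-word and word-boundary cases are easy to reproduce.
The Python sketch below is not what CMOScript generates; it is a
deliberately lenient matcher of my own that allows up to one
non-alphanumeric character between the letters of a phrase and reports only
the obfuscated-looking hits. The target phrases ("online", "forever",
"click here") are my guesses at what such rules would be looking for, and
lenient_pattern() and obfuscated_hits() are hypothetical helper names.

```python
import re


def lenient_pattern(phrase, max_gap=1):
    """Letters of `phrase` in order, with up to `max_gap` non-alphanumeric
    characters (spaces, hyphens, dots, ...) allowed between them."""
    letters = [re.escape(ch) for ch in phrase if ch.isalnum()]
    gap = r"[^A-Za-z0-9]{0,%d}" % max_gap
    return re.compile(gap.join(letters), re.IGNORECASE)


def obfuscated_hits(phrase, text):
    """Return matches that are not simply the plain phrase itself."""
    plain = phrase.lower()
    return [
        m.group(0)
        for m in lenient_pattern(phrase).finditer(text)
        if m.group(0).lower() != plain
    ]


if __name__ == "__main__":
    print(obfuscated_hits("viagra", "buy v.i.a.g.r.a now"))
    print(obfuscated_hits("online", "go on-line to see"))
    print(obfuscated_hits("forever", "took for ever for it to"))
    print(obfuscated_hits("click here", "I click her e-mail link often"))
```

The first call is the intended catch; the other three are ordinary English
that the same lenient gap rule flags as "obfuscated". Shrinking the gap or
excluding spaces and hyphens would cut these, at the cost of missing spam
that uses exactly those characters.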

        
-- 
Chris Thielen

Easily generate SpamAssassin rules to catch obfuscated spam phrases:
http://www.sandgnat.com/cmos/



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
