On Mon, 2003-12-08 at 18:14, [EMAIL PROTECTED] wrote:
> There's a lot of possibilities:
> /V.?i.?a.?g.?r.?a/i will catch things like viagrra
>
> /(V|\\/)(i|1|l)(a|\@)gr(a|\@)/i will catch leet-isms like \/[EMAIL PROTECTED]@
> (off-hand I don't know the leet-ish for "g" or "r")
>
> When these start to get really broad though there is the potential for false
> positives
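For anyone who wants to poke at the two quoted patterns outside SpamAssassin, here is a rough Python sketch. The names LOOSE, LEET and matched_rules are made up for illustration, and the leet pattern is my own transcription into Python syntax (character classes instead of alternations, and the "\/" alternative written as an escaped backslash plus a slash), so treat it as an approximation of the quoted rules rather than a drop-in copy.

```python
import re

# Loose spelling rule: at most one arbitrary character between letters.
LOOSE = re.compile(r"v.?i.?a.?g.?r.?a", re.IGNORECASE)

# Leet-style rule (transcribed by hand, an assumption): "V" or "\/",
# then i/1/l, then a/@, then "gr", then a/@.
LEET = re.compile(r"(?:v|\\/)[i1l][a@]gr[a@]", re.IGNORECASE)


def matched_rules(text):
    """Return the names of the sketch rules that fire on the text."""
    names = []
    if LOOSE.search(text):
        names.append("LOOSE")
    if LEET.search(text):
        names.append("LEET")
    return names


if __name__ == "__main__":
    for sample in ["viagrra", "v1@gr@", "sent via Graham"]:
        print(sample, "->", matched_rules(sample))
```

The last sample is there on purpose: the loose rule fires on "via Gra", which is the kind of false positive the quoted message warns about.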
Perhaps not as many false positives as you may think. CMOScript rules are about as broad as they get. Here's how [had to munge the URL for the list] http://sandgnat.com/cmos/cmos.jsp?matchobfuonly=false&words=vi%61gra scored on Bob's corpus as of Nov 28th 2003:

LOCAL_OBFU_ONLY_VGR -- 1623s/0h of 58856 corpus
LOCAL_OBFU_ONLY_VGR_SUBJ -- 598s/0h of 58856 corpus

(Methinks Bob's corpus doesn't contain any legit mail discussing the V-bomb)

I've found that very lenient obfu detection rules tend to generate false positives on shorter words ("gave a $5 donation" ==>A $5<==), on words that are commonly hyphenated (... go on-line to see ... ==>ON-LINE<==) or split in two (... took for ever for it to ... ==>FOR EVER<==), or on words that start or end with the tail or beginning of other words (I click her e-mail link often ==>CLICK HER E<==).

-- 
Chris Thielen

Easily generate SpamAssassin rules to catch obfuscated spam phrases:
http://www.sandgnat.com/cmos/
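P.S. To make the false-positive cases above concrete, here is a small Python sketch of a deliberately lenient rule. It is not how CMOScript builds its patterns; lenient_obfu_pattern, find_hit and the max_gap parameter are hypothetical names, and the target phrases ("online", "forever", "click here") are my guesses at what a rule would have been hunting for in each example.

```python
import re


def lenient_obfu_pattern(phrase, max_gap=2):
    """Build a lenient pattern (illustrative assumption, not CMOScript):
    up to max_gap non-alphanumeric characters may sit between letters."""
    letters = [re.escape(ch) for ch in phrase if not ch.isspace()]
    gap = r"[\W_]{0,%d}" % max_gap
    return re.compile(gap.join(letters), re.IGNORECASE)


def find_hit(phrase, text):
    """Return the matched span in upper case, or None if nothing matched."""
    match = lenient_obfu_pattern(phrase).search(text)
    return match.group(0).upper() if match else None


if __name__ == "__main__":
    cases = [
        ("online", "go on-line to see"),
        ("forever", "took for ever for it to"),
        ("click here", "I click her e-mail link often"),
    ]
    for phrase, text in cases:
        print("%r in %r ==>%s<==" % (phrase, text, find_hit(phrase, text)))
```

Setting max_gap to 0 makes all three hits go away, which shows the trade-off: the same gap allowance that catches obfuscated spellings is what lets ordinary hyphens and spaces through.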