On Fri, 2003-12-05 at 13:22, Chris Thielen wrote:
> Adam D. Lopresto said:
<snip>
> >> My solution was to split the tokens into word/nonword classes and group
> >> them.  The characters in the word character class get the \b word
> >> boundary check, while the non-word character classes simply match
> >> regardless of what's on the other side.
> >>
> >> Makes sense?  It really does allow for better matching, methinks.
> >
> > Hmmm, if I were searching for an obfuscated version of /\basdf/ I wouldn't
> > expect [EMAIL PROTECTED] to match.  I think the key is that when you turn a
> > word character into a nonword character, any \b next to it should change
> > to \B, so that " @sdf" would match, but "[EMAIL PROTECTED]" wouldn't.
> > Basically, assert that the character next to @ is also not a word
> > character.
> 
> Righto, sounds reasonable...  I'll put that on my list of things to do.


FYI, this has been implemented in version 0h, which is available on the 
CMOScript website.  A word boundary (\b) or non-word boundary (\B) is now
computed for both word and non-word characters near \b markers.
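
To illustrate the boundary swap discussed above, here is a minimal Python
sketch.  It is not CMOScript's actual code (which is not shown in this
thread); the substitution table, the function names, and the test string
"x@sdf" are all hypothetical and chosen only for illustration.  It handles
just the leading \b from the /\basdf/ example: word-character alternatives
for the first letter keep \b, while non-word alternatives get \B instead.

```python
import re

# Hypothetical substitution table: each letter maps to the characters
# that may stand in for it in obfuscated text (illustration only).
SUBSTITUTES = {"a": "a@4", "s": "s$5"}


def char_class(chars):
    """Return a regex character class matching any one of `chars`."""
    return "[" + "".join(re.escape(c) for c in chars) + "]"


def leading_group(ch):
    """Build the group for the first letter of a phrase anchored by \\b.

    Word-character alternatives keep the \\b check.  Non-word alternatives
    get \\B, which asserts that the neighbouring character is also a
    non-word character (or that we are at the start of the string).
    """
    alts = SUBSTITUTES.get(ch, ch)
    word = [c for c in alts if re.match(r"\w", c)]
    nonword = [c for c in alts if not re.match(r"\w", c)]
    branches = []
    if word:
        branches.append(r"\b" + char_class(word))
    if nonword:
        branches.append(r"\B" + char_class(nonword))
    return "(?:" + "|".join(branches) + ")"


def build_pattern(phrase):
    """Build a regex for `phrase` with a leading boundary check only."""
    rest = "".join(char_class(SUBSTITUTES.get(c, c)) for c in phrase[1:])
    return leading_group(phrase[0]) + rest


pattern = re.compile(build_pattern("asdf"))

for text in ["asdf", " @sdf", "x@sdf", "xasdf"]:
    print(repr(text), bool(pattern.search(text)))
```

With this sketch, "asdf" and " @sdf" match, while "x@sdf" and "xasdf" do
not, because in both of those the first character of the phrase is glued
to a preceding word character.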

I haven't updated the version history because exit0 appears to be down, 
but the main link (http://sandgnat.com/cmos/cmos.pl) or the cgi 
(http://sandgnat.com/cmos/cmos.jsp) should get you version 0h.

-- 
Chris Thielen

Easily generate SpamAssassin rules to catch obfuscated spam phrases:
http://www.sandgnat.com/cmos/


