Adam D. Lopresto said:
> On Fri, 5 Dec 2003, Chris Thielen wrote:
>
>> What you're seeing here is the bugfix regarding word boundaries
>> mentioned on the home page and the version history.  I'll explain why
>> it works like it does.
>>
>> Take this simple regex, for example:
>> /asdf/i
>> Let's pretend my rules-gen script is much simpler than it is.  It
>> generates:
>> /[EMAIL PROTECTED]/i
>> This rule matches " ASDF ", "bananasASDF", " @SDF ", "[EMAIL PROTECTED]",
>> etc., and life is good.
>>
>> Now, we add word boundaries to the original rule:
>> /\basdf/i
>> Simply adding \b word boundaries to the generated rule gives us:
>> /[EMAIL PROTECTED]/i
>> This rule matches " asdf ", "[EMAIL PROTECTED]", etc.  Because of the word
>> boundary, it no longer matches "bananasASDF", which is probably what we
>> wanted in the first place.  However, notice that it also no longer
>> matches " @sdf ".  This is because a space and an @ are both non-word
>> characters (\W), so there is no word boundary between them and the \b
>> doesn't match.
>>
>> My solution was to split the tokens into word/nonword classes and group
>> them.  The characters in the word character class get the \b word
>> boundary check, while the non-word character classes simply match
>> regardless of what's on the other side.
>>
>> Makes sense?  It really does allow for better matching, methinks.
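
The difference between the two generated forms can be sketched in Python,
whose \b behaves like Perl's for these ASCII samples.  This is only an
illustration: the single substitution "a" -> "a" or "@", the pattern names,
and the sample "bananas@sdf" are assumptions, not output of the real
rules-gen script.

```python
import re

# Hypothetical substitution table: "a" may be written as "a" or "@".
# NAIVE puts \b in front of the whole character class.
NAIVE = re.compile(r"\b[a@]sdf", re.IGNORECASE)
# SPLIT gives \b only to the word-character alternative; "@" matches
# regardless of what precedes it.
SPLIT = re.compile(r"(?:\ba|@)sdf", re.IGNORECASE)

SAMPLES = [" asdf ", "bananasASDF", " @sdf ", "bananas@sdf"]


def matches(pattern, samples=SAMPLES):
    """Return the samples in which the pattern is found."""
    return [s for s in samples if pattern.search(s)]


naive_hits = matches(NAIVE)
split_hits = matches(SPLIT)
print("naive:", naive_hits)
print("split:", split_hits)
```

With the naive form, " @sdf " is missed because there is no boundary between
the space and the "@", while "bananas@sdf" is caught because "s" followed by
"@" is a boundary.  The split form catches both.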
>
> Hmmm, if I were searching for an obfuscated version of /\basdf/ I wouldn't
> expect [EMAIL PROTECTED] to match.  I think the key is that when you turn a
> word character into a nonword character, any \b next to it should change
> to \B, so that " @sdf" would match, but "[EMAIL PROTECTED]" wouldn't.
> Basically, assert that the character next to the @ is also not a word
> character.
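
The \b-to-\B suggestion above can be sketched in Python, whose \b and \B
behave like Perl's for these ASCII samples.  The substitution "a" -> "a" or
"@", the pattern name, and the sample strings are assumptions made for
illustration; this is not what the rules-gen script currently emits.

```python
import re

# Hypothetical generated rule for /\basdf/i:
#   word-character alternative "a" keeps \b,
#   nonword alternative "@" gets \B instead, so the character before
#   the "@" must also be a nonword character (or the start of the string).
PROPOSED = re.compile(r"(?:\ba|\B@)sdf", re.IGNORECASE)

SAMPLES = [" asdf ", "bananasASDF", " @sdf ", "bananas@sdf", "@sdf"]

# Collect the samples in which the proposed rule is found.
proposed_hits = [s for s in SAMPLES if PROPOSED.search(s)]
print("proposed:", proposed_hits)
```

Here " @sdf " and a leading "@sdf" match, while "bananas@sdf" does not,
because the "s" before the "@" is a word character and \B fails there.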

Righto, sounds reasonable...  I'll put that on my list of things to do.

Thanks!


--
Chris Thielen

Easily generate SpamAssassin rules to catch obfuscated spam phrases:
http://www.sandgnat.com/cmos/


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
