On Fri, 5 Dec 2003, Chris Thielen wrote:

> What you're seeing here is the bugfix regarding word boundaries mentioned
> on the home page and the version history. I'll explain why it works like
> it does.
>
> Take this simple regex, for example:
> /asdf/i
> Let's pretend my rules-gen script is much simpler than it is. It generates:
> /[EMAIL PROTECTED]/i
> This rule matches " ASDF ", "bananasASDF", " @SDF ", "[EMAIL PROTECTED]", etc...
> and life is good.
>
> Now, we add word boundaries to the original rule:
> /\basdf/i
> Simply adding \b word boundaries to the generated rule gives us:
> /[EMAIL PROTECTED]/i
> This rule matches " asdf ", "[EMAIL PROTECTED]", etc. Because of the word
> boundary, it no longer matches "bananasASDF" which is probably what we
> wanted in the first place. However, notice that it also no longer matches
> " @sdf ". This is because a space and an @ are both non-word characters
> (\W), therefore the \b doesn't match.
>
> My solution was to split the tokens into word/nonword classes and group
> them. The characters in the word character class get the \b word boundary
> check, while the non-word character classes simply match regardless of
> what's on the other side.
>
> Makes sense? It really does allow for better matching, methinks.
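The behaviour described in the quoted message can be reproduced with a small sketch. The generated patterns are redacted in this archive copy, so the character class `[a@]` below is only an assumed stand-in for them, and the names `PATTERNS` and `matches` are hypothetical. The sketch uses Python's `re` module rather than the Perl rules the script actually emits; `\b` behaves the same way for these inputs.

```python
import re

# Assumption: "[a@]" stands in for the redacted generated class,
# i.e. the letter "a" may be obfuscated as "@".
PATTERNS = {
    # No boundary check at all.
    "plain": re.compile(r"[a@]sdf", re.I),
    # Naive approach: \b placed in front of the whole class.
    "naive": re.compile(r"\b[a@]sdf", re.I),
    # Split approach: the word character gets the \b check,
    # the non-word character matches regardless of its neighbour.
    "split": re.compile(r"(?:\ba|@)sdf", re.I),
}


def matches(name, text):
    """Return True if the named pattern is found anywhere in text."""
    return PATTERNS[name].search(text) is not None


if __name__ == "__main__":
    for sample in (" ASDF ", "bananasASDF", " @SDF "):
        row = {name: matches(name, sample) for name in PATTERNS}
        print(repr(sample), row)
```

The naive pattern fails on " @SDF " because the space and the @ are both non-word characters, so there is no boundary between them for `\b` to match.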
Hmmm, if I were searching for an obfuscated version of /\basdf/ I wouldn't
expect [EMAIL PROTECTED] to match. I think the key is that when you turn a
word character into a nonword character, any \b next to it should change to
\B, so that " @sdf" would match, but "[EMAIL PROTECTED]" wouldn't.
Basically, assert that the character next to the @ is also not a word
character.

-- 
Adam Lopresto
http://cec.wustl.edu/~adam/

Her hair glistened in the rain like nose hair after a sneeze.
(Chuck Smith, Woodbridge)

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
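The \B suggestion in the reply above could look roughly like the following sketch. As before, the original examples are redacted in this archive copy, so the alternation and the "bananas@sdf" test string are assumptions chosen for illustration, and `PROPOSED` and `proposed_matches` are hypothetical names. Python's `re` is used here in place of Perl.

```python
import re

# Assumed form of the proposal: the word character "a" keeps its \b,
# while the non-word substitute "@" gets \B instead, which asserts that
# the character before it is also a non-word character (or that the @
# is at the start of the string).
PROPOSED = re.compile(r"(?:\ba|\B@)sdf", re.I)


def proposed_matches(text):
    """Return True if the \\B-based pattern is found anywhere in text."""
    return PROPOSED.search(text) is not None


if __name__ == "__main__":
    for sample in (" asdf", " @sdf", "@sdf", "bananasasdf", "bananas@sdf"):
        print(repr(sample), proposed_matches(sample))
```

With this form, an @ glued to the end of another word is rejected in the same way that a plain "a" in that position would be, which is the symmetry the reply argues for.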