Larry Gilson <[EMAIL PROTECTED]> wrote:

> I had the following HTML tag OBFU rule (variant of yours):
>   /(\>|\s)\w{1,5}?\<\/?\s?[\w\s]{6,150}\/?\s?\>\w{1,7}?(\s|\W|\<)/

There's a lot of clutter in that regex, which makes it harder 
to follow.  Let's try paring it down.  First, '<' and '>' are 
not special on their own in Perl regexes, so there's no need to 
backslash them:

/(>|\s)\w{1,5}?<\/?\s?[\w\s]{6,150}\/?\s?>\w{1,7}?(\s|\W|<)/
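A quick way to convince yourself of that is to compare match 
positions with and without the backslashes.  The sketch below 
uses Python's 're' module as a stand-in for Perl (an assumption 
on my part; both treat a backslashed '<' or '>' as the literal 
character).  The sample strings and the helper name 
'same_matches' are made up for illustration.

```python
import re

# Hypothetical sample strings, invented for this check.
SAMPLES = ["a<b>c", "</i>", "<<>>", "no angle brackets here"]

def same_matches(pattern_a, pattern_b, samples=SAMPLES):
    """Return True if both patterns find identical match spans in every sample."""
    for text in samples:
        spans_a = [m.span() for m in re.finditer(pattern_a, text)]
        spans_b = [m.span() for m in re.finditer(pattern_b, text)]
        if spans_a != spans_b:
            return False
    return True

# Escaped and unescaped angle brackets should behave identically.
print(same_matches(r"\<\/?\w+\>", r"<\/?\w+>"))
```

Be aware that this holds for Perl-style regexes; a few other 
tools (GNU grep, for one) give '\<' a different meaning.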

When you have an alternation -- something like '(a|b|c)' -- 
where all the alternatives are single characters, it's better 
to write it as a character class -- something like '[abc]'.  
Also, '\s' and '<' are both included in '\W', so that last 
alternation is equivalent to just '\W':

/[>\s]\w{1,5}?<\/?\s?[\w\s]{6,150}\/?\s?>\w{1,7}?\W/
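If you want to check both of those substitutions rather than 
take them on faith, something like the following sketch works.  
It is written in Python rather than Perl (my choice, for ease of 
testing), uses the re.ASCII flag so that '\w', '\W' and '\s' 
keep their ASCII meanings, and only looks at the 128 ASCII 
characters.  The helper name 'agree' is hypothetical.

```python
import re

# re.ASCII restricts \w, \W and \s to their ASCII definitions.
NONWORD = re.compile(r"\W", re.ASCII)
LAST_ALTERNATION = re.compile(r"(\s|\W|<)", re.ASCII)
FIRST_ALTERNATION = re.compile(r"(>|\s)", re.ASCII)
CHAR_CLASS = re.compile(r"[>\s]", re.ASCII)

def agree(pat_a, pat_b):
    """True if both patterns accept exactly the same single ASCII characters."""
    return all(
        bool(pat_a.fullmatch(chr(c))) == bool(pat_b.fullmatch(chr(c)))
        for c in range(128)
    )

print(agree(LAST_ALTERNATION, NONWORD))       # (\s|\W|<) versus \W
print(agree(FIRST_ALTERNATION, CHAR_CLASS))   # (>|\s) versus [>\s]
```

Both lines should print True, since every whitespace character 
and '<' are already nonword characters.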

Now, nongreedy matching serves no purpose when whatever 
follows the quantifier can't match anything the repeated item 
matches.  In this case you have '\w{1,5}?' followed by '<', but 
'\w' can't match '<', so there's no difference between greedy 
and nongreedy matching there.  The run of '\w' characters has 
to go all the way to the '<' -- it can't stop short.  Similarly, 
the '\W' at the end can never match a word character, so the 
'?' on '\w{1,7}?' is also pointless:

/[>\s]\w{1,5}<\/?\s?[\w\s]{6,150}\/?\s?>\w{1,7}\W/
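Here is a small sketch of the greedy/nongreedy point, again in 
Python as a stand-in for Perl (an assumption; the quantifier 
semantics are the same for these patterns).  The sample string 
is invented.  It compares the two versions of the regex and 
then shows a contrasting case where the '?' does change the 
result, because nothing after the quantifier forces it to keep 
going.

```python
import re

# Hypothetical sample text, invented for this comparison.
text = " abc<font color>def "

lazy = re.compile(r"[>\s]\w{1,5}?<\/?\s?[\w\s]{6,150}\/?\s?>\w{1,7}?\W")
greedy = re.compile(r"[>\s]\w{1,5}<\/?\s?[\w\s]{6,150}\/?\s?>\w{1,7}\W")

# Both versions are forced to consume the same characters.
lazy_span = lazy.search(text).span()
greedy_span = greedy.search(text).span()
print(lazy_span == greedy_span)

# Contrast: with nothing required afterwards, lazy stops at the minimum.
contrast_lazy = re.match(r"\w{1,5}?", "abcde").group()
contrast_greedy = re.match(r"\w{1,5}", "abcde").group()
print(contrast_lazy, contrast_greedy)
```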

That regex is equivalent to your original one, and may help you 
see better why it's not matching as you expect.  It's looking 
for

   a '>' or whitespace character (space, tab, carriage return,
      line feed, form feed),
   followed by 1 to 5 word characters (letters, digits, and
      underscores),
   followed by '<',
   followed by an optional '/',
   followed by an optional single whitespace character,
   followed by 6 to 150 word or whitespace characters,
   followed by an optional '/',
   followed by an optional single whitespace character,
   followed by '>',
   followed by 1 to 7 word characters,
   followed by a nonword character (anything other than
      letters, digits, and underscore).

I'm not clear on what you want to match, but that's probably 
not it.
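To make the description above concrete, here is a sketch that 
runs the simplified regex against a few strings.  It is in 
Python rather than Perl (my assumption that the two behave the 
same for this pattern), and the sample strings and the helper 
name 'rule_hits' are invented for illustration; they are not 
taken from real spam.

```python
import re

RULE = re.compile(r"[>\s]\w{1,5}<\/?\s?[\w\s]{6,150}\/?\s?>\w{1,7}\W")

def rule_hits(text):
    """Return True if the simplified rule matches anywhere in text."""
    return RULE.search(text) is not None

# At least 6 word/whitespace characters inside the brackets: matches.
print(rule_hits(" abc<font color>def "))

# A short tag such as <b> has too few characters inside: no match.
print(rule_hits(" Vi<b>agra "))

# '=' and '"' are neither word nor whitespace characters: no match.
print(rule_hits(' abc<font color="red">def '))
```

So the rule skips short tags like '<b>' and any tag carrying a 
quoted attribute value, which may be part of why it isn't 
firing where you expect.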

-- 
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC


