Advanced regex question - backtracking vs. negative lookaheads

Jeremy Fairbrass Fri, 21 Apr 2006 07:34:48 -0700

Hi all,
I wonder if one of you regex gurus might be able to give me some advice 
regarding the most efficient way of writing a particular rule....


Let's say I want to use regex to search for the phrase "color:blue" within a 
<span> tag as in the example below (just a made-up example for the sake of 
this question):

<span style="border:0px; color:blue; font-size:small">

In this case, the "color:blue" part is preceded by some other text 
("border:0px") after the first quote mark, but that preceding text could in 
fact be anything, and I want the rule to allow for that.

I've read at http://www.regular-expressions.info that it's best to avoid 
backtracking if possible because that is resource-intensive.

So one possible solution would be the following:

/style="(.(?!color))+.color:blue/

In other words, after the first " (quote mark) it looks for any character 
NOT followed by the word "color", and repeats that with the + character, 
until it gets to the actual word "color". I believe this results in no (or 
almost no?) backtracking. But I'm not sure if it's resource-intensive 
anyway, because of the negative lookahead - are negative lookaheads 
particularly resource intensive, when compared to backtracking? Is one 
preferable over the other?
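
To make the behaviour of the lookahead pattern concrete, here is a minimal
sketch that runs it against the example tag. It assumes Python's re module
rather than the Perl engine the rule would normally run under, so it shows
only what the pattern matches, not how fast it is. The variable names are
made up for illustration.

```python
import re

# Pattern from the post, unchanged apart from dropping the / delimiters.
LOOKAHEAD = re.compile(r'style="(.(?!color))+.color:blue')

tag = '<span style="border:0px; color:blue; font-size:small">'

# The repeated group stops at ";" because the next character (the space)
# is immediately followed by "color"; the lone "." then eats that space.
m = LOOKAHEAD.search(tag)
matched = m.group(0) if m else None
print(matched)

# Edge case: the pattern needs at least two characters between the quote
# and "color", so a style that starts with "color:blue" is not matched.
no_prefix = LOOKAHEAD.search('<span style="color:blue">')
print(no_prefix)
```

Note the edge case in the second call: because of the "+" and the extra ".",
this pattern never matches when "color:blue" comes directly after the quote
mark.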

An alternative solution would be this:

/style="[^>]+color:blue/

But this will certainly involve some backtracking, especially if there is 
even more text after the "color:blue" but before the closing > character, 
for example the "font-size:small" text.
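
One way to settle the question is to measure both patterns on the same
input. The sketch below is only an assumption-laden starting point: it uses
Python's re and timeit modules, not Perl, so the absolute numbers (and
possibly the ranking) may differ under another engine. It also shows that
the two patterns are not equivalent: the lookahead version cannot get past
an earlier "color", such as the one in "background-color". The helper names
found and time_pattern are hypothetical.

```python
import re
import timeit

# Both patterns from the post, without the / delimiters.
LOOKAHEAD = re.compile(r'style="(.(?!color))+.color:blue')
NEGATED = re.compile(r'style="[^>]+color:blue')

tag = '<span style="border:0px; color:blue; font-size:small">'
tricky = '<span style="background-color:red; color:blue">'


def found(pattern, text):
    """Return the matched text, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None


def time_pattern(pattern, text, number=20000):
    """Total seconds for `number` searches; a rough figure, not a benchmark."""
    return timeit.timeit(lambda: pattern.search(text), number=number)


# Same result on the example tag...
print(found(LOOKAHEAD, tag))
print(found(NEGATED, tag))

# ...but different results once an earlier "color" appears in the value.
print(found(LOOKAHEAD, tricky))
print(found(NEGATED, tricky))

timings = {
    "lookahead": time_pattern(LOOKAHEAD, tag),
    "negated class": time_pattern(NEGATED, tag),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```

The timings vary by machine and by regex engine, so they are best treated
as a hint and re-measured in the environment where the rule will run.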

So what do you think? Which way is best, i.e. most efficient or least 
resource-intensive?

Cheers,
Jeremy
