Thanks guys for the clarifications! My understanding of how regex worked was the same as Bowie's, ie: ----- > My understanding is that with [^"]+ the engine will scan from left to > right until it finds a quote. Then, in the context of the previous > regex, it will start backtracking to find a match for "color:blue". -----
I use the free Regex Coach tool from http://www.weitz.de/regex-coach/ to test my regex, and it works the way Bowie described above, ie. using backtracking. In other words, using: /style="[^>]+color:blue/ ...the [^>]+ causes the regex to go all the way to the closing > character, then backtracks until it finds the "color:blue" part. This also agrees with what is explained at www.regular-expressions.info which I believe is a reliable guide to Perl regex. Also, Bowie suggested using laziness instead: /style="[^"]+?color:blue/ But I believe laziness also uses backtracking, so I'm not sure there is *much* of an advantage of this over the greedy regex shown above. Probably the main advantage of the lazy version would be if there was little or no text between the first quote-mark and the "color:blue" part, and/or lots of text between "color:blue" and the last quote-mark, eg: <span style="color:blue; font-size:small; border:0px"> ...The regex would hit this much quicker using the lazy version than the greedy version. But I'm not sure if there really is a difference, especially if I want to be able to hit on SPAN tags that might have more text before the "color:blue" OR might have more text afterwards. Probably it's six of one and half a dozen of the other, right?! Why did David describe the lazy version as "slightly less good" than the greedy version? Incidentally the reason I used [^>]+ rather than [^"]+ was to prevent it from using lots of memory if there was no closing quote - as an alternative to using {1,20}. In any case, both Bowie and David agree that my first solution using (.(?!color))+ is a really bad idea, and that was the main thing I wanted to know! :) Thanks, Jeremy "Bowie Bailey" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > David Landgren wrote: >> Bowie Bailey wrote: >> >> [...] >> >> > > An alternative solution would be this: >> > > >> > > /style="[^>]+color:blue/ >> > >> > This looks better. It is probably less resource-intensive than >> > your previous attempt and is definitely easier to read. But why >> > are you looking for > when you anchor the beginning with a quote? >> > >> > How about this: >> > >> > /style="[^"]+?color:blue/ >> > >> > This is also non-greedy, so it will start looking for the >> > "color:blue" match at the beginning of the string instead of >> > having the + slurp up everything up to the quote and then >> > backtracking to find the match. >> >> The regexp engine doesn't slurp. It just scans from left to right, >> noting "I might have to come back here" along the way. > > Ok, so "slurp" was a bit of a simplification. :) > > My understanding is that with [^"]+ the engine will scan from left to > right until it finds a quote. Then, in the context of the previous > regex, it will start backtracking to find a match for "color:blue". > > In any case, with the non-greedy quantifier, it will stop looking when > it finds the first "color:blue" string instead of continuing to the > end of the string. > > -- > Bowie >