Offer Kaye wrote on 23.03.2005:

>Change your RE to: m#<h1>(.+?)</h1>(.+?)(?=<h1>|$)#gs
>
>In other words, look ahead to either a <h1> or the end of the string
>("$"). I have to admit this problem wasn't as simple as I initially
>thought - I still have no idea why my first guess didn't work:
>m#<h1>(.+?)</h1>(.+?)(?=<h1>)?#gs
>
>Maybe someone with more knowledge of REs can answer?

John W. Krahn wrote on 23.03.2005:

>This should work (untested)
>
>while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>|\z)#gs) {


Hi,

and thanks. I tried Offer Kaye's first guess, too, and I think I can explain 
why it does not work.

If you make the lookahead optional, it can always succeed by matching zero 
times, so it no longer constrains anything. The non-greedy (.+?) in the second 
pair of parentheses then matches as few characters as possible, which is only 
a single character.
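
To see this in action, here is a small sketch. The sample string in $content 
is made up for illustration; only the regex comes from the quoted message.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the original thread.
my $content = "<h1>One</h1>first body<h1>Two</h1>second body";

my @pairs;
# Offer Kaye's first guess: the lookahead is optional, so it never forces
# the non-greedy (.+?) to grow beyond its minimum of one character.
while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>)?#gs) {
    push @pairs, [$1, $2];
}

printf "%s => '%s'\n", @$_ for @pairs;
```

With this input, $2 should hold just the first character of each body ("f" 
and "s"), not the whole text up to the next heading.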

The positive lookahead assertion has to be mandatory to make sure $2 receives 
everything up to either the next <h1> or the end of the string.
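
A minimal sketch of the mandatory lookahead, again with a made-up $content. 
It uses the \z variant from John W. Krahn's message; the note about the 
trailing newline is my own reading of how $ and \z differ.

```perl
use strict;
use warnings;

# Hypothetical sample input; note the newline inside the first body and
# the trailing newline at the very end.
my $content = "<h1>One</h1>first\nbody<h1>Two</h1>second body\n";

my @sections;
# The lookahead must succeed, so (.+?) keeps growing until it reaches
# the next <h1> or the very end of the string (\z).
while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>|\z)#gs) {
    push @sections, [$1, $2];
}

for my $s (@sections) {
    my ($title, $body) = @$s;
    print "[$title]\n$body\n";
}
```

Without /m, $ can also match just before a final newline, so the $ variant 
would leave that last newline out of $2, while \z only matches at the true 
end of the string.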

So the other suggestion works. Thank you! The reason I had not tried that was 
the wrong assumption that alternations in lookahead/lookbehind assertions had 
to be of the same length, like in (?=abc|def), but not (?=abc|defg). But now I 
remember that the fixed-length restriction applies only to lookbehind; a 
lookahead may contain alternatives of different lengths, and even quantifiers.
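
A quick way to check what a lookahead accepts is simply to try it. The 
strings below are invented for the test. As far as I know, perlre documents 
the fixed-width restriction for lookbehind only, so both patterns should 
compile and match.

```perl
use strict;
use warnings;

# Alternatives of different lengths inside a lookahead.
my @alt = "xabc ydefg" =~ /(\w)(?=abc|defg)/g;

# A quantifier inside a lookahead.
my @quant = "a1b22c" =~ /([a-z])(?=\d+c)/g;

print "alternation: @alt\n";
print "quantifier:  @quant\n";
```

Here @alt should contain "x" and "y", and @quant should contain only "b", 
since "b" is the only letter followed by digits and then a "c".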

Thanks again,

Jan
-- 
A common mistake that people make when trying to design something completely 
foolproof is to underestimate the ingenuity of complete fools.


