On Sun, 30 Nov 2008 02:51:57 +0200, Canol Gökel wrote:
> My problem is to match HTML tags with RegExp. I managed to match
> something like this properly:
>
> la la la <p>a paragraph</p> bla bla bla <p>another paragraph</p> ya ya
> ya
>
> But when the tags are nested, problems arise:
>
> <p>a paragraph <p>bla bla bla</p> la la la</p>
>
> It matches
>
> <p>a paragraph <p>bla bla bla</p>
>
> instead of matching the innermost part:
>
> <p>bla bla bla</p>
>
> How can one write an expression that always matches the innermost part? I
> couldn't write an expression like "match a non-greedy <p>.*</p> which
> does not have a <p> inside".

Here is the pattern:

  (<p>(?:.(?!<p>))*?</p>)

$ cat /tmp/foo
#!/usr/local/bin/perl
use strict;
use warnings;

# print "Perl version $]\n";
$_ = do { local $/; <DATA> };
m{
  (            # start capturing
    <p>        # match an opening tag
    (?:
      .        # match a character
      (?!<p>)  # not followed by opening tag
    )*?        # nongreedily
    </p>       # match a closing tag
  )            # end capturing
}xs and print "Matched: $1\n";
__END__
Outermost: <p>
Middle: <p>
Inner: <p>
Content
</p>
Trailing
</p>
Trailing
</p>
Finished
$ /tmp/foo
Matched: <p>
Content
</p>

-- 
Peter Scott
http://www.perlmedic.com/
http://www.perldebugged.com/
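
A follow-up sketch, not part of the original reply: the pattern above tests the lookahead after each consumed character, so an opening tag that comes immediately after <p> (as in <p><p>x</p></p>) is never checked. One possible variant puts the lookahead before the dot, and adds /g to collect every innermost block. The helper name innermost_paragraphs and the sample strings are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return every
# innermost <p>...</p> block found in a string.
sub innermost_paragraphs {
    my ($html) = @_;
    my @found;
    while ( $html =~ m{
        (               # start capturing
          <p>           # match an opening tag
          (?:
            (?!<p>)     # the next characters must not be another opening tag
            .           # then consume one character
          )*?           # nongreedily
          </p>          # match the first closing tag
        )               # end capturing
    }xsg ) {
        push @found, $1;
    }
    return @found;
}

my $nested = '<p>a paragraph <p>bla bla bla</p> la la la</p>';

# Plain non-greedy matching starts at the first <p> it finds,
# which reproduces the problem described in the question.
my ($naive) = $nested =~ m{(<p>.*?</p>)}s;
print "Naive:     $naive\n";

# With the lookahead, only blocks with no <p> inside can match.
print "Innermost: $_\n" for innermost_paragraphs($nested);

# An opening tag directly after <p> is rejected as well.
print "Adjacent:  $_\n" for innermost_paragraphs('<p><p>x</p></p>');
```

On the sample strings this should print the outer fragment for the naive pattern and only <p>bla bla bla</p> and <p>x</p> for the other two. It still assumes bare <p> tags with no attributes; for real-world HTML a parser is usually the safer choice.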