op wrote:
Hello,

I thought I had understood regular expression grouping relatively
well, until I ran into the following behavior:

perl -e '
my $string = "<div name=\"abcd\" id=\"dontcare\" class=\"testvalue\"
lang=\"hindi\">";
while ($string =~ /(lang)|(id)="(\S+)"/g) {
  print "\$1->|$1|, \$2->|$2|, \$3->|$3|\n";
}
'
outputs:
$1->||, $2->|id|, $3->|dontcare|
$1->|lang|, $2->||, $3->||

Soon after I realised my mistake and replaced
(lang)|(id) with (lang|id),
getting the output I expected:
$1->|id|, $2->|dontcare|, $3->||
$1->|lang|, $2->|hindi|, $3->||

However, I still would have expected the first version to capture the
same strings, even with the superfluous parentheses. So, I obviously
haven't understood as much of regexp capturing as I had hoped, maybe
someone could enlighten me on this? I did browse through the parts on
capturing/grouping in perlre, perlreref and perlretut, but didn't find
anything that would have made me understand this.

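Your first regular expression /(lang)|(id)="(\S+)"/ says to match either the pattern /lang/ or the pattern /id="\S+"/. The second one /(lang|id)="(\S+)"/ says to match either the pattern /lang="\S+"/ or the pattern /id="\S+"/. Alternation has the lowest precedence of any regex operator, so a | splits the whole pattern into alternatives unless it is enclosed in parentheses. Capture groups are still numbered by the position of their opening parenthesis, whichever alternative matches, which is why the bare "lang" match sets $1 and leaves $2 and $3 empty.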
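
To see the precedence at work, here is a minimal sketch that runs three patterns over the same string and prints what each group captured. The captures() helper and the "undef" marker are my own additions for illustration, and the (?:...) pattern is only my explicit spelling of how the first regex is grouped, not something from the original script. The string is written on one line in single quotes to avoid the backslashes.

```perl
use strict;
use warnings;

my $string = '<div name="abcd" id="dontcare" class="testvalue" lang="hindi">';

# Hypothetical helper: run a pattern globally over $string and return one
# row per match, with groups that did not take part shown as "undef".
sub captures {
    my ($re) = @_;
    my @rows;
    while ($string =~ /$re/g) {
        push @rows, join ',', map { defined $_ ? $_ : 'undef' } ($1, $2, $3);
    }
    return @rows;
}

# The first pattern, as written in the question.
my @original = captures(qr/(lang)|(id)="(\S+)"/);

# The same pattern with each alternative wrapped in a non-capturing group,
# making the implicit grouping visible.
my @explicit = captures(qr/(?:(lang))|(?:(id)="(\S+)")/);

# The corrected pattern: the alternation is confined to the attribute name.
my @grouped = captures(qr/(lang|id)="(\S+)"/);

print "original: @original\n";  # original: undef,id,dontcare lang,undef,undef
print "explicit: @explicit\n";  # explicit: undef,id,dontcare lang,undef,undef
print "grouped:  @grouped\n";   # grouped:  id,dontcare,undef lang,hindi,undef
```

The first two patterns produce identical rows, so the extra parentheses around "lang" and "id" only add capture groups; they do nothing to limit how far the | reaches.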



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall
