op wrote:
Hello,
I thought I had understood regular expression grouping relatively well, until I ran into the following behavior:

    perl -e '
    my $string = "<div name=\"abcd\" id=\"dontcare\" class=\"testvalue\" lang=\"hindi\">";
    while ($string =~ /(lang)|(id)="(\S+)"/g) {
        print "\$1->|$1|, \$2->|$2|, \$3->|$3|\n";
    }
    '

outputs:

    $1->||, $2->|id|, $3->|dontcare|
    $1->|lang|, $2->||, $3->||

Soon after, I realised my mistake and replaced (lang)|(id) with (lang|id), getting the output I expected:

    $1->|id|, $2->|dontcare|, $3->||
    $1->|lang|, $2->|hindi|, $3->||

However, I still would have expected the first version to capture the same strings, even with the superfluous parentheses. So I obviously haven't understood as much of regexp capturing as I had hoped. Maybe someone could enlighten me on this? I did browse through the parts on capturing/grouping in perlre, perlreref and perlretut, but didn't find anything that made me understand this.
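Your first regular expression /(lang)|(id)="(\S+)"/ says to match either the pattern /lang/ or the pattern /id="\S+"/. The second one /(lang|id)="(\S+)"/ says to match either the pattern /lang="\S+"/ or the pattern /id="\S+"/. Alternation has the lowest precedence of any regex operator, so | splits the whole pattern into alternatives unless it is inside parentheses. That is also why $2 and $3 come out empty on your "lang" line: when the first alternative matches, the groups in the other alternative are never set.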
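To make the precedence visible, here is a minimal sketch that runs three patterns over the string from the question. The helper capture_rows is a name made up for this illustration (it is not from the original post), and the third pattern simply spells out with (?:...) how Perl already reads the first one.

```perl
use strict;
use warnings;

# Hypothetical helper, invented for this illustration: apply a pattern
# globally and return one "g1,g2,g3" row per match. Groups that did not
# take part in the match are undef and are shown here as empty strings.
sub capture_rows {
    my ($re, $string) = @_;
    my @rows;
    while ($string =~ /$re/g) {
        push @rows, join ',', map { defined $_ ? $_ : '' } ($1, $2, $3);
    }
    return @rows;
}

my $string = '<div name="abcd" id="dontcare" class="testvalue" lang="hindi">';

# 1. Original pattern: | splits the WHOLE pattern into two alternatives.
print "$_\n" for capture_rows(qr/(lang)|(id)="(\S+)"/, $string);
# ,id,dontcare
# lang,,

# 2. Alternation confined to one group: both names need ="...".
print "$_\n" for capture_rows(qr/(lang|id)="(\S+)"/, $string);
# id,dontcare,
# lang,hindi,

# 3. Pattern 1 with its implicit grouping written out; same rows as 1.
print "$_\n" for capture_rows(qr/(?:(lang))|(?:(id)="(\S+)")/, $string);
# ,id,dontcare
# lang,,
```

If you only want the attribute value and not its name, (?:lang|id)="(\S+)" limits the alternation the same way as pattern 2 without using up a capture group.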
John
--
Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order. -- Larry Wall