tom arnall wrote on Monday, 26 June 2006, 20:42:
[...]
> do you have any idea why:
>
> $_ = " x11x  x22x a ";
>
> $re1 = qr/x.*?\d\dx|a/;
> $re2 = qr/($re1\s)?$re1/;
> ($_) = /($re2)/;
> print $_;
>
> doesn't produce 'x11x' ? (note btw that if you insert '\n' between the
> first two tokens of the target string, the result does become 'x11x'. note
> also that if you drop '|a' from $re1 you also get 'x11x'.)
# Do you mean by this paragraph:
#!/usr/bin/perl
use strict;
use warnings;
sub tst {
    my ($prefix, $s, $re1) = @_;
    my $re2 = qr/($re1\s)?$re1/;
    $s =~ /($re2)/ && print "$prefix <$1>\n";
}
tst ('1: ', ' x11x  x22x a ', qr/x.*?\d\dx|a/); # orig (two spaces after 'x11x')
tst ('2: ', " x11x \n x22x a ", qr/x.*?\d\dx|a/); # \n
tst ('3: ', ' x11x  x22x a ', qr/x.*?\d\dx/); # without |a
# produces:
1: <x11x  x22x a>
2: <x11x>
3: <x11x>
# and you wonder why 1: does not match only 'x11x' ?
I'll try to explain what happens when matching case 1:. It's not very concise,
and I'm *not* sure it's correct. Somebody please correct me if I'm wrong:
> i read this example as follows:
>
> $re1 = qr/
> x #find an 'x'
> .*? #find whatever of whatever length
> \d\d #find two digits
> x #find an 'x'
This finds 'x11x' in the first $re1 part of $re2 below, using the
shortest (non-greedy) interpretation of .*?,
> | #or, instead of all the foregoing,
> a #find an 'a'
so the |a alternative above does not need to be tested.
> /x;
[[Start $re2]]
> $re2 = qr/
> (
> $re1 #find $re1
See comments above: 'x11x' is found,
> \s #and whitespace
and \s too (one of the two \s between 'x11x' and 'x22x').
> )? #or maybe none of the foregoing
Now, we matched 'x11x ', but
> $re1 #find for sure $re1
this 2nd $re1 cannot match anything, because the next unmatched char is \s,
whereas the 2nd $re1 expects an 'x' (or an 'a').
> #in sum, find $re1 possibly preceded by
> $re1+whitespace
Not only that. Yes, the first $re1 is optional and the second is mandatory,
but the match found so far by the first $re1 cannot stand, because the second
$re1 can't match after it.
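To check this step in isolation, here is a small sketch (the helper name
second_re1_at_stop is mine, not from the original post, and it assumes the
target string has two spaces after 'x11x'). It matches $re1\s once, then uses
pos() and \G to ask whether $re1 can match exactly where the engine stopped:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: match $re1\s once, then report the offset where
# that match ended and whether $re1 can match starting at that offset.
sub second_re1_at_stop {
    my ($s, $re1) = @_;
    return unless $s =~ /$re1\s/g;        # scalar-context //g sets pos($s)
    my $stop = pos($s);
    my $ok   = $s =~ /\G$re1/gc ? 1 : 0;  # \G anchors the match at pos($s)
    return ($stop, $ok);
}

# two spaces between 'x11x' and 'x22x'
my ($stop, $ok) = second_re1_at_stop(" x11x  x22x a ", qr/x.*?\d\dx|a/);
print "stopped at offset $stop, second \$re1 matches there: $ok\n";
# prints: stopped at offset 6, second $re1 matches there: 0
```

Offset 6 is the second space, so neither 'x' nor 'a' can match there.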
Now, *another* match variant with the first $re1 is tried. This is possible
by matching 'x11x  x22x' (the .*? matching '11x  x'), with the \s taking the
space after it. Then the 2nd $re1 can match the leftover 'a'. This way $re2
matches the whole string, apart from the leading and trailing space.
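A small sketch to see which part each copy of $re1 ends up with (the helper
name parts is mine, not from the original post; I use (?:...) for the optional
group so that $1 and $2 belong to the two copies of $re1, and I assume two
spaces after 'x11x'):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns what the first (optional) $re1 and the
# second (mandatory) $re1 matched. (?:...) groups without capturing.
sub parts {
    my ($s, $re1) = @_;
    return unless $s =~ /(?:($re1)\s)?($re1)/;
    return ($1, $2);
}

# two spaces between 'x11x' and 'x22x'
my ($first, $second) = parts(" x11x  x22x a ", qr/x.*?\d\dx|a/);
print "first:  <", (defined $first ? $first : 'not used'), ">\n";
print "second: <$second>\n";
# prints:
# first:  <x11x  x22x>
# second: <a>
```

With the "\n" variant of the string, the first value comes back undefined and
the second is 'x11x', because . does not match a newline and the .*? cannot
stretch across it.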
It seems, with my interpretation, that omitting the ()? group is tried only
*after* all non-empty matches with it have been tried. That would fit ()?
being greedy (it prefers one match over zero; the minimal form is ()??), even
though the 2nd $re1 alone *could* match 'x11x'. That would just not be the
variant the engine tries first.
I'm a bit confused here. Maybe the reason is that the .*? has "precedence"
over the ()? containing it? [backtracking goes from the inner to the outer?]
> /x;
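One way to test the greedy-versus-minimal question is to compare ()? with
()??, which perlre documents as the non-greedy form of the ? quantifier. This
is only a sketch: the helper name whole_match is mine, and I again assume two
spaces after 'x11x'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: text matched by the whole of $re, or undef.
sub whole_match {
    my ($s, $re) = @_;
    return $s =~ /($re)/ ? $1 : undef;
}

my $s   = " x11x  x22x a ";          # two spaces after 'x11x'
my $re1 = qr/x.*?\d\dx|a/;

my $greedy = qr/($re1\s)?$re1/;      # tries to use the group first
my $lazy   = qr/($re1\s)??$re1/;     # tries to skip the group first

print "greedy: <", whole_match($s, $greedy), ">\n";  # <x11x  x22x a>
print "lazy:   <", whole_match($s, $lazy),   ">\n";  # <x11x>
```

If 'x11x' is what you want, ()?? gets it for this string. Running the
original under "use re 'debug';" should also show the order in which the
engine tries things.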
I hope I'm not adding to the confusion here... including my own...
Dani