On May 28, Bornaz, Daniel said:
>$stt="The food is under the bar in the barn in the river.";
>$stt=~/bar(.*?)river/;
>print "$&";
>
>The output is:
>bar in the barn in the river
>
>Instead of the expected:
>barn in the river

In the meantime, you might like to look at chapter 6 of "Learning Perl's
Regular Expressions"[1], which provides an introduction to backtracking
and greediness.
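The section "Leftmost-Longest" covers the problem you've shown here: the
regex engine commits to the leftmost place where a match can start, and
the non-greedy .*? only shortens the match from that starting point. While
the technique required to find the shortest possible match of all matches
is not yet explained in the book (it will be in chapter 8, which I'll be
working on today or tomorrow), I can tell you about it here.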
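
To see why the match starts where it does, here is a small sketch using
the string from the question. The first_match helper is just an
illustrative name, not something from the book. It reports where the
match begins, which turns out to be the same offset as the first "bar" in
the string:

```perl
use strict;
use warnings;

# Illustrative helper: return (offset, text) of the first match of
# bar(.*?)river in $str, or nothing if the pattern does not match.
sub first_match {
    my ($str) = @_;
    return unless $str =~ /bar(.*?)river/;
    return ($-[0], $&);    # $-[0] is the offset where the match starts
}

my $stt = "The food is under the bar in the barn in the river.";
my ($offset, $text) = first_match($stt);

print "match starts at offset $offset: $text\n";
print "first 'bar' is at offset ", index($stt, 'bar'), "\n";
```

Both offsets are the same, so the non-greedy quantifier never gets a
chance to consider the later "bar" in "barn".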
In order to find the smallest number in a list, you have to go through all
the numbers in the list. The same is true for finding the shortest
match in a string.

  $_ = "the food is at their bar by the barn in a barrel";
  my $match;
  # goal: find shortest string between 'e' and 'bar'
  while (/(?=(e.*?bar))/g) {
    my $len = length $1;
    $match = $1 if not defined $match or length($match) > $len;
  }

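This approach works by using a look-ahead assertion, (?=...).
Look-ahead is JUST LIKE matching, except that it does not advance in the
string. Because the look-ahead consumes no characters, the /g loop resumes
just past the start of the previous match rather than past its end, so
overlapping candidates are all found. It's like having a pair of
binoculars with you on a hike.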
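
To watch the look-ahead at work, the sketch below collects every
candidate the loop sees, along with the offset where it starts. The
candidates helper is an illustrative name of my own; it relies on pos()
reporting the start of the candidate, since the zero-width match ends
where it began.

```perl
use strict;
use warnings;

# Illustrative helper: list every [offset, text] pair that the
# look-ahead loop finds in $str for the pattern e.*?bar.
sub candidates {
    my ($str) = @_;
    my @found;
    while ($str =~ /(?=(e.*?bar))/g) {
        # The overall match is zero-width, so pos() is where $1 starts.
        push @found, [ pos($str), $1 ];
    }
    return @found;
}

for my $c (candidates("the food is at their bar by the barn in a barrel")) {
    printf "%2d: %s\n", @$c;
}
```

Each 'e' that is followed somewhere by "bar" yields one candidate, and
the candidates overlap: the longer ones contain the shorter ones.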
We can make this approach more efficient by demanding that the regex match
fewer characters on each successive match:
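
  my $limit = '*';
  # same $_ and (still undefined) $match as above
  while (/(?=(e.$limit?bar))/g) {
    my $len = length $1;
    if (not defined $match or length($match) > $len) {
      $match = $1;
      # $len counts the whole match, so this is a loose but safe bound
      $limit = "{0,$len}";
    }
  }
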
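Each time a new shortest match is found, $limit is tightened, so that
candidates far longer than the current best fail early instead of being
matched and then discarded.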
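
Putting the pieces together, here is one way the idea might be wrapped in
a reusable sub and applied to the string from the original question. This
is only a sketch under a few assumptions of mine: shortest_between is a
made-up name, both delimiters are treated as literal text (hence
quotemeta), and the bound is a little tighter than the one above because
it subtracts the delimiter lengths from $len.

```perl
use strict;
use warnings;

# Hypothetical helper: shortest substring of $str that starts with the
# literal $open and ends with the literal $close (undef if there is none).
sub shortest_between {
    my ($str, $open, $close) = @_;
    my ($o, $c) = (quotemeta $open, quotemeta $close);
    my $fixed = length($open) + length($close);   # characters outside .*?
    my $limit = '*';
    my $match;
    while ($str =~ /(?=($o.${limit}?$c))/g) {
        my $len = length $1;
        if (not defined $match or $len < length($match)) {
            $match = $1;
            # Later candidates may not have a longer middle than this one.
            $limit = '{0,' . ($len - $fixed) . '}';
        }
    }
    return $match;
}

my $stt = "The food is under the bar in the barn in the river.";
print shortest_between($stt, 'bar', 'river'), "\n";
```

For Daniel's string this prints "barn in the river", the result he was
expecting. Note that . does not match a newline unless the /s modifier is
added, so the sketch only handles single-line text as written.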
Look forward (no pun intended) to seeing this in chapter 8.
[1] http://www.pobox.com/~japhy/docs/LPRE.html
--
Jeff "japhy" Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/
Are you a Monk? http://www.perlmonks.com/ http://forums.perlguru.com/
Perl Programmer at RiskMetrics Group, Inc. http://www.riskmetrics.com/
Acacia Fraternity, Rensselaer Chapter. Brother #734
** I need a publisher for my book "Learning Perl's Regular Expressions" **