Simply great, thanks.
On 6/1/07, Paul Lalli <[EMAIL PROTECTED]> wrote:
On Jun 1, 4:54 am, [EMAIL PROTECTED] (Sharan Basappa) wrote:
> I have a script as follows :
>
> $str = "once upon a time
> once upon a time";
> @store = $str =~ m/(once)/g;
> print @store ;
>
> This outputs "onceonce".
> How come the regex is searching beyond the newline? I thought the search would
> stop after the first "once".
What led you to believe that? There is nothing in that regex that
says "stop after the first newline".
> When I replace /g with /m, the output I get is "once", but I thought /m would
> tell the regex to match across multiple lines.
That is the mnemonic device, yes, but what it actually does is allow
the ^ anchor to match after a newline and the $ anchor to match
before a newline, rather than just at the beginning and end of the string.
So effectively, ^ and $ match the beginning/ending of lines, rather
than strings.
Your regexp does not involve ^ or $, so /m is completely irrelevant.
If you remove the /g modifier, your pattern matches only once.
Regardless of any other modifiers, if you want to search for more than
one occurrence of the pattern, you need the /g modifier.
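To make the difference concrete, here is a small sketch using the string from the original post. The variable names and the anchored patterns at the end are my own additions for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

my $str = "once upon a time\nonce upon a time";

# No /g: in list context the match returns the captures of the first match only.
my @first  = $str =~ m/(once)/;

# /m only changes what ^ and $ mean; this pattern has neither, so nothing changes.
my @with_m = $str =~ m/(once)/m;

# /g: return every match in the string, newline or not.
my @all    = $str =~ m/(once)/g;

# /m starts to matter once the pattern is anchored (illustrative patterns).
my @anchored   = $str =~ m/^(once)/g;    # ^ matches only at the start of the string
my @anchored_m = $str =~ m/^(once)/mg;   # ^ also matches right after each newline

print join(" ", scalar(@first), scalar(@with_m), scalar(@all),
                scalar(@anchored), scalar(@anchored_m)), "\n";
# prints: 1 1 2 1 2
```

So /g controls how many matches you get, and /m only controls where ^ and $ are allowed to match.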
> Also when I replace /g with /s, I still get output "once"
Again, without the /g modifier, the pattern matches only once. /s is
also irrelevant. While the mnemonic for this one is "single line",
what it actually does is allow the . wildcard to match any character
including the newline. Normally it matches any character except the
newline. Again, you have no . in your pattern, so /s is irrelevant.
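A short sketch of what /s does and does not change, again with the string from the original post. The pattern m/(once.*)/ is a hypothetical variation I added so that there is a . for /s to act on.

```perl
use strict;
use warnings;

my $str = "once upon a time\nonce upon a time";

# No . in the pattern: /s changes nothing, and without /g there is one match.
my @no_dot = $str =~ m/(once)/s;

# With a . in the pattern, the default is to stop at the newline...
my ($plain)  = $str =~ m/(once.*)/;

# ...while /s lets . match the newline too, so .* runs to the end of the string.
my ($dotall) = $str =~ m/(once.*)/s;

print scalar(@no_dot), "\n";                  # prints: 1
print length($plain), "\n";                   # prints: 16 (first line only)
print $dotall eq $str ? "all\n" : "part\n";   # prints: all
```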
> Can someone demystify this for me ?
> Is my assumption that the regex will stop after encountering the first newline
> applicable only when a dot* type of regex is used?
Ah. Now I understand your confusion. It is not the regexp that stops
matching. It is the . wildcard. The . does not match a newline
character, unless you provide the /s modifier. Therefore, the string
"onex\ntwox" will match /o(.*)x/ by setting $1 to 'ne'. This is what
you've interpreted as "stopping after the first newline". The regexp
engine didn't stop. It's just that the . ran out of sequential
characters that it could match. If you add the /s modifier, then $1
will become "nex\ntwo", because now the . wildcard will match the
newline.
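You can check this yourself with a few lines. This is only a sketch, and the variable names are mine. Without /s, the greedy .* runs up to the newline and then backs off one character so the final x can match, leaving 'ne' in the capture; with /s it runs to the end of the string and backs off to the last x.

```perl
use strict;
use warnings;

my $text = "onex\ntwox";

# Default: . does not match "\n", so .* can only cover "nex" and then
# gives back the "x" so the literal x in the pattern can match.
my ($default) = $text =~ /o(.*)x/;

# /s: . matches "\n" too, so .* reaches the end and gives back only the last "x".
my ($single)  = $text =~ /o(.*)x/s;

print "default: [$default]\n";   # prints: default: [ne]

# Show the newline visibly instead of printing it raw.
(my $shown = $single) =~ s/\n/\\n/g;
print "with /s: [$shown]\n";     # prints: with /s: [nex\ntwo]
```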
For more info:
perldoc perlretut
perldoc perlre
perldoc perlreref
Hope this helps,
Paul Lalli
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/