Re: Is $1 ever undefined or set to null?

Steve Grazzini Mon, 21 Jul 2003 14:08:01 -0700

On Mon, Jul 21, 2003 at 04:40:19PM -0400, Jeff 'japhy' Pinyan wrote:
> On Jul 21, Jeff 'japhy' Pinyan said:
> 
> >This is the issue.  Why are the $DIGIT variables bound to the block
> >they're in IN TOTALITY, rather than for the life of the execution of the
> >block?
> 
> It's actually slightly more complex than that.
> 
> Here's a piece of code like what Steve wrote:
> 
>   sub match {
>     ($1 + 1) =~ /(\d+)/;
>     print $1;
>     match() if $1 < 2;
>     print $1;
>   }
> 
>   "0" =~ /(\d+)/;
>   match();
> 
> Now, this code prints "1222".  Odd.  Baffling, even.  Why doesn't it print
> "1221"?  Because the $DIGIT variables are not just magically scoped,
> they're magic themselves.  They are connected to the last successful
> pattern match, yes, but more importantly, they are DIRECTLY connected to
> the last PMOP (an internal structure representing the pattern match).


You're explaining this much more clearly than I had done, but 
let me jump in again -- 

The magic regex variables *themselves* live forever and don't obey
any scoping rules.  They don't have to worry about scope, since as
you said, they don't contain any data.  Their values are fetched 
dynamically by looking at the "last match" (which is what I've been
calling PL_curpm, which is the dynamically scoped PMOP pointer).

PL_curpm behaves consistently, although the way it's dynamically
scoped is slightly unusual, as you said.
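Here is a small illustration of that dynamic scoping. It uses only ordinary m// operations: two separate PMOPs with two separate patterns, so the shared-REGEXP problem described below doesn't come up. The @seen array is just a hypothetical way of recording what $1 returned at each point.

```perl
use strict;
use warnings;

my @seen;

"outer" =~ /(\w+)/;           # last match at file scope
{
    "inner" =~ /(\w+)/;       # a different m//, so a different PMOP and REGEXP
    push @seen, $1;           # "inner"
}
push @seen, $1;               # "outer": the last match is restored at block exit
print "@seen\n";              # prints "inner outer"
```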

But the PMOP doesn't contain any data *either*.  It has a pointer
to a REGEXP structure, which contains, among other things, the compiled
pattern and what I'll call the "match data".  The match data might
include a copy of the target string and offsets for each pair of
capturing parens, and it can be used to calculate the value of $1
or @- (or a host of other variables) dynamically.
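Perl code can't see the match data directly. However, @- and @+ expose the per-group start and end offsets, and perlvar documents $1 as equivalent to substr($var, $-[1], $+[1] - $-[1]), where $var is the string that was matched. The following quick check is only the user-visible analogue, not the internal layout, and the variable names are just for illustration.

```perl
use strict;
use warnings;

my $target = "hello world";
my ($via_dollar1, $via_offsets);

if ($target =~ /\s(\w+)/) {
    # $-[1] is where group 1 starts, $+[1] is where it ends
    $via_dollar1 = $1;
    $via_offsets = substr($target, $-[1], $+[1] - $-[1]);
}
print "$via_dollar1 / $via_offsets\n";   # prints "world / world"
```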

The problem with this set-up is that PL_curpm is dynamically
scoped, but the REGEXP, which contains the data we're interested
in, isn't.

Tying the match data to the compiled pattern (and thence to the
PMOP, for pity's sake) is, arguably, bad design...
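One practical consequence, assuming each call only needs its own capture, is that you can copy the capture into a lexical right after the match. Lexicals are per-call, so the recursion can't clobber them. The names match_copy and $out are hypothetical. This is a variant of japhy's sub rather than his exact code, and it prints "1221" instead of "1222".

```perl
use strict;
use warnings;

my $out = '';

sub match_copy {
    my ($n) = @_;
    my ($cap) = ($n + 1) =~ /(\d+)/;   # copy the capture into a per-call lexical
    $out .= $cap;
    match_copy($cap) if $cap < 2;
    $out .= $cap;                      # still this call's value after the recursion
}

match_copy(0);
print "$out\n";   # prints "1221"
```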

You can also see it misbehaving here:

    my $rx = qr/(...)/;   # REGEXP 1

      "foo" =~ /$rx/;     # PMOP 1 / REGEXP 1
    { "bar" =~ /$rx/; }   # PMOP 2 / REGEXP 1

    print $1;   # prints "bar", not "foo"

In this one there are two distinct PMOPs (the m// operations) but
only one REGEXP, which is what we've stored in $rx.

When we print $1 the chain of references looks something like

   $1 -> PL_curpm -> <PMOP #1> -> $rx -> <match data>

[ Really the PMOP points directly to the REGEXP inside $rx. ]

And the match data inside $rx comes from the time it matched "bar",
since there's no mechanism for saving and restoring that kind of
thing.
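To make the chain concrete, here is a toy model in plain Perl. The hashes only play the roles of the two PMOPs and the single REGEXP. Nothing here touches the real internals, and run_match and dollar_one are made-up names. `local` stands in (roughly) for the way PL_curpm is saved on block entry and restored on exit.

```perl
use strict;
use warnings;

# Hypothetical toy model: these hashes only mimic the roles of
# perl's internal PMOP and REGEXP structures.
my %regexp = (pattern => '(...)', match_data => undef);   # one REGEXP, like $rx
my %pmop1  = (re => \%regexp);                             # first m//
my %pmop2  = (re => \%regexp);                             # second m//, same REGEXP

our $curpm;   # stands in for PL_curpm; `local` gives it dynamic scope

sub run_match {
    my ($pmop, $target) = @_;
    my ($cap) = $target =~ /^(...)/;   # a real match, just to get the capture
    $pmop->{re}{match_data} = $cap;    # data lands in the *shared* REGEXP
    $curpm = $pmop;                    # this becomes the "last match"
}

# $1 holds nothing itself; it is computed on demand through $curpm
sub dollar_one { return $curpm->{re}{match_data} }

run_match(\%pmop1, "foo");
{
    local $curpm = $curpm;             # the current-match pointer is saved here...
    run_match(\%pmop2, "bar");
}                                      # ...and restored here
print dollar_one(), "\n";              # prints "bar": pmop1 is back, its data isn't
```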

Anyway, apologies for the blood and perlguts -- 

-- 
Steve
