On Thu, Oct 20, 2005 at 09:14:15PM -0400, John Adams wrote:
> From: Luke Palmer <[EMAIL PROTECTED]>
> 
> > But $1 in Perl 5 wasn't the same as $1 in a shell script.
> 
> I'm all for breaking things that need breaking, which is why I 
> keep my mouth shut most of the time--either I see the reason or 
> I suspect (that is, take on faith, which is okay by me) there's 
> a reason I don't see or fully understand. I'm just not seeing a 
> compelling reason for this one, and a pretty good reason not to do it: 

I can state the compelling reason for this one -- it's way too 
confusing when $1, $2, $3, etc. correspond to $/[0], $/[1], $/[2], etc.

In many discussions of capturing semantics earlier in the year, 
nearly everyone using $1, $2, $3 in examples, documentation, and 
discussion was having trouble with off-by-one errors.  This includes
the language designers, and even those who were advocating staying
with $1, $2, $3.  Once we switched to using $0, $1, $2, etc., 
nearly all of the confusion and mistakes disappeared.

> I'm not aware offhand of any other place where $0 is used in 
> regex matching, and several of the languages which you point out 
> are zero-based in other places are not zero-based in regex matching.

Yes, but none of those other regex matching languages nest their
captures hierarchically either.  In particular, a rule like:

    /:w ( (\w+) = (\d+) ; )+ /

no longer captures to $1, $2, $3, or even to $0, $1, $2.  It now
creates an array in $/[0] (aka $0), with one element per repetition 
of the outer group, and each element of that array has [0] and [1] 
entries corresponding to the second and third sets of parentheses 
in the rule.  That is,

    "a=4; b=2; c=8;" ~~ /:w ( (\w+) = (\d+) ; )+ /

results in

    $/[0][0][0] == 'a'   $/[0][0][1] == '4'
    $/[0][1][0] == 'b'   $/[0][1][1] == '2'
    $/[0][2][0] == 'c'   $/[0][2][1] == '8'
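
For readers without a Perl 6 implementation at hand, the same nesting
can be emulated roughly in Python.  This is only a sketch of the shape
of the result, not of how rules are implemented: nested_captures and
PAIR are hypothetical names, and Python's re module keeps only the last
repetition of a quantified group, so each repetition is collected by
hand.  Unlike the rule, it also does not require the repetitions to be
adjacent.

```python
import re

# Hypothetical emulation of the nested capture layout described above.
# One repetition of the outer group: ( (\w+) = (\d+) ; ), with optional
# whitespace between tokens standing in for :w.
PAIR = re.compile(r"\s*((\w+)\s*=\s*(\d+)\s*;)")

def nested_captures(text):
    """Return a list shaped like $/: one slot per top-level paren."""
    repetitions = []
    for m in PAIR.finditer(text):
        # Inner parens are numbered from 0 within each repetition.
        repetitions.append([m.group(2), m.group(3)])
    # The rule has a single top-level paren, so the result has one slot.
    return [repetitions]

match = nested_captures("a=4; b=2; c=8;")
```

Here match[0][1][0] plays the role of $/[0][1][0], and every level is
indexed from zero in the same way.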

Trying to make *all* of these indexes 1-based leads to 
chaos (especially wrt array assignment), and saying that top
level parens in a rule are named $1, $2, $3, ... while nested parens 
are named [0], [1], [2], ... just throws everything and
everyone off.  It's *much* easier when everything is zero-based,
even for those who are used to using $1, $2, $3 in regular
expressions.

Pm
