On Thu, Oct 20, 2005 at 09:14:15PM -0400, John Adams wrote:
> From: Luke Palmer <[EMAIL PROTECTED]>
>
> > But $1 in Perl 5 wasn't the same as $1 in a shell script.
>
> I'm all for breaking things that need breaking, which is why I
> keep my mouth shut most of the time--either I see the reason or
> I suspect (that is, take on faith, which is okay by me) there's
> a reason I don't see or fully understand. I'm just not seeing a
> compelling reason for this one, and a pretty good reason not to do it:

I can state the compelling reason for this one -- it's way too
confusing when $1, $2, $3, etc. correspond to $/[0], $/[1], $/[2], etc.
In many discussions of capturing semantics earlier in the year, nearly
everyone using $1, $2, $3 in examples, documentation, and discussion
was having trouble with off-by-one errors. This includes the language
designers, and even those who were advocating staying with $1, $2, $3.
Once we switched to using $0, $1, $2, etc., nearly all of the confusion
and mistakes disappeared.

> I'm not aware offhand of any other place where $0 is used in
> regex matching, and several of the languages which you point out
> are zero-based in other places are not zero-based in regex matching.

Yes, but none of those other regex matching languages do nested
captures either. In particular, a rule like:

    /:w ( (\w+) = (\d+) ; )+ /

no longer captures to $1, $2, $3, or even to $0, $1, $2. It now
creates an array in $/[0] (aka $0), and each element of that array
contains a [0] and [1] index representing the second and third set
of parentheses in the rule. That is,

    "a=4; b=2; c=8;" ~~ /:w ( (\w+) = (\d+) ; )+ /

results in

    $/[0][0][0] == 'a'    $/[0][0][1] == '4'
    $/[0][1][0] == 'b'    $/[0][1][1] == '2'
    $/[0][2][0] == 'c'    $/[0][2][1] == '8'

Trying to make *all* of these indexes 1-based leads to chaos
(especially wrt array assignment), and saying that top level parens
in a rule are named $1, $2, $3, ... while nested parens are named
[0], [1], [2], ... just throws everything and everyone off. It's
*much* easier when everything is zero-based, even for those who
are used to using $1, $2, $3 in regular expressions.

Pm
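
For anyone who wants to poke at the nested shape described above without a Perl 6 implementation at hand, here is a rough Python sketch. It is only an emulation: Python's re module does not build nested capture arrays, so the helper nested_captures() and the variable match are hypothetical names that rebuild the same list-of-lists by hand, and the explicit \s* stands in loosely for the :w modifier.

```python
import re

# One repetition of the outer group: (\w+) = (\d+) ;
# The \s* pieces are an assumption standing in for :w (whitespace-tolerant matching).
PAIR = re.compile(r"\s*(\w+)\s*=\s*(\d+)\s*;")

def nested_captures(text):
    # Hypothetical helper: collect one [name, value] list per contiguous
    # repetition, mimicking what the post describes as the contents of $/[0].
    groups = []
    pos = 0
    while True:
        m = PAIR.match(text, pos)
        if m is None:
            break
        # Inner captures land at zero-based indexes 0 and 1.
        groups.append([m.group(1), m.group(2)])
        pos = m.end()
    return groups

# match[0] plays the role of $/[0]; everything below it is zero-based too.
match = [nested_captures("a=4; b=2; c=8;")]

print(match[0][0][0], match[0][0][1])
print(match[0][1][0], match[0][1][1])
print(match[0][2][0], match[0][2][1])
```

Because every level is an ordinary zero-based list, the indexes in the sketch line up one-for-one with the $/[0][i][j] subscripts in the table above, with no shift at the top level.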