This and other RFCs are available on the web at
http://dev.perl.org/rfc/
=head1 TITLE
Consolidate the $1 and C<\1> notations
=head1 VERSION
Maintainer: David Storrs <[EMAIL PROTECTED]>
Date: 28 Sep 2000
Mailing List: [EMAIL PROTECTED]
Number: 331
Version: 1
Status: Developing
=head1 ABSTRACT
Currently, C<\1> and $1 have slightly different meanings within a
regex. Let's consolidate them, eliminate the differences, and
settle on $1 as the standard.
=head1 DESCRIPTION
Note: For convenience, I am going to talk about C<\1> and $1 in this RFC.
In actuality, these notations extend indefinitely: C<\1..\n> and
C<$1..$n>. Take it as read that anything which applies to $1 also applies
to C<$2, $3>, etc.
In current versions of Perl, C<\1> means "whatever was matched by the
first set of grouping parens I<in this regex match>." $1 means "whatever
was matched by the first set of grouping parens I<in the previously-run
regex match>." For example:
=over 4
=item *
/(foo)_$1_bar/
=item *
/(foo)_\1_bar/
=back
mean different things: the second will match 'foo_foo_bar', while the
first will match 'foo_[SOMETHING]_bar', where [SOMETHING] is whatever was
captured in the B<previous> match. That match could be a long, long way
away, possibly even in some module that you didn't realize you were
including (because it was included by a module that was included by a
module that was included by a...).
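The difference can be checked in current Perl with a small sketch like the following. The string "xyz" and the variable names are made up for illustration; the only point is that $1 is interpolated from an earlier match, while C<\1> refers to the match in progress.

```perl
use strict;
use warnings;

# An earlier, unrelated match leaves "xyz" in $1.
"xyz" =~ /(xyz)/ or die "setup match failed";

# $1 is interpolated before matching starts, so the pattern below is
# effectively /(foo)_xyz_bar/.
my $dollar_matches_repeat = ("foo_foo_bar" =~ /(foo)_$1_bar/) ? 1 : 0;   # 0
my $dollar_matches_stale  = ("foo_xyz_bar" =~ /(foo)_$1_bar/) ? 1 : 0;   # 1

# \1 refers to the first group of the match in progress.
my $backref_matches_repeat = ("foo_foo_bar" =~ /(foo)_\1_bar/) ? 1 : 0;  # 1
my $backref_matches_stale  = ("foo_xyz_bar" =~ /(foo)_\1_bar/) ? 1 : 0;  # 0

printf "\$1: repeat=%d stale=%d\n",
    $dollar_matches_repeat, $dollar_matches_stale;
printf "\\1: repeat=%d stale=%d\n",
    $backref_matches_repeat, $backref_matches_stale;
```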
Probably the primary reason for this distinction is the following:
=over 4
=item *
s/(foo)\1/$1bar/ # changes "foofoo" to "foobar"
=back
The key fact here is that in the pattern (the left hand side) of an s///
you are supposed to use C<\1>, but in the replacement (the right hand
side) you are supposed to use $1. If you understand how an s/// works
(i.e., the right hand side of an s/// is a double-quoted string, not a
regex), you will understand the distinction. For newbies, however, it is
apt to be quite confusing.
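As a rough illustration in current Perl, the sketch below uses C<\1> in the pattern and $1 in the replacement. The C<\U...\E> case is my own addition, included only to show that the right hand side really is treated as a double-quoted string; the variable names are arbitrary.

```perl
use strict;
use warnings;

# Left side is a regex: \1 repeats the first group of this match.
# Right side is a double-quoted string: $1 holds what group 1 captured.
(my $changed = "foofoo") =~ s/(foo)\1/$1bar/;

# String escapes such as \U...\E work on the right side, as they would
# in any double-quoted string.
(my $shouted = "foofoo") =~ s/(foo)\1/\U$1\Ebar/;

print "$changed\n";   # foobar
print "$shouted\n";   # FOObar
```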
Beyond this confusion, when you use a backreference you generally want
it to refer to something that you just matched, i.e., something from
this regex.
To resolve all these issues, let's remove the C<\1> notation and
consolidate meanings as follows:
=over 4
=item *
C<\1> goes away as a special form
=item *
$1 means what C<\1> currently means (first capture in this regex)
=item *
${1} is the same as $1 (first capture in this regex)
=item *
${P1} means what $1 currently means (first capture in last regex)
=back
These changes eliminate a potential source of confusion, retain all
functionality, and provide an easy migration path for P526. The last
notation (${P1}) also serves as a clear indicator that you are talking
about something from outside the current regex.
Using this new syntax, you could then write:
=over 4
=item *
s/(foo)$1/$1bar/ # changes "foofoo" to "foobar"
=item *
s/(bar)(bell)/${P1}$2/ # changes "barbell" to "foobell" (${P1} is "foo" from the previous s///)
=back
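The proposed ${P1} does not exist in any current Perl, so it cannot be run as written. A rough approximation in today's syntax is to copy $1 into an ordinary variable before the next match starts. In the sketch below, C<$prev1> is a hypothetical stand-in for ${P1}, not part of the proposal.

```perl
use strict;
use warnings;

my $first = "foofoo";
$first =~ s/(foo)\1/$1bar/;     # current syntax: \1 on the left side

# Hypothetical stand-in for the proposed ${P1}: save the capture
# before the next match overwrites $1.
my $prev1 = $1;

my $second = "barbell";
$second =~ s/(bar)(bell)/$prev1$2/;

print "$first\n";    # foobar
print "$second\n";   # foobell
```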
=head2 Updating $1...When should it happen?
After a regex is finished, it must update the ${Pn} variables so that the
next match can access them if desired. This should not happen until after
the statement containing the regex is finished, so that the $1 variables
on the right hand side of an s/// will still refer to the correct things.
(If we wanted to get really pathological, we could have multidimensional
access such as ${P2,2}, which is the second capture from the
second-to-most-recent regex. This would seem to be a Bad Idea, however.)
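For comparison, current Perl already keeps the captures of the match in progress visible while the right hand side of an s/// is evaluated, once per match under C</g>. The sketch below only shows that existing behavior, which the proposed update timing would presumably need to preserve; it says nothing about how ${Pn} would be implemented.

```perl
use strict;
use warnings;

# With /g the replacement is evaluated once per match, and each time
# $1 holds the capture from the match being replaced.
(my $marked = "aa bb") =~ s/(\w)\1/<$1>/g;

print "$marked\n";   # <a> <b>
```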
=head1 IMPLEMENTATION
This may require significant changes to the regex engine, which is a topic
on which I am not qualified to speak. Could someone with more
knowledge/experience please chime in?
=head1 REFERENCES
RFC 112: Assignment within a regex
RFC 276: Localising Paren Counts in qr()s.
perlre manpage