RFC 331 (v1) Consolidate the $1 and C<\1> notations

Perl6 RFC Librarian Thu, 28 Sep 2000 13:50:29 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Consolidate the $1 and C<\1> notations

=head1 VERSION

  Maintainer: David Storrs <[EMAIL PROTECTED]>
  Date: 28 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number:  331
  Version: 1
  Status: Developing

=head1 ABSTRACT

Currently, C<\1> and $1 have only slightly different meanings within a
regex.  Let's consolidate them, eliminate the differences, and settle on
$1 as the standard.

=head1 DESCRIPTION

Note:  For convenience, I am going to talk about C<\1> and $1 in this RFC.
In actuality, these notations extend indefinitely:  C<\1..\n> and
C<$1..$n>.  Take it as read that anything which applies to $1 also applies
to C<$2, $3>, etc.


In current versions of Perl, C<\1> means "whatever was matched by the
first set of grouping parens I<in this regex match>."  $1 means "whatever
was matched by the first set of grouping parens I<in the previously-run
regex match>."  For example:

=over 4

=item *
/(foo)_$1_bar/

=item *
/(foo)_C<\1>_bar/

=back

mean different things:  the second will match 'foo_foo_bar', while the
first will match 'foo_[SOMETHING]_bar' where [SOMETHING] is whatever was
captured in the B<previous> match...which could be a long, long way away,
possibly even in some module that you didn't even realize you were
including (because it was included by a module that was included by a
module that was included by a...).
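
The difference can be seen in current Perl 5 with a small sketch.  The
helper names C<dollar_form_matches> and C<backslash_form_matches> are made
up for this illustration, and the "baz" setup match stands in for whatever
unrelated regex happened to run earlier:

```perl
use strict;
use warnings;

# Hypothetical helper: tries the $1 form after an unrelated match
# has left "baz" in $1.
sub dollar_form_matches {
    my ($text) = @_;
    "baz" =~ /(baz)/ or die "setup match failed";   # $1 is now "baz"
    # $1 is interpolated before matching starts, so the pattern
    # that actually runs is /(foo)_baz_bar/.
    return $text =~ /(foo)_$1_bar/ ? 1 : 0;
}

# Hypothetical helper: tries the \1 form under the same conditions.
sub backslash_form_matches {
    my ($text) = @_;
    "baz" =~ /(baz)/ or die "setup match failed";   # $1 is now "baz"
    # \1 refers to the (foo) group of this same pattern, so the
    # earlier match has no influence.
    return $text =~ /(foo)_\1_bar/ ? 1 : 0;
}

print dollar_form_matches("foo_baz_bar"),    "\n";   # 1
print dollar_form_matches("foo_foo_bar"),    "\n";   # 0
print backslash_form_matches("foo_foo_bar"), "\n";   # 1
print backslash_form_matches("foo_baz_bar"), "\n";   # 0
```

The $1 form only matches text that happens to contain the earlier capture,
which is rarely what the author of the regex meant.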

Probably the primary reason for this distinction is the following:

=over 4

=item *
s/(foo)C<\1>/$1bar/     # changes "foofoo" to "foobar"

=back

The key fact here is that, in the first section of an s/// (the pattern)
you are supposed to use C<\1>, but in the second section (the replacement)
you are supposed to use $1.  If you understand the logical structure behind
it and understand how an s/// works (i.e., the right hand side of an s///
is a double-quoted string, not a regex), you will understand the
distinction.  For newbies, however, it is apt to be quite confusing.
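
As a minimal sketch of that structure in current Perl 5 (the variable name
$text is just for illustration):

```perl
use strict;
use warnings;

my $text = "foofoo";

# Left side:  a regex, so \1 is a backreference to group 1 of this match.
# Right side: a double-quoted string, so $1 is interpolated once the
#             match has succeeded.
$text =~ s/(foo)\1/$1bar/;

print "$text\n";   # foobar
```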

Beyond this confusion, when you use a backreference you generally want it
to refer to something that you just matched...i.e., something from this
regex.

To resolve all these issues, let's remove the C<\1> notation and
consolidate meanings as follows:

=over 4

=item *
C<\1> goes away as a special form 

=item *
$1 means what C<\1> currently means (first match in this regex)

=item *
${1} is the same as $1 (first match in this regex)

=item *
${P1} means what $1 currently means (first match in last regex)

=back

These changes eliminate a potential source of confusion, retain all
functionality, and provide an easy migration path for P526.  In addition,
the last notation (${P1}) serves as a clear indicator that you are talking
about something from outside the current regex.

Using this new syntax, you could then write:

=over 4

=item *
s/(foo)$1/$1bar/                        # changes "foofoo" to "foobar"

=item *
s/(bar)(bell)/${P1}$2/          # changes "barbell" to "foobell" (if the previous match captured "foo")

=back
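
The proposed syntax cannot be run today, but the second example can be
approximated in current Perl 5.  This is only a sketch under one
assumption: an ordinary lexical named $P1, filled by hand after the earlier
match, stands in for the proposed ${P1}.  It does not show how the
proposal itself would be implemented.

```perl
use strict;
use warnings;

# Assumption: $P1 is a plain lexical standing in for the proposed ${P1}.
# We copy the capture ourselves after the earlier match.
my $P1;
if ("foo" =~ /(foo)/) {
    $P1 = $1;
}

my $text = "barbell";

# In Perl 5, ${P1} in the replacement is just the variable $P1, and
# $2 is the second capture of this substitution.
$text =~ s/(bar)(bell)/${P1}$2/;

print "$text\n";   # foobell
```

Under the RFC, the manual copy into $P1 would be unnecessary, since the
regex engine would maintain ${P1} itself.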

=head2 Updating $1...When should it happen?

After a regex is finished, it must update the ${Pn} variables so that the
next match can access them if desired.  This should not happen until after
the statement containing the regex is finished, so that the $1 variables on
the right hand side of an s/// will still refer to the correct things.

(If we wanted to get really pathological, we could have multidimensional
access such as ${P2,2}, which is the second capture from the
second-to-most-recent regex.  This would seem to be a Bad Idea, however.)

=head1 IMPLEMENTATION

This may require significant changes to the regex engine, which is a topic
on which I am not qualified to speak.  Could someone with more
knowledge/experience please chime in?

=head1 REFERENCES

RFC 112: Assignment within a regex

RFC 276: Localising Paren Counts in qr()s.

perlre manpage