:=item *
:/(foo)_$1_bar/
:
:=item *
:/(foo)_C<\1>_bar/
Please don't do this: write C</(foo)_\1_bar/> or /(foo)_\1_bar/, but
don't insert C<> in the middle: that makes it much more difficult to
read.
:mean different things: the second will match 'foo_foo_bar', while the
:first will match 'foo[SOMETHING]bar' where [SOMETHING] is whatever was
should be: foo_[SOMETHING]_bar
:captured in the B<previous> match...which could be a long, long way away,
:possibly even in some module that you didn't even realize you were
:including (because it was included by a module that was included by a
:module that was included by a...).
This seems a bit unfair. It is just another variable. For any variable
you include in a pattern, you are assumed to know that it contains
the intended value - there is nothing special about $1 in this regard.
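
To make the distinction concrete, here is a minimal sketch of current perl5 behaviour. The strings and the variable $prev are made up for illustration; the point is only that $1 is interpolated like any other variable, while \1 refers to a group in the pattern itself.

```perl
use strict;
use warnings;

# An earlier, unrelated match leaves 'y' in $1.
"xyz" =~ /(y)/ or die "setup match failed";

# $1 is interpolated before compilation, so this pattern is /(foo)_y_bar/.
my $via_dollar1 = "foo_y_bar" =~ /(foo)_$1_bar/ ? 1 : 0;

# An ordinary variable behaves the same way
# (the braces keep '_bar' out of the variable name).
my $prev = 'y';
my $via_plain_var = "foo_y_bar" =~ /(foo)_${prev}_bar/ ? 1 : 0;

# \1 refers to group 1 of this very pattern.
my $backref_same  = "foo_foo_bar" =~ /(foo)_\1_bar/ ? 1 : 0;
my $backref_other = "foo_y_bar"   =~ /(foo)_\1_bar/ ? 1 : 0;

print "$via_dollar1 $via_plain_var $backref_same $backref_other\n";
```

Only the last match fails: 'y' is not a repeat of what group 1 captured.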
:The key fact here is that, in the first section of a s/// you are supposed
:to use C<\1>, but in the second portion you are supposed to use $1. If
:you understand the whole logical structure behind it and understand how an
:s/// works (i.e., the right hand side of an s/// is a double-quoted
:string, not a regex), you will understand the distinction. For newbies,
:however, it is apt to be quite confusing.
I think the whole idea that the LHS of s/// is a pattern, but the
RHS is a string (modulo /e, of course) is apt to be confusing when
you first encounter it. You won't be able to make sense of any but
the simplest use of s/// until you understand it, I think, and the
documentation expresses it quite clearly.
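
A short sketch of that pattern/string split as it works in perl5 today; the sample text and variable names are illustrative assumptions, not anything taken from the RFC.

```perl
use strict;
use warnings;

my $text = "foo_foo_bar";

# LHS is a pattern (\1 is a backreference);
# RHS is a double-quoted string ($1 is interpolated).
(my $joined = $text) =~ s/(foo)_\1_bar/$1-$1/;

# Regex metacharacters on the RHS are just literal text.
(my $literal = $text) =~ s/^foo/.*/;

# With /e the RHS is evaluated as Perl code instead.
(my $evaluated = $text) =~ s/(foo)_\1/uc($1) . "!"/e;

print "$joined\n$literal\n$evaluated\n";
```

The copy-then-substitute idiom is used rather than the /r flag so the sketch does not depend on a recent perl.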
:=item *
:${P1} means what $1 currently means (first match in last regex)
Do you understand that this is the same variable as $P1? Traditionally,
perl very rarely co-opts variable names that start with alphanumerics,
and (off the top of my head) all the ones it does so co-opt are letters
only (ARGV, AUTOLOAD, STDOUT etc). I think we need better reasons to
extend that to all $P1-style variables.
If you are suggesting that they should have a special meaning only
in regexps, and only if braced, then I'd find it even more confusing.
The use of braces is usually the easiest (and only?) way to split
out a variable from following alphanumerics:
/foo${P1}bar/
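
As a sketch of how this reads in perl5 today, assuming a user variable that happens to be called $P1 (a hypothetical name chosen to collide with the proposed notation):

```perl
use strict;
use warnings;

my $P1 = 'X';   # hypothetical user variable named like the proposed ${P1}

# Braces end the variable name; "$P1bar" would instead refer to a
# different variable, $P1bar.
my $in_string  = "foo${P1}bar";
my $same_var   = ("${P1}" eq "$P1") ? 1 : 0;
my $in_pattern = "fooXbar" =~ /foo${P1}bar/ ? 1 : 0;

print "$in_string $same_var $in_pattern\n";
```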
:These changes eliminate a potential source of confusion, retain all
:functionality, provide an easy migration path for P526, and the last
:notation (${P1}) serves as a clear indicator that you are talking about
:something from outside the current regex.
What is the migration path for existing uses of $P1-style variables?
:=item *
:s/(bar)(bell)/${P1}$2/ # changes "barbell" to "foobell"
Note that in the current regexp engine, ${P1} has disappeared by the
time matching starts. Can you explain why we need to change this?
Note also that if you are sticking with ${P1} either we need to
rename all existing user variables of this form, or we can no longer
use the existing 'interpolate this string' (or eval, double-eval etc)
routines, and have to roll our own for this (these) as well.
:=head1 IMPLEMENTATION
:
:This may require significant changes to the regex engine, which is a topic
:on which I am not qualified to speak. Could someone with more
:knowledge/experience please chime in?
Currently the regexp compiler is handed a string in which $variables
have already been interpolated. We'd need to avoid that and get either
the raw data for the string or some list that has undergone a
minimum of preparation. It is possible we need that anyway - it is
a prerequisite for some of the other proposed enhancements (such as
the meta-referred-to RFC 112) and would certainly make the regexp
engine more flexible - but it is certainly substantial work. I don't
know what gotchas may arise. In general it seems a shame to recreate
large parts of the existing string parsing/interpolation code, but
it may not be possible to avoid it.
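
One way to see this from the outside is to stringify a compiled pattern. This is only a sketch; $P1 is again a hypothetical user variable, and the exact stringified form of a qr// object varies between perl versions.

```perl
use strict;
use warnings;

my $P1 = 'X';               # hypothetical user variable
my $re = qr/foo${P1}bar/;   # interpolated first, then compiled

# The stringified pattern holds the interpolated value, not the name.
my $source    = "$re";
my $has_value = index($source, 'fooXbar') >= 0 ? 1 : 0;
my $has_name  = index($source, 'P1')      >= 0 ? 1 : 0;

print "$source $has_value $has_name\n";
```

The variable name is gone by the time the compiler sees the pattern, which is why keeping it would mean handing the compiler something less processed than it gets now.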
Changing the lifetime of backreferences feels likely to be difficult,
but it isn't clear to me what you are trying to achieve here. I think
you at least need to add an example of how it would act under s///g
and s///ge.
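
For comparison, this is a sketch of how captures behave under s///g and s///ge in perl5 today; it says nothing about how the proposal would behave, and the sample text is made up.

```perl
use strict;
use warnings;

my $text = "a1 b2 c3";

# Under /g the captures are set afresh for each match before the
# RHS string is interpolated.
(my $swapped = $text) =~ s/([a-z])(\d)/$2$1/g;

# Under /ge the RHS is code, evaluated once per match with that
# match's captures.
(my $doubled = $text) =~ s/(\d)/$1 * 2/ge;

print "$swapped\n$doubled\n";
```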
:=head1 REFERENCES
:
:RFC 112: Assignment within a regex
:
:RFC 276: Localising Paren Counts in qr()s.
I didn't see a mention of these in the body of the proposal.
To me, the prime issue is with \1. The backslash is heavily overloaded
in perl, and that makes it difficult to suggest a consistent and
legible extension that would allow us to refer back to either variables
(RFC 112) or hash keys (RFC 150). I don't think switching to $1 is any
help for those, though.
Hugo