Re: RFC 150 (v1) Extend regex syntax to provide for return of a hash of matched subpatterns

Hugo Mon, 11 Sep 2000 17:53:05 -0700
Richard Proctor writes:
:On Fri 08 Sep, Kevin Walker wrote:
[...]
:> Tom's comment points out a shortcoming in the original RFC:  There's 
:> no way to make, by name, a backref to a named group.  I propose to 
:> fix that in a revised version of RFC 150.  I don't have strong 
:> feelings about what the syntax should be.  Here's one idea:
:> 
:>    The substring matched by (?%some_name: ... ) can be referred to as 
:> $%{some_name}.
:> 
:> That's kind of ugly, so other suggestions are welcome.  (The idea was 
:> to do something analogous to $1, $2, etc.  Unfortunately ${some_name} 
:> is already taken.  Maybe $_{some_name} would also work -- though 
:> %_ may seem too valuable to use for this limited purpose.)
[...]
:My thought on back references is that if a variable is used
:again later in the regex, assignment takes place at its first use and
:later uses simply refer to it.
:
:Thus $string =~ m#<(?$foo=\w+).*?</$foo>#;
:
:The parser notices the reuse of $foo and performs the actual assignment
:as and when $foo is matched (or at least acts as if it does).

It is unfortunate that '\' is currently used for numbered backrefs,
since it is difficult to see how you might extend it to handle either
of these cases in a way that does not clash with other existing
usage (particularly the /\Q$var\E/ idiom, which escapes every non-word
character in the string).
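The clash can be seen directly. \Q is implemented by quotemeta, which puts a backslash before every non-word ASCII character, so an interpolated '$' or '%' already comes out as '\$' or '\%'. A minimal sketch in current Perl 5 (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $var     = '$foo%bar';
my $pattern = "\Q$var\E";    # same result as quotemeta($var)
print "$pattern\n";          # prints \$foo\%bar

# The escaped '\$' and '\%' must keep matching a literal '$' and '%',
# so those sequences are not free to mean "named backreference".
print "literal match\n" if 'x$foo%bar' =~ /$pattern/;
```

Any backref syntax built from '\$name' or '\%name' would change the meaning of every existing pattern that interpolates a quotemeta'd string containing '$' or '%'.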

(?\$foo) and (?\%some_name) are just about plausible, though ugly:

  print $1 if "foo" =~ /(?$foo=.)(?\$foo)/;         # prints 'o'
  print +{"foo" =~ /(?%foo=.)(?\%foo)/}->{foo}; # prints 'o'
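For comparison only: Perl 5.10 and later provide a different named-capture syntax, with (?<name>...) to name a group, \k<name> to refer back to it by name, and the %+ hash to read the result. The sketch below restates Richard's tag example and the one-character example above in that syntax. It assumes perl 5.10 or newer and is not the syntax this RFC proposes; the tag pattern also adds the '>' after the captured name that the original omits.

```perl
use strict;
use warnings;
use v5.10;    # named captures, \k<name> and %+ need Perl 5.10 or later

# Tag example: the name captured in the opening tag must reappear
# in the closing tag.
my $string = '<b>bold</b>';
if ($string =~ m#<(?<tag>\w+)>.*?</\k<tag>>#) {
    print $+{tag}, "\n";    # prints b
}

# One-character example: the group named 'foo' captures a character
# that must immediately repeat, so it first succeeds on the 'oo'.
if ("foo" =~ /(?<foo>.)\k<foo>/) {
    print $+{foo}, "\n";    # prints o
    print $1, "\n";         # the named group is also $1: prints o
}
```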

I think this RFC also needs to specify (as RFC 112 already does)
whether the named captures are _also_ to count as normal backrefs.
I.e., what

  "ab ab foo" =~ /^(?%foo=a)(b).*?(\1)/

would match. My guess is that backref numbers should follow the list
that would be returned, in which case \1 would match the key 'foo' and
the above would have to return the list ('foo', 'a', 'b', 'foo'), but
that may seem rather unintuitive. I don't see an alternative definition
that doesn't lead to other (probably worse) problems; certainly there
is room for lots of confusion here.
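For contrast with the list-based numbering guessed at above, here is the other obvious reading, in which a named group simply keeps its ordinary left-to-right number, so \1 refers to the named group itself. The sketch uses the named-capture syntax that Perl 5.10 later adopted, not the RFC's (?%name...) form, and shows only how that one definition behaves on the same string.

```perl
use strict;
use warnings;
use v5.10;    # named captures need Perl 5.10 or later

# The named group 'foo' is also group 1, so (\1) must match another 'a'.
my @list = ("ab ab foo" =~ /^(?<foo>a)(b).*?(\1)/);
print join(', ', @list), "\n";    # prints a, b, a
print $+{foo}, "\n";              # prints a
```

Under this reading the match never reaches "foo" at all, which is exactly the kind of divergence the RFC would need to rule on.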

One approach to reducing confusion would be to disallow mixing:
if you use (?%name) in a regexp, you may not also use ordinary
capturing parens, nor refer back to them by number. That may be too
restrictive and defeat the purpose of named captures in the first
place, of course.

Hugo