RFC 72 (v4) Variable-length lookbehind.

Perl6 RFC Librarian Sat, 30 Sep 2000 12:43:11 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Variable-length lookbehind. 

=head1 VERSION

  Maintainer: Peter Heslin <[EMAIL PROTECTED]>
  Date: 9 Aug 2000
  Last Modified: 30 Sept 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 72
  Version: 4
  Status: Developing

=head1 ABSTRACT

In Perl6, lookbehind in regular expressions should be extended to permit
not only fixed-length, but also variable-length lookbehind.

=head1 CHANGES

Version 4 is a withdrawal of the originally proposed syntax in favour,
in part, of RFC 317; the present RFC has not been withdrawn, because the
proposal to implement variable-length lookbehind still stands.

Version 3 is a revision to make it clearer that the purpose of the
proposal is to implement a general scheme for variable-length
lookbehind.  The title has been changed to reflect this.

Version 2 of this RFC redirects discussion of this topic to 
[EMAIL PROTECTED]

=head1 DESCRIPTION

The original proposal offered some new syntax as a portmanteau solution
to two very different problems: the lack of variable-length lookbehind
in Perl 5, and the lack of a method other than the monolithic and
inflexible C<study> by which the programmer could pass hints to the
regexp optimizer, based upon her knowledge of the peculiarities of the
target string.  The latter problem has now been addressed much more
cleanly by Hugo van der Sanden in RFC 317, and so I have withdrawn the
originally proposed syntax, and all that remains for this RFC to discuss
is variable-length lookbehind.

=head2 Syntax

I originally proposed new syntax designed to kill two birds with one
stone.  For variable-length lookbehind alone, no new syntax is needed:
it can use the same constructs that Perl 5 already uses for fixed-length
lookbehind, (?<=pattern) and (?<!pattern).
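As a rough illustration of what sharing the syntax would mean, the sketch below runs a fixed-length lookbehind and then asks the running perl to compile a variable-length one written in exactly the same form.  The sample strings and variable names are invented for the example, and whether the second pattern compiles depends on the perl in use, so the script only reports what happened.

```perl
use strict;
use warnings;

# Fixed-length lookbehind: accepted by Perl 5.
my $fixed = "cost: USD42" =~ /(?<=USD)(\d+)/ ? $1 : undef;
print "fixed-length lookbehind captured: $fixed\n";

# Variable-length lookbehind, written in the same syntax.
# The pattern is built from a string and compiled at run time so that
# a compile failure can be trapped with eval instead of aborting.
my $source   = '(?<=\w\s*\d)b';
my $compiled = eval { qr/$source/ };
print defined $compiled
    ? "this perl accepted the variable-length lookbehind\n"
    : "this perl rejected it: $@";
```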

=head2 Related Features

The original proposal suggested that it would be nice if C<pos> in list
context were to return the offset into the target string where
the match begins, as well as where it ends.  Some people seemed to like
this idea, so I am leaving it in, even though it no longer has much to
do with the content of this RFC.
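For comparison, here is a minimal sketch of how both offsets can be recovered from the C<@-> and C<@+> arrays, which is roughly the information a list-context C<pos> would hand back.  The helper name C<match_span> is hypothetical and exists only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Perl: returns the (start, end)
# offsets of the first match, or an empty list if there is no match.
sub match_span {
    my ($string, $regex) = @_;
    return unless $string =~ $regex;
    # $-[0] is the offset where the last successful match began;
    # $+[0] is the offset just past where it ended.
    return ($-[0], $+[0]);
}

my ($start, $end) = match_span("foo bar baz", qr/bar/);
print "match spans offsets $start to $end\n";   # match spans offsets 4 to 7
```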

=head1 IMPLEMENTATION

Ilya Zakharevich says it is trivial ;-)  Here is part of his comment on
an earlier version of this RFC:

==================================================================

As I said it for many times: this is absolutely trivial to implement.
First of all, if you agree to rewrite

 (?<= \w\s*\d )         # Semantic X: match "a  1"

as

 (?<= \d\s*\w )         # Semantic Y: match "a  1"

then it is as simple as inserting go-back-by(1) nodes before each node
for \s \d and \w.

And to support the more intuitive ;-) semantic X, the only
more-or-less tricky part is to recursively go through the compile
tree, and put "concatenated" nodes in the opposite order.  A piece of
cake.

==================================================================

The original RFC proposed syntax that would have corresponded to Ilya's
semantic Y, precisely because I thought it would be easier to implement.
Several people objected to this as being obscure, so I changed it to
semantic X.  This means that Perl, and not the programmer, would have
to reverse the regexp in order to match it backwards from the
start of the current match.
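The idea of matching in reverse can be sketched in plain Perl 5 by reversing the text that precedes the current offset and applying a hand-reversed pattern to it, which corresponds to semantic Y.  This is only an emulation under stated assumptions, not how the regex engine would do it: the helper name C<lookbehind_reversed> is hypothetical, and copying the prefix stands in for the go-back-by(1) nodes Ilya describes.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether $reversed_re, a pattern the
# caller has already written back to front (semantic Y), matches the
# text that ends immediately before offset $pos.
sub lookbehind_reversed {
    my ($string, $pos, $reversed_re) = @_;
    # Copying and reversing the prefix is for illustration only; a real
    # engine would step backwards through the string in place.
    my $prefix = scalar reverse substr($string, 0, $pos);
    return $prefix =~ /\A$reversed_re/ ? 1 : 0;
}

my $target   = "a  1b";
my $before_b = lookbehind_reversed($target, 4, qr/\d\s*\w/);
my $before_1 = lookbehind_reversed($target, 3, qr/\d\s*\w/);
print "before 'b': $before_b, before '1': $before_1\n";
```

Under semantic X the programmer would write \w\s*\d and leave the reversal to Perl; the sketch skips that step by taking the reversed pattern directly.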

One thing that should be avoided is to implement lookbehind by starting
at offset 0 in the target string and simply matching forwards from
there.  As Hugo has pointed out, this would throw away the work of the
optimizer and lead to quadratic behaviour.  This is why variable-length
lookbehind has to approach the match in reverse, starting from the
current offset of the start of match.
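To make the cost of the forward approach concrete, the following sketch emulates a lookbehind the way this section says to avoid: it tries every start offset from 0 and matches forwards until a match ends exactly at the current offset.  The helper name C<lookbehind_forward> and the test string are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: emulates a lookbehind at offset $pos the naive
# way, trying each start offset from 0 upwards and matching forwards.
# Returns (matched, number_of_start_offsets_tried).
sub lookbehind_forward {
    my ($string, $pos, $re) = @_;
    my $attempts = 0;
    for my $start (0 .. $pos) {
        $attempts++;
        # The candidate must match in full so that it ends exactly at $pos.
        my $candidate = substr($string, $start, $pos - $start);
        return (1, $attempts) if $candidate =~ /\A$re\z/;
    }
    return (0, $attempts);
}

my $target = ("x" x 50) . "a  1b";
my ($matched, $attempts) = lookbehind_forward($target, 54, qr/\w\s*\d/);
print "matched=$matched after $attempts forward attempts\n";
```

The number of attempts grows with the distance from the start of the string, and the lookbehind may be evaluated at many offsets, which is where the quadratic behaviour comes from.  A reverse match starting at offset 54 would only need to examine the four characters just before it.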

It would be acceptable if, for simplicity of implementation,
variable-length lookbehind were restricted to a subset of regular
expression syntax, since even that would be an improvement over what we
have now.

=head1 REFERENCES

RFC 317: Access to optimisation information for regular expressions