=head1 Title

Recursion in regular expressions

=head1 Abstract

It would be useful to be able to make part or all of a regular
expression recursive, for example to match a string whose parentheses
are properly nested.

=head1 Discussion

Regexps have long since given up the pretense of being "regular". One
feature I have often wished for is the ability to designate a
recursive section of a regexp, for handling balanced parentheses.

For example, it would be nice to be able to write

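             /^(?:           # the whole string is any sequence of:
                [^\(\)]      #   a character other than a parenthesis
              | (            #   or a balanced group, which must
                  \(         #     begin with explicit parenthesis
                  (?:        #     then contain any number of:
                    [^\(\)]  #       characters other than parentheses
                  | (?*)     #       or nested groups: new magic meaning
                  )*         #       "recurse innermost capturing group"
                  \)         #     and end with explicit parenthesis
                )            #   end group
              )*$
             /x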

which would match "(abc)", "(a(b)c)", "a(b(c)(d))e((f)(g))h()",
or even "((((((((((a))))))))))", but not "(a(b)c" or "a)(b".
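
To make the intended semantics concrete, here is a minimal sketch of a
hand-written recursive-descent matcher for the same language. Python
is used only for illustration, and the names balanced and match_group
are hypothetical. Each recursive call to match_group plays the part of
one (?*): on meeting a "(" inside a group, it re-enters the group from
the top. The check is applied to the whole string, which is the sense
in which the examples above match or fail.

```python
def match_group(s: str, i: int) -> int:
    """Match one balanced group starting at s[i].

    Returns the index just past the closing ")", or -1 on failure.
    Each recursive call stands in for one (?*) in the pattern.
    """
    if i >= len(s) or s[i] != "(":
        return -1
    i += 1                              # the explicit opening parenthesis
    while i < len(s) and s[i] != ")":
        if s[i] == "(":
            i = match_group(s, i)       # the (?*) step: re-enter the group
            if i < 0:
                return -1
        else:
            i += 1                      # [^()]: an ordinary character
    if i >= len(s):
        return -1                       # input ended before the closing ")"
    return i + 1                        # the explicit closing parenthesis


def balanced(s: str) -> bool:
    """True if the whole string has properly nested parentheses."""
    i = 0
    while i < len(s):
        if s[i] == ")":
            return False                # a ")" with no matching "("
        if s[i] == "(":
            i = match_group(s, i)
            if i < 0:
                return False
        else:
            i += 1
    return True
```

Because the recursion lives on the host language's call stack, very
deep nesting (around a thousand levels under CPython's default
recursion limit) would fail here; a regex engine implementing (?*)
would face the same question of how deep to allow.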

I suggest the syntax (?*) because it seems consistent with the
current extension syntax, and because * suggests (at least to me)
that I don't know how many times this will recurse. I suppose we
could use (?+) or (?{1,3}) to allow finer control over the degree of
recursion, although I'd hate to waste two more special characters. :-)
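
One way to read a bounded form such as (?{1,3}), assuming the bound
limits how deeply the group may nest (the text above leaves this
open), is that it adds convenience rather than power: a recursion of
bounded depth can be unrolled into an ordinary pattern by inlining the
group into itself a fixed number of times. The sketch below does this
with Python's standard re module. The helper name depth_limited is
hypothetical, and it models only the upper bound.

```python
import re


def depth_limited(max_depth: int) -> re.Pattern:
    """Compile a plain (non-recursive) regex that fully matches text whose
    parentheses are balanced and nested at most max_depth levels deep."""
    if max_depth < 0:
        raise ValueError("max_depth must be non-negative")
    if max_depth == 0:
        return re.compile(r"[^()]*")        # no parentheses allowed at all
    # Depth 1: a group containing no further parentheses.
    group = r"\([^()]*\)"
    # Each pass inlines the previous level where (?*) would have recursed,
    # so the pattern grows by one level (linearly) per pass.
    for _ in range(max_depth - 1):
        group = r"\((?:[^()]|" + group + r")*\)"
    return re.compile(r"(?:[^()]|" + group + r")*")
```

For instance, depth_limited(2).fullmatch("(a(b)c)") succeeds while
depth_limited(1).fullmatch("(a(b)c)") returns None. The unbounded (?*)
is exactly what this unrolling cannot express: no finite expansion
covers every nesting depth, since balanced parentheses do not form a
regular language.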

Whatever extension syntax we adopt, it would mean "at this point,
recursively match the innermost enclosing group." We might also want
modifiers that skip levels, so that you could write "at this point,
recursively match the second-innermost group", and so on.
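
The "skip levels" idea can be sketched with a toy backtracking matcher
over a hand-built pattern tree, since no concrete syntax is proposed
yet. Everything here is hypothetical: the node kinds, the names
match_node and fullmatch, and the reading of ("recurse", n) as
"re-enter the n-th innermost enclosing group, with the group stack as
it stood outside that group". Level 1 reproduces the
balanced-parentheses example; level 2, used inside a (...) group that
sits within a [...] group, skips the parentheses and re-enters the
brackets, so the two kinds must alternate.

```python
from typing import Callable

# Hypothetical pattern tree (not any real engine's API):
#   ("lit", c)         one literal character
#   ("notin", chars)   one character not in chars
#   ("seq", [p, ...])  concatenation
#   ("alt", [p, ...])  alternation
#   ("star", p)        zero or more, greedy
#   ("group", p)       a group that recursion can target
#   ("recurse", n)     re-enter the n-th innermost enclosing group


def match_node(node: tuple, s: str, i: int, groups: tuple,
               k: Callable[[int], bool]) -> bool:
    """Match node at s[i], then hand the end position to continuation k."""
    kind, arg = node
    if kind == "lit":
        return i < len(s) and s[i] == arg and k(i + 1)
    if kind == "notin":
        return i < len(s) and s[i] not in arg and k(i + 1)
    if kind == "seq":
        def step(j: int, idx: int) -> bool:
            if idx == len(arg):
                return k(j)
            return match_node(arg[idx], s, j, groups,
                              lambda j2: step(j2, idx + 1))
        return step(i, 0)
    if kind == "alt":
        return any(match_node(p, s, i, groups, k) for p in arg)
    if kind == "star":
        def loop(j: int) -> bool:
            # Try one more iteration (only if it consumed input), else stop.
            return (match_node(arg, s, j, groups,
                               lambda j2: j2 > j and loop(j2))
                    or k(j))
        return loop(i)
    if kind == "group":
        return match_node(arg, s, i, groups + (node,), k)
    if kind == "recurse":
        if not 1 <= arg <= len(groups):
            raise ValueError(f"no enclosing group at level {arg}")
        # Re-enter the target group with the stack as it was outside it.
        return match_node(groups[-arg], s, i, groups[:-arg], k)
    raise ValueError(f"unknown node kind {kind!r}")


def fullmatch(pattern: tuple, s: str) -> bool:
    return match_node(pattern, s, 0, (), lambda j: j == len(s))


# Level 1: balanced parentheses, recursing into the innermost group.
PARENS = ("group", ("seq", [
    ("lit", "("),
    ("star", ("alt", [("notin", "()"), ("recurse", 1)])),
    ("lit", ")"),
]))
TEXT = ("star", ("alt", [("notin", "()"), PARENS]))

# Level 2: inside (...), recursion skips to the enclosing [...] group.
PARENS_IN = ("group", ("seq", [
    ("lit", "("),
    ("star", ("alt", [("lit", "y"), ("recurse", 2)])),
    ("lit", ")"),
]))
BRACKETS = ("group", ("seq", [
    ("lit", "["),
    ("star", ("alt", [("lit", "x"), PARENS_IN])),
    ("lit", "]"),
]))
```

With these definitions, fullmatch(BRACKETS, "[([x])]") is true, while
fullmatch(BRACKETS, "[(())]") is false: a "(" directly inside a "(...)"
cannot start a match, because level 2 re-enters the bracket group.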
