Re: RFC 274 (v1) Generalised Additions to Regexs

Richard Proctor Wed, 27 Sep 2000 06:49:25 -0700



> In <[EMAIL PROTECTED]>, Perl6 RFC
> Librarian writes:
> :Given that expansion of regexes could include (+...) and (*...) I
> :have been thinking about providing a general purpose way of adding
> :functionality.  Hence I propose that the entire (+...) syntax is
> :kept free from formal specification for this. (+ = addition)
> :
> :A module or anything that wants to support some enhanced syntax
> :registers something that handles "regex enhancements".
> :
> :At regex compile time, if and when (+foo) is found perl calls
> :each of the registered regex enhancements in turn, these:
> :
> :1) Are passed the foo string as a parameter exactly as is.  (There
> :is an issue of actually finding the end of the generic foo.)
> :
> :2) The regex enhancement can either recognise the content or not.
>
> Is this the right approach? If more than one callback is registered,
> this seems likely to lead to results dependent on the order of
> registration.

Maybe, maybe not.  Does a newer localised definition replace the older
one?  The handling of multiple registrations has to be resolved.
My initial thought is that a "last registered is checked first"
approach may be best.
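
A minimal sketch of the "last registered is checked first" idea, written
in Python purely for illustration (it is not Perl and not part of the
RFC).  The names register and expand are hypothetical, and a handler is
assumed to return None when it does not recognise the content:

```python
# Hypothetical registry: handlers are tried newest-first.
_handlers = []

def register(handler):
    """Register a handler.  It is given the raw (+...) content and returns
    a replacement regex string, or None if it does not recognise it."""
    _handlers.append(handler)

def expand(content):
    """Ask each handler in turn, most recently registered first."""
    for handler in reversed(_handlers):
        result = handler(content)
        if result is not None:
            return result
    raise ValueError("no enhancement recognises (+%s)" % content)

# Two handlers both claim "digits"; the later registration wins.
register(lambda s: "[0-9]+" if s == "digits" else None)
register(lambda s: r"\d+" if s == "digits" else None)
register(lambda s: "[0-9a-fA-F]+" if s == "hex" else None)
```

Here the second "digits" handler shadows the first without any warning,
which is exactly the order dependence raised above.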

>
> I'd be more inclined to have callbacks registered for a word: that
> way we can complain earlier when two modules try to register the
> same word. Then at regexp-compile time we parse out the word
> following the (+ and immediately know who to pass it to (or fail).

This is equally possible.  My thought was to leave the syntax
completely open so that anything could be added: words, Chinese,
$$$.  It would then be left to the enhancements to recognise it or
not.  I could add this as an alternative for v2.
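
For comparison, a rough Python sketch of the word-keyed alternative.
The names register_word and dispatch are hypothetical, and the "word" is
assumed to be the leading run of \w characters after the (+ :

```python
import re

# Hypothetical registry keyed by the word that follows "(+".
_by_word = {}

def register_word(word, handler):
    """Register a handler for one word; a second claim fails immediately."""
    if word in _by_word:
        raise KeyError("regex enhancement %r is already registered" % word)
    _by_word[word] = handler

def dispatch(content):
    """Parse the leading word and pass the remainder to its handler."""
    m = re.match(r"\w+", content)
    if m is None or m.group(0) not in _by_word:
        raise ValueError("no enhancement registered for (+%s)" % content)
    return _by_word[m.group(0)](content[m.end():])

# Example handler: "(+range a-f)" becomes the character class "[a-f]".
register_word("range", lambda rest: "[%s]" % rest.strip())
```

The clash is reported at registration time rather than at regex compile
time, at the cost of restricting what may follow the (+ to something
that starts with a word.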

>
> :5) if an enhancement recognises the content it could do either of:
> :
> :a) return replacement expanded regex using existing capabilities
> :perl will then pass this back through the regex compiler.
>
> Can we/should we detect (+...) loops? Or are you suggesting that the
> returned string should not permit (+...) expansion?
>

Should we detect them?  Probably not.  Should we allow them?  Definitely
yes.  The only grounds for detection are to report infinite recursion.
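
One way to allow nested (+...) expansion while still reporting runaway
recursion is a simple depth limit.  The Python sketch below is only an
illustration: expand_all, MAX_DEPTH and the lookup table are
hypothetical, and the content of a (+...) is assumed to contain no
parentheses, which sidesteps the problem of finding its end:

```python
import re

# Assumes the content of (+...) contains no parentheses.
PLUS_GROUP = re.compile(r"\(\+([^()]*)\)")
MAX_DEPTH = 20  # arbitrary limit chosen for this sketch

def expand_all(pattern, lookup, depth=0):
    """Expand every (+...) in pattern, re-expanding whatever comes back."""
    if depth > MAX_DEPTH:
        raise RecursionError("(+...) expansion exceeded %d levels" % MAX_DEPTH)

    def replace(match):
        # The replacement is itself passed back through the expander.
        return expand_all(lookup(match.group(1)), lookup, depth + 1)

    return PLUS_GROUP.sub(replace, pattern)

table = {
    "word": "(+letter)+",   # expands to another (+...)
    "letter": "[a-z]",
    "loop": "x(+loop)",     # expands to itself forever
}
```

With this table, expand_all("^(+word)$", table.__getitem__) yields
"^[a-z]+$", while "(+loop)" raises RecursionError instead of hanging.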

> :b) return a coderef that is called at run time when the regex gets
> :to this point.
>
> Ok.
>
> :  The referenced code needs to have enough access to the regex
> :internals to be able to see the current sub-expression, request
> :more characters, access to relevant flags and visibility of
> :greediness.
>
> I don't see that this is a good idea; it makes more sense to me that
> the coderef is treated exactly as if it had been compiled from (?{...}).

Let's look at these one at a time:

Access to subexpressions - OK, this can be done.

Visibility of flags - not currently possible.  The code might
like to know that /i is in effect, or that /s is in effect; it
probably does not need to know about /o.  This is equally true of
the regex enhancement handler that looks at the (+foo) in the
first place.  I think that these could also be of use to (?{...}) code.

Greediness - maybe not necessary, but I think better visibility of
internals might be beneficial.
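
To illustrate the point about flags, here is a small Python sketch of a
compile-time handler that is told which flags are in effect.  The
handler name hex_digits and the convention of passing the flags as a
string such as "i" or "is" are assumptions made for this example:

```python
import re

def hex_digits(content, flags):
    """Hypothetical handler for (+hex) that can see the regex flags."""
    if content != "hex":
        return None  # not recognised; leave it to another handler
    # Under /i the lower-case class is enough; otherwise spell out both cases.
    return "[0-9a-f]+" if "i" in flags else "[0-9a-fA-F]+"

# The expansion is then compiled with the same flags it was built for.
case_blind = re.compile(hex_digits("hex", "i"), re.IGNORECASE)
case_exact = re.compile(hex_digits("hex", ""))
```

Without sight of the flags, the handler would always have to emit the
longer form.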

>
> :Following on, if (?{...}) etc code is evaluated
> :in forward match, it would be a good idea to likewise support some
> :code block that is ignored on a forward match but is executed when the
> :code is unwound due to backtracking.
>
> The support in (?{...}) for localisation is (as I understand it) the
> intended mechanism for permitting such effects. Can you describe some
> specific problems you are trying to solve here?

Is localisation enough?  It might be.  It might, however, be nicer to
provide a more explicit mechanism to handle more complex cases.
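
As a rough picture of what an explicit mechanism could look like, here
is a toy backtracking matcher in Python.  It is not how the Perl regex
engine works; the function match and the event log are hypothetical.  A
pattern is a list of elements, each a list of literal alternatives, and
the matcher records an event on the way forward and another when that
choice is unwound:

```python
def match(elements, text, pos, events):
    """Return the end position of a full match, or None on failure."""
    if not elements:
        return pos if pos == len(text) else None
    for alt in elements[0]:
        if text.startswith(alt, pos):
            events.append(("forward", alt))   # like (?{...}) on the way in
            end = match(elements[1:], text, pos + len(alt), events)
            if end is not None:
                return end
            events.append(("undo", alt))      # the proposed unwind hook
    return None

events = []
end = match([["a", "ab"], ["c"]], "abc", 0, events)
# events is now:
# [("forward", "a"), ("undo", "a"), ("forward", "ab"), ("forward", "c")]
```

The "undo" entry marks the point at which a block that runs only on
backtracking would be called.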

>
> Hugo
>

Richard