Re: RFC 274 (v1) Generalised Additions to Regexs

Hugo Thu, 28 Sep 2000 16:03:59 -0700
In <[EMAIL PROTECTED]>, "Richard Proctor" writes:
:> I'd be more inclined to have callbacks registered for a word: that
:> way we can complain earlier when two modules try to register the
:> same word. Then at regexp-compile time we parse out the word
:> following the (+ and immediately know who to pass it to (or fail).
:
:This is equally possible; my thoughts were to leave the syntax
:completely open so that anything could be added - words, Chinese,
:$$$.  And leave it to the enhancements to recognise it or not.  I
:could add this as an alternative for V2.

Well, there are limits to what we can handle - earlier, the parser
will have had to be able to determine where the end of the regexp
is. Even specifying a word at the beginning doesn't help: we need
to know whether the rest should look like a regexp, or code, or
whatever else. The regexp compiler doesn't get a look in until
after that has been done.

Which suggests that maybe each callback - whether or not we link
them to words - should specify what it will match, which suggests
it should be linked with a regexp. And that brings us round to
this message from Larry:

  http://www.mail-archive.com/perl6-language%40perl.org/msg02955.html

.. which made me go all quivery when I read it. :}
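
The idea of registering each callback under a word, together with a
regexp describing what it accepts, can be sketched roughly as below.
This is only an illustration under stated assumptions: the names
register_extension and expand_pattern, the "digits" extension, and the
restriction that arguments contain no parentheses are all hypothetical,
and nothing here is part of RFC 274 or of perl itself.

```perl
use strict;
use warnings;

# Hypothetical registry: word => { syntax => qr/.../, expand => sub {...} }
my %extension;

# Register a callback for a word; complain at once if the word is taken.
sub register_extension {
    my ($word, $syntax, $expand) = @_;
    die "extension '$word' is already registered\n"
        if exists $extension{$word};
    $extension{$word} = { syntax => $syntax, expand => $expand };
}

# Replace each (+word args) with the regex text its callback returns.
# Assumption: args contain no parentheses, so the end is easy to find.
sub expand_pattern {
    my ($pattern) = @_;
    $pattern =~ s{\(\+(\w+)\s*([^()]*)\)}{
        my ($word, $args) = ($1, $2);
        my $ext = $extension{$word}
            or die "no extension registered for '$word'\n";
        $args =~ /\A$ext->{syntax}\z/
            or die "bad arguments for '$word': '$args'\n";
        $ext->{expand}->($args);
    }ge;
    return $pattern;
}

# Example extension: (+digits N) becomes \d{N}.
register_extension(digits => qr/\d+/, sub { '\d{' . $_[0] . '}' });

my $re = expand_pattern('^(+digits 4)-(+digits 2)$');
print "$re\n";    # prints ^\d{4}-\d{2}$
```

Because each word carries its own syntax regexp, a clash between two
modules shows up at registration time, and malformed arguments are
rejected before the expanded text reaches the regexp compiler.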

:> :5) if an enhancement recognises the content it could do either of:
:> :
:> :a) return replacement expanded regex using existing capabilities
:> :perl will then pass this back through the regex compiler.
:>
:> Can we/should we detect (+...) loops? Or are you suggesting that the
:> returned string should not permit (+...) expansion?
:
:Should we detect? Probably not.  Should we allow? Definitely yes.  The
:only grounds for detection are to report infinite recursion.

Ok.
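
For illustration, one simple way to allow nested (+...) expansion while
still reporting runaway recursion is to cap the number of expansion
passes. The sketch below assumes a hypothetical table of expansions and
an arbitrary limit of 20 passes; it is not taken from the RFC.

```perl
use strict;
use warnings;

# Hypothetical table of expansions: word => replacement regex text.
my %expansion = (
    word => '\w+',
    pair => '(+word)=(+word)',   # expands to further (+...) forms
    loop => 'x(+loop)',          # expands to itself, so never finishes
);

# Expand repeatedly until no (+word) remains, or give up and report.
sub expand_fully {
    my ($pattern, $max_passes) = @_;
    $max_passes //= 20;          # arbitrary limit, an assumption
    for my $pass (1 .. $max_passes) {
        my $changed = $pattern =~ s{\(\+(\w+)\)}{
            exists $expansion{$1} ? $expansion{$1}
                                  : die "unknown extension '$1'\n"
        }ge;
        return $pattern unless $changed;
    }
    die "(+...) expansion did not finish after $max_passes passes;"
      . " possible infinite recursion\n";
}

print expand_fully('^(+pair)$'), "\n";    # prints ^\w+=\w+$

eval { expand_fully('(+loop)') };
print "error: $@" if $@;
```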

:> :  The referenced code needs to have enough access to the regex
:> :internals to be able to see the current sub-expression, request
:> :more characters, access to relevant flags and visibility of
:> :greediness.
:>
:> I don't see that this is a good idea; it makes more sense to me that
:> the coderef is treated exactly as if it had been compiled from (?{...}).
:
:Let's look at these one at a time:
:
:Access to subexpressions - ok, this can be done.
:
:Visibility of flags - Not currently possible. The code might
:like to know that /i is in effect, it might want to know that /s is
:in effect; it probably does not need to know about /o.  This is equally
:true of the enhancement regex handler that looks at the (+foo) in the
:first place.  I think that these could be of use to (?{...}) code.
:
:Greediness - maybe not necessary, but I think better visibility of
:internals might be beneficial.

Hm, I do appreciate the problem - I wasn't too happy when I realised
that embedded qr{} expressions are protected from the flags of their
outer regexp, cos I wanted to specify /i on the outside and have it
trickle in to the rest. It feels like it's going to get real messy,
though, and totally screw the optimiser.

:
:>
:> :Following on, if (?{...}) etc code is evaluated
:> :in forward match, it would be a good idea to likewise support some
:> :code block that is ignored on a forward match but is executed when the
:> :code is unwound due to backtracking.
:>
:> The support in (?{...}) for localisation is (as I understand it) the
:> intended mechanism for permitting such effects. Can you describe some
:> specific problems you are trying to solve here?
:
:Is localisation enough?

Enough to achieve everything you might want to? Yes: you can always
have a (?{ local $a = new Object }) with a DESTROY method. It may not
necessarily be the cleanest possible way to write everything, though.
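
A minimal sketch of that trick, assuming a hypothetical Guard class
whose constructor and DESTROY method merely log what happens. The exact
sequence logged depends on which start positions the optimiser lets the
engine try, so no particular output is claimed; the point is only that
each localised object is released again when the engine unwinds past
it, or when the match ends.

```perl
use strict;
use warnings;

package Guard;              # hypothetical helper class

our @log;                   # records construction and destruction

sub new {
    my ($class, $pos) = @_;
    push @log, "enter at $pos";
    return bless { pos => $pos }, $class;
}

# Runs when the localised value is unwound and the object is freed.
sub DESTROY {
    my ($self) = @_;
    push @log, "leave at $self->{pos}";
}

package main;

our $guard;                 # local() needs a package variable

# The code block runs each time the engine gets past an "a".
my $matched = "aab" =~ /a(?{ local $guard = Guard->new(pos()) })b/;

print $matched ? "matched\n" : "no match\n";
print "$_\n" for @Guard::log;
```

Anything that has to happen on backtracking goes in DESTROY, which is
workable but, as noted, not always the cleanest way to express it.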

Hugo