I've done quite a lot of thinking about Parrot's rx_compile op, as I was thinking about implementing it. However, I've come to the conclusion that the definition of the op as it stands is too shallow. Please consider this definition and let me know if implementing it would be worth it to Parrot as a whole, or if this is a case of being too generic.
rx_compile out P0, in S0, in I0 Produce an FSA continuation (see fsa_to_continuation) in P0 which matches the regular expression in S0. The syntax of the regular expression is determined by the type value in I0. This type must be a valid type as returned by rx_load_type. This op is simply a combination of rx_to_fsa and fsa_to_continuation rx_to_fsa out P0, in S0, in I0 Pass the regular expression string parameter to the specfied regular expression parser based on the type parameter (as returned from rx_load_type). Returns an FSA PMC. rx_load_type out I0, in S0 Dynamically load an rx compiler by name and set the first parameter to the identifier for that compiler (for use with rx_compile). The default, minimalistic regular expression syntax has identifier 0. The compiler itself must invoke fsa_new at some point in order to generate its return value. fsa_new out P0, in I0, in P1, in P3, in I1 Given inputs: max state, alphabet, transition matrix and start state, produce an FSA output object which implements the requirements. The alphabet can be one of several values, TBD, but noted that "characters" are only one possible type of alphabet. The transition matrix is an array of arrays, each of which contains 3 to 5 values: start state, input range(s), target state and an optional integer set to 1 or 0 to indicate if this is a final state or not and an optional integer set to 0, 1 or 2 to indicate if this state consumes its input and records it (0), consumes its input and does not record it (1) or does not consume the input token. Input ranges are going to be similar values to alphabet. More detail may be added to the transition matrix as needed. Note that target state may end up needing to be either an integer state value or a continuation. fsa_to_continuation out P0, in P1 Take as input a valid FSA and return a continuation which, when invoked, will simulate the FSA on its input parameter, and return an FSA status PMC. If the FSA has never been compiled to Parrot, this op compiles it, otherwise the same continuation is returned. For all intents and purposes, assume that this is a black box, and although in most cases the continuation returned will invoke newly generated bytecode (representing the FSA's state transitions), that need not be the case. fsa_minimize out P0, in P1 Attempt to minimize the FSA and return a new FSA as output. The goal here is to provide a generic FSA mechanism which can accomodate the various regular expression syntaxes as input, or can be generated "by hand" by a compiler writer who wishes to have control over the details (or provide FSA semantics over more complex tokens than just characters). Either way, the result is an FSA which can be executed (aka "simulated") by invoking its continuation. -- Aaron Sherman <[EMAIL PROTECTED]> Senior Systems Engineer and Toolsmith "It's the sound of a satellite saying, 'get me down!'" -Shriekback