I've done quite a lot of thinking about Parrot's rx_compile op, as I was
thinking about implementing it. However, I've come to the conclusion
that the definition of the op as it stands is too shallow. Please
consider this definition and let me know if implementing it would be
worth it to Parrot as a whole, or if this is a case of being too
generic.

rx_compile out P0, in S0, in I0

        Produce an FSA continuation (see fsa_to_continuation) in P0
        which matches the regular expression in S0. The syntax of the
        regular expression is determined by the type value in I0. This
        type must be a valid type as returned by rx_load_type. This op
        is simply a combination of rx_to_fsa and fsa_to_continuation
        
rx_to_fsa out P0, in S0, in I0

        Pass the regular expression string parameter to the specfied
        regular expression parser based on the type parameter (as
        returned from rx_load_type). Returns an FSA PMC.
        
rx_load_type out I0, in S0

        Dynamically load an rx compiler by name and set the first
        parameter to the identifier for that compiler (for use with
        rx_compile). The default, minimalistic regular expression syntax
        has identifier 0. The compiler itself must invoke fsa_new at
        some point in order to generate its return value.

fsa_new out P0, in I0, in P1, in P3, in I1

        Given inputs: max state, alphabet, transition matrix and start
        state, produce an FSA output object which implements the
        requirements. The alphabet can be one of several values, TBD,
        but noted that "characters" are only one possible type of
        alphabet. The transition matrix is an array of arrays, each of
        which contains 3 to 5 values: start state, input range(s),
        target state and an optional integer set to 1 or 0 to indicate
        if this is a final state or not and an optional integer set to
        0, 1 or 2 to indicate if this state consumes its input and
        records it (0), consumes its input and does not record it (1) or
        does not consume the input token. Input ranges are going to be
        similar values to alphabet. More detail may be added to the
        transition matrix as needed. Note that target state may end up
        needing to be either an integer state value or a continuation.

fsa_to_continuation out P0, in P1

        Take as input a valid FSA and return a continuation which, when
        invoked, will simulate the FSA on its input parameter, and
        return an FSA status PMC. If the FSA has never been compiled to
        Parrot, this op compiles it, otherwise the same continuation is
        returned.
        
        For all intents and purposes, assume that this is a black box,
        and although in most cases the continuation returned will invoke
        newly generated bytecode (representing the FSA's state
        transitions), that need not be the case.

fsa_minimize out P0, in P1

        Attempt to minimize the FSA and return a new FSA as output.

The goal here is to provide a generic FSA mechanism which can accomodate
the various regular expression syntaxes as input, or can be generated
"by hand" by a compiler writer who wishes to have control over the
details (or provide FSA semantics over more complex tokens than just
characters). Either way, the result is an FSA which can be executed (aka
"simulated") by invoking its continuation.

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback


Reply via email to