Angel Faus:
# ># I have been uncapable of expressing nested groups or
# ># alternation with your model, and I would say that this
# ># is because the engine needs some way to save not only
# ># the index into the string, but also the point of the
# ># regex where it can branch on a backtack.
#
# >I've been a bit worried that this might be the case.  The
# best solution
# >I've been able to think of is to push a "mark" onto the
# stack, like what
# >Perl 5 does with its call stack to indicate the end of the current
# >function's arguments.  If a call to rePopindex popped a
# mark, it would
# >be considered to have failed, so it would branch to $1 if it had a
# >parameter.
#
# I am not sure I understand it fully. If I get it right an
# action can branch
# if:
#
#   a) A submatch fails.
#   b) A popIndex fails, because there is a mark set

An action can branch if it fails.  For most ops, 'fail' means that it
didn't match; for rePopindex, 'fail' means that it popped a mark instead
of an index.  rePopindex already has a failure mode: if there isn't
anything left on the stack to pop, rePopindex jumps to its parameter if
it has one.  I'm just changing that failure mode.

# It looks to me a bit too confusing, both from the developer
# and the compiler
# point of view. Maybe is just a matter of taste, but isn't it
# going to be
# more complex for the compiler?...

It just means you have to be more explicit.  I consider that a Good
Thing--Perl 5's regular expressions are compact enough to be represented
like:

   1: EXACT <f>(3)
   3: STAR(6)
   4:   EXACT <o>(0)
   6: EXACT <bar>(8)
   8: END(0)

but the internals to support them are an absolute jungle.  I'd rather
have a few exposed ops than have a chunk of code like Perl 5's regular
expressions.  Besides, being able to /see/ where we jump to on each op
when we fail will help when debugging the RE compiler and such.

How do you plan to support lookahead?  Unless I'm mistaken, with my
proposal it goes something like:

        rePushindex
        rePushmark
        #lookahead code in here
        rePopmark
        rePopindex

There are advantages and disadvantages to both proposals.  The question
is, which one (or both) is flexible enough to do what we want it to do?

# > .. a larger example ..
# >     reLiteral "b"   #and what if this fails?
#
# It just branches to $fail, because that's the address that was set in
# OnFail.

But I pushed $continue onto the stack just a few lines up...

# >However, this may not be a good example, as I'm seriously
# looking at the
# >possibility of making reAdvance independent of the stack
# >(cur_re->startindex or something) to ease implementation of reSubst
# >(substitution) and related nonsense.  Here's a better example:
# >
#
# I was thinking in a different way of implementing substitution:
#
# On the cur_re struct you would you store all the info about the match:
# index of the first letter of the match, index of the last
# letter of the
# match, info about the position of the capturing groups, etc..
#
# Then, after the regexp has matched, you simply read this
# structure and apply
# the substitution to the string.
#
# From the engine, the only thing that you need to do is have a
# pair of ops
# that store the indexes. Something like this:
#
# reInitGroup i
#     Saves the current index position as the start position of
# the i group.
# reEndGroup i
#     Saves the current index position as the end position of
# the i group.
#
# The whole match would be capturing group 0.
#
# So a full match with substitution looks like:
#
# RE:
#       reOnFail $fail
#       rePushIndex $advance
#       reInitGroup 0
#       reLiteral "literal"
#       reEndGroup 0
#       set I0, 1
#       reSubst "substitution", S1 ## probably not here, but on
# the caller.
#       reFinished
# $advance:
#       reAdvance
# $fail:
#       set I0, 0
#       reFinished

I was thinking of extending the re_info struct to include another string
field.  Then, when you hit reFinished, before you null the cur_re
structure for GCing, it checks a flag to see if it should be
substituting.  The whole thing would go something like this:

reSubst RE_0, "matchagainst", "subststring"

RE_0:
        #do whatever
        reFinished              #does the substituting

Much of the idea of this design is that it likes qr//ed regular
expressions just fine--they're just ones you have to jump to instead of
branching.  That means that substitution regular expressions shouldn't
call additional ops--they should just have the logic tucked away
somewhere.

# Back to backracking:   :)
#
# >     #/xa*.b*[xb]/
# >     branch $start
# >$advance:
# >     reAdvance $fail #no longer using stack in this example
# >$start:
# >     reLiteral "x", $advance
# >$finda:
# >     reLiteral "a", $findany
# >     rePushindex
# >     branch $finda
# >$findany:
# >     reAnything $backa
# >     rePushmark
# >$findb:
# >     reLiteral "b", $findxb
# >     rePushindex
# >     branch $_findb
# >$findxb
# >     reOneof "xb", $backb
# >     set I0, 1
# >     reFinished
# >$backb:                              ## two backtracks
# >     rePopindex $backa
# >     branch $findxb
# >$backa:
# >     rePopindex $advance
# >     branch $findany
# >$fail:
# >     set I0, 0
# >     reFinished
#
#
# Yup, but you could this with the the alternative model too:
#
#       #/xa*.b*[xb]/
#       reOnFail $fail
#       branch $start
# $start:
#       rePushindex $advance
#       reLiteral "x"
# $finda:
#       rePushindex $findany
#       reLiteral "a"
#       branch $finda
# $findany:
#       reAnything
# $findb:
#       rePushindex $findxb
I'll assume that this  ^ was a mistake in your logic or typing.

#       reLiteral "b"
#       branch $findb
# $findxb:
#       reOneof "xb"
#       set I0, 1
#       reFinished
# $fail:
#       set I0, 0
#       reFinished
# $advance:
#       reAdvance
#       branch $start
#
# I have adapted the little compiler to output code for your
# ops proposal
# including the modification I suggest. This way we can see how
# the assembler
# would be for complex regular expressions. I am attaching it with this
# e-mail.
#
# Now it supports:
#
#  * literals
#  * nested groups (do not capture)
#  * alternation
#  * classes
#  * .
#  * simple quantifiers: *, +, ?
#
# I am working in merging your patch with my proposal of implicit
# backtracking, so the output of this compiler can actually
# run. Not much job
# but don't have time now. Hope I can send it in a day or so.

I look forward to seeing it.

--Brent Dax
[EMAIL PROTECTED]
Configure pumpking for Perl 6

When I take action, I’m not going to fire a $2 million missile at a $10
empty tent and hit a camel in the butt.
    --Dubya

Reply via email to