Brent Dax :

> Okay, this bunch of ops is a serious attempt at regular expressions.  I
> had a discussion with japhy on this in the Monastery
> (http://www.perlmonks.org/index.pl?node_id=122784), and I've come up
> with something flexible enough to actually (maybe) work.  Attached is a
> patch to modify core.ops and add re.h (defines data structures and such)
> and t/op/re.t (six tests).  All tests, including the new ones, pass.

Hi Brent,

Since your ops are much complete and better documented that the ones I sent, 
I was trying to adapt my previous regex compiler to your ops, but I found 
what i think might be a limitation of your model.

It looks to me that for compiling down regexp to usual opcodes there is the 
need of having a generic backtrack, insted of a $backtrack label for each 
case.

I have been uncapable of expressing nested groups or alternation with your 
model, and I would say that this is because the engine needs some way to save 
not only the index into the string, but also the point of the regex where it 
can branch on a backtack.

You solve this in your examples, by having a "$bactrack" address for each 
case, but this looks to me as a bad solution. In particular, i would say that 
cannot be aplied for complex regular expressions.

In my previous experimental patch, there was a way to save the string index 
_plus_ the "regex index". Writing this with your syntax, it would mean to be 
able to add a parametrer in rePushindex that saves the "regex index". 

Your example:

RE:
        reFlags ""
        reMinlength 4
$advance:
        rePopindex
        reAdvance $fail
$start:
        rePushindex
        reLiteral "f", $advance
$findo:
        literal "o", $findbar
        rePushindex
        branch $findo
$findbar:
        reLiteral "bar", $backtrack
        set I0, 1       #true
        reFinished
$backtrack:
        rePopindex $advance
        branch $findbar     <<<<<<< backtrack needs to know where to branch
$fail:
        set I0, 0       #false
        reFinished

Your example tweaked by me:

RE:
        reFlags ""
        reOnFail $fail
        reMinlength 4
$start:
        rePushindex $advance
        reLiteral "f"
$findo:
        rePushindex $findbar
        literal "o"
        branch $findo
$findbar:
        reLiteral "bar"
        set I0, 1       #true
        reFinished
$fail:
        set I0, 0       #false
        reFinished
$advance:
        reAdvance
        branch $start

So it is not the reLiteral, reAdvance, etc.. ops that need to know were they 
have to branch on failing, but when failing they always:

  -pop the last index on the stack and then branch to the last saved 
destination.
  -or branch to the address previously set in reOnFail op if there are no 
pending indexes.

There is no $bactrack label, but the backtracking action is called each time 
a submatch fails.

I am not sure that this is the only solution, but is the one that come to my 
mind mind seeing your proposal and I find it quite elegant. 

It is quite possible that nested groups and alternation can be implemented 
with your model. If that is the case, ¿could you please post an example so I 
can understand?.

What do you think about it?

-angel

Reply via email to