Brent Dax wrote:
> Okay, this bunch of ops is a serious attempt at regular expressions. I
> had a discussion with japhy on this in the Monastery
> (http://www.perlmonks.org/index.pl?node_id=122784), and I've come up
> with something flexible enough to actually (maybe) work. Attached is a
> patch to modify core.ops and add re.h (defines data structures and such)
> and t/op/re.t (six tests). All tests, including the new ones, pass.
Hi Brent,

Since your ops are much more complete and better documented than the ones I
sent, I have been trying to adapt my previous regex compiler to them, but I
found what I think might be a limitation of your model.

It looks to me as though compiling regexps down to ordinary opcodes needs a
generic backtrack mechanism, instead of a $backtrack label for each case.

I have been unable to express nested groups or alternation with your model,
and I would say that this is because the engine needs some way to save not
only the index into the string, but also the point in the regex to which it
should branch on a backtrack.

You solve this in your examples by having a "$backtrack" address for each
case, but this looks to me like a bad solution. In particular, I would say
that it cannot be applied to complex regular expressions.

In my previous experimental patch, there was a way to save the string index
_plus_ the "regex index". Written in your syntax, this would mean adding a
parameter to rePushindex that saves the "regex index".
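
To make the idea concrete, here is a minimal Python sketch of such a stack. It is only an illustration, not code from either patch: the names Saved and BacktrackStack are hypothetical, and the "regex index" is represented simply as a label string.

```python
from typing import List, NamedTuple, Optional


class Saved(NamedTuple):
    """One backtrack entry (hypothetical name, for illustration only)."""
    str_index: int  # position in the input string to restore
    re_index: str   # label in the regex program to branch to on failure


class BacktrackStack:
    """Stack of (string index, regex index) pairs."""

    def __init__(self) -> None:
        self._entries: List[Saved] = []

    def push(self, str_index: int, re_index: str) -> None:
        # Corresponds to the proposed "rePushindex $label" form.
        self._entries.append(Saved(str_index, re_index))

    def pop(self) -> Optional[Saved]:
        # Returns None when there are no pending indexes.
        return self._entries.pop() if self._entries else None
```

With this shape, a failing op never needs its own target label: it pops one entry and gets back both where to reset the string position and where to resume in the regex.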
Your example:
RE:
    reFlags ""
    reMinlength 4
$advance:
    rePopindex
    reAdvance $fail
$start:
    rePushindex
    reLiteral "f", $advance
$findo:
    literal "o", $findbar
    rePushindex
    branch $findo
$findbar:
    reLiteral "bar", $backtrack
    set I0, 1 #true
    reFinished
$backtrack:
    rePopindex $advance
    branch $findbar    # <-- backtrack needs to know where to branch
$fail:
    set I0, 0 #false
    reFinished
Your example tweaked by me:
RE:
    reFlags ""
    reOnFail $fail
    reMinlength 4
$start:
    rePushindex $advance
    reLiteral "f"
$findo:
    rePushindex $findbar
    literal "o"
    branch $findo
$findbar:
    reLiteral "bar"
    set I0, 1 #true
    reFinished
$fail:
    set I0, 0 #false
    reFinished
$advance:
    reAdvance
    branch $start
So it is not the reLiteral, reAdvance, etc. ops that need to know where to
branch on failure. Instead, when they fail they always either:
- pop the last index from the stack and branch to the destination saved
with it, or
- branch to the address previously set by the reOnFail op, if there are no
pending indexes.
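
The failure rule above can be sketched as a tiny interpreter in Python. This is an illustration under stated assumptions, not the actual implementation: the tuple encoding of the program, the run function, and the use of its return value in place of register I0 are all hypothetical. I also assume that reAdvance fails when the index is already at the end of the string, that reMinlength fails on strings that are too short, and that the "literal" op in the listing behaves like reLiteral. reFlags is omitted.

```python
# The tweaked example, encoded as (op, argument) tuples.
PROGRAM = [
    ("reOnFail", "$fail"),
    ("reMinlength", 4),
    ("label", "$start"),
    ("rePushindex", "$advance"),
    ("reLiteral", "f"),
    ("label", "$findo"),
    ("rePushindex", "$findbar"),
    ("reLiteral", "o"),
    ("branch", "$findo"),
    ("label", "$findbar"),
    ("reLiteral", "bar"),
    ("set", 1),          # true
    ("reFinished",),
    ("label", "$fail"),
    ("set", 0),          # false
    ("reFinished",),
    ("label", "$advance"),
    ("reAdvance",),
    ("branch", "$start"),
]


def run(program, text):
    """Run the program against text; return the value left by 'set'."""
    labels = {ins[1]: pc for pc, ins in enumerate(program) if ins[0] == "label"}
    stack = []       # saved (string index, regex address) pairs
    on_fail = None   # address set by reOnFail
    index = 0        # current position in text
    result = None    # stands in for register I0
    pc = 0
    while True:
        op, *args = program[pc]
        pc += 1
        failed = False
        if op == "reOnFail":
            on_fail = labels[args[0]]
        elif op == "reMinlength":
            failed = len(text) < args[0]
        elif op == "rePushindex":
            stack.append((index, labels[args[0]]))
        elif op == "reLiteral":
            if text.startswith(args[0], index):
                index += len(args[0])
            else:
                failed = True
        elif op == "reAdvance":
            if index < len(text):
                index += 1
            else:
                failed = True
        elif op == "branch":
            pc = labels[args[0]]
        elif op == "set":
            result = args[0]
        elif op == "reFinished":
            return result
        # "label" entries fall through as no-ops.
        if failed:
            if stack:
                # Generic backtrack: restore the index, resume at the saved address.
                index, pc = stack.pop()
            else:
                # No pending indexes: go to the reOnFail address.
                pc = on_fail
```

For example, run(PROGRAM, "xfoobar") finishes through $findbar with the result set to 1, while run(PROGRAM, "fooba") unwinds every saved index, runs out of positions in reAdvance, and ends at $fail with 0. The only place that knows about backtracking is the single "if failed" branch at the bottom of the loop.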
There is no $backtrack label, but the backtracking action is invoked each
time a submatch fails.
I am not sure that this is the only solution, but it is the one that came to
my mind on seeing your proposal, and I find it quite elegant.
It is quite possible that nested groups and alternation can be implemented
with your model. If that is the case, could you please post an example so I
can understand?
What do you think about it?
-angel