Hi Brent, ># I have been uncapable of expressing nested groups or ># alternation with your model, and I would say that this ># is because the engine needs some way to save not only ># the index into the string, but also the point of the ># regex where it can branch on a backtack.
>I've been a bit worried that this might be the case. The best solution >I've been able to think of is to push a "mark" onto the stack, like what >Perl 5 does with its call stack to indicate the end of the current >function's arguments. If a call to rePopindex popped a mark, it would >be considered to have failed, so it would branch to $1 if it had a >parameter. I am not sure I understand it fully. If I get it right an action can branch if: a) A submatch fails. b) A popIndex fails, because there is a mark set It looks to me a bit too confusing, both from the developer and the compiler point of view. Maybe is just a matter of taste, but isn't it going to be more complex for the compiler?... > .. a larger example .. > reLiteral "b" #and what if this fails? It just branches to $fail, because that's the address that was set in OnFail. >However, this may not be a good example, as I'm seriously looking at the >possibility of making reAdvance independent of the stack >(cur_re->startindex or something) to ease implementation of reSubst >(substitution) and related nonsense. Here's a better example: > I was thinking in a different way of implementing substitution: On the cur_re struct you would you store all the info about the match: index of the first letter of the match, index of the last letter of the match, info about the position of the capturing groups, etc.. Then, after the regexp has matched, you simply read this structure and apply the substitution to the string. >From the engine, the only thing that you need to do is have a pair of ops that store the indexes. Something like this: reInitGroup i Saves the current index position as the start position of the i group. reEndGroup i Saves the current index position as the end position of the i group. The whole match would be capturing group 0. So a full match with substitution looks like: RE: reOnFail $fail rePushIndex $advance reInitGroup 0 reLiteral "literal" reEndGroup 0 set I0, 1 reSubst "substitution", S1 ## probably not here, but on the caller. reFinished $advance: reAdvance $fail: set I0, 0 reFinished Back to backracking: :) > #/xa*.b*[xb]/ > branch $start >$advance: > reAdvance $fail #no longer using stack in this example >$start: > reLiteral "x", $advance >$finda: > reLiteral "a", $findany > rePushindex > branch $finda >$findany: > reAnything $backa > rePushmark >$findb: > reLiteral "b", $findxb > rePushindex > branch $_findb >$findxb > reOneof "xb", $backb > set I0, 1 > reFinished >$backb: ## two backtracks > rePopindex $backa > branch $findxb >$backa: > rePopindex $advance > branch $findany >$fail: > set I0, 0 > reFinished Yup, but you could this with the the alternative model too: #/xa*.b*[xb]/ reOnFail $fail branch $start $start: rePushindex $advance reLiteral "x" $finda: rePushindex $findany reLiteral "a" branch $finda $findany: reAnything $findb: rePushindex $findxb reLiteral "b" branch $findb $findxb: reOneof "xb" set I0, 1 reFinished $fail: set I0, 0 reFinished $advance: reAdvance branch $start I have adapted the little compiler to output code for your ops proposal including the modification I suggest. This way we can see how the assembler would be for complex regular expressions. I am attaching it with this e-mail. Now it supports: * literals * nested groups (do not capture) * alternation * classes * . * simple quantifiers: *, +, ? I am working in merging your patch with my proposal of implicit backtracking, so the output of this compiler can actually run. Not much job but don't have time now. Hope I can send it in a day or so. -angel
BabyRegex.pm
Description: Binary data
regexc.pl
Description: Binary data