Angel Faus: # ># I have been uncapable of expressing nested groups or # ># alternation with your model, and I would say that this # ># is because the engine needs some way to save not only # ># the index into the string, but also the point of the # ># regex where it can branch on a backtack. # # >I've been a bit worried that this might be the case. The # best solution # >I've been able to think of is to push a "mark" onto the # stack, like what # >Perl 5 does with its call stack to indicate the end of the current # >function's arguments. If a call to rePopindex popped a # mark, it would # >be considered to have failed, so it would branch to $1 if it had a # >parameter. # # I am not sure I understand it fully. If I get it right an # action can branch # if: # # a) A submatch fails. # b) A popIndex fails, because there is a mark set
An action can branch if it fails. For most ops, 'fail' means that it didn't match; for rePopindex, 'fail' means that it popped a mark instead of an index. rePopindex already has a failure mode: if there isn't anything left on the stack to pop, rePopindex jumps to its parameter if it has one. I'm just changing that failure mode. # It looks to me a bit too confusing, both from the developer # and the compiler # point of view. Maybe is just a matter of taste, but isn't it # going to be # more complex for the compiler?... It just means you have to be more explicit. I consider that a Good Thing--Perl 5's regular expressions are compact enough to be represented like: 1: EXACT <f>(3) 3: STAR(6) 4: EXACT <o>(0) 6: EXACT <bar>(8) 8: END(0) but the internals to support them are an absolute jungle. I'd rather have a few exposed ops than have a chunk of code like Perl 5's regular expressions. Besides, being able to /see/ where we jump to on each op when we fail will help when debugging the RE compiler and such. How do you plan to support lookahead? Unless I'm mistaken, with my proposal it goes something like: rePushindex rePushmark #lookahead code in here rePopmark rePopindex There are advantages and disadvantages to both proposals. The question is, which one (or both) is flexible enough to do what we want it to do? # > .. a larger example .. # > reLiteral "b" #and what if this fails? # # It just branches to $fail, because that's the address that was set in # OnFail. But I pushed $continue onto the stack just a few lines up... # >However, this may not be a good example, as I'm seriously # looking at the # >possibility of making reAdvance independent of the stack # >(cur_re->startindex or something) to ease implementation of reSubst # >(substitution) and related nonsense. Here's a better example: # > # # I was thinking in a different way of implementing substitution: # # On the cur_re struct you would you store all the info about the match: # index of the first letter of the match, index of the last # letter of the # match, info about the position of the capturing groups, etc.. # # Then, after the regexp has matched, you simply read this # structure and apply # the substitution to the string. # # From the engine, the only thing that you need to do is have a # pair of ops # that store the indexes. Something like this: # # reInitGroup i # Saves the current index position as the start position of # the i group. # reEndGroup i # Saves the current index position as the end position of # the i group. # # The whole match would be capturing group 0. # # So a full match with substitution looks like: # # RE: # reOnFail $fail # rePushIndex $advance # reInitGroup 0 # reLiteral "literal" # reEndGroup 0 # set I0, 1 # reSubst "substitution", S1 ## probably not here, but on # the caller. # reFinished # $advance: # reAdvance # $fail: # set I0, 0 # reFinished I was thinking of extending the re_info struct to include another string field. Then, when you hit reFinished, before you null the cur_re structure for GCing, it checks a flag to see if it should be substituting. The whole thing would go something like this: reSubst RE_0, "matchagainst", "subststring" RE_0: #do whatever reFinished #does the substituting Much of the idea of this design is that it likes qr//ed regular expressions just fine--they're just ones you have to jump to instead of branching. That means that substitution regular expressions shouldn't call additional ops--they should just have the logic tucked away somewhere. # Back to backracking: :) # # > #/xa*.b*[xb]/ # > branch $start # >$advance: # > reAdvance $fail #no longer using stack in this example # >$start: # > reLiteral "x", $advance # >$finda: # > reLiteral "a", $findany # > rePushindex # > branch $finda # >$findany: # > reAnything $backa # > rePushmark # >$findb: # > reLiteral "b", $findxb # > rePushindex # > branch $_findb # >$findxb # > reOneof "xb", $backb # > set I0, 1 # > reFinished # >$backb: ## two backtracks # > rePopindex $backa # > branch $findxb # >$backa: # > rePopindex $advance # > branch $findany # >$fail: # > set I0, 0 # > reFinished # # # Yup, but you could this with the the alternative model too: # # #/xa*.b*[xb]/ # reOnFail $fail # branch $start # $start: # rePushindex $advance # reLiteral "x" # $finda: # rePushindex $findany # reLiteral "a" # branch $finda # $findany: # reAnything # $findb: # rePushindex $findxb I'll assume that this ^ was a mistake in your logic or typing. # reLiteral "b" # branch $findb # $findxb: # reOneof "xb" # set I0, 1 # reFinished # $fail: # set I0, 0 # reFinished # $advance: # reAdvance # branch $start # # I have adapted the little compiler to output code for your # ops proposal # including the modification I suggest. This way we can see how # the assembler # would be for complex regular expressions. I am attaching it with this # e-mail. # # Now it supports: # # * literals # * nested groups (do not capture) # * alternation # * classes # * . # * simple quantifiers: *, +, ? # # I am working in merging your patch with my proposal of implicit # backtracking, so the output of this compiler can actually # run. Not much job # but don't have time now. Hope I can send it in a day or so. I look forward to seeing it. --Brent Dax [EMAIL PROTECTED] Configure pumpking for Perl 6 When I take action, I’m not going to fire a $2 million missile at a $10 empty tent and hit a camel in the butt. --Dubya