On Sat, Aug 29, 2009 at 02:45:08PM -0700, Gilbert R. Roehrbein (via RT) wrote:
> BUG:
> $ perl6
> > say '(foo)' ~~ /'(' ~ ')' .*?/
> Unable to parse , couldn't find final ')'
> in regex PGE::Grammar::_block21 (<unknown>:1)
> called from Main (<unknown>:1)
>
> TEST:
> ok( '(foo)' ~~ /'(' ~ ')' .*?/ )
>
> The problem is .*? , i think, cause '(foo)' ~~ /'(' ~ ')' 'foo'/
> matches.

Currently Synopsis 5 is a bit unclear on the handling of backtracking
using the ~ operator in regexes.  The current definition says that
something of the form

    '(' ~ ')' <expression>

gets rewritten to be something like

    '(' <expression> [ ')' || <FAILGOAL> ]

Note that there's no way to backtrack into <expression> -- once we've
matched <expression>, we either find the closing token or we fail.
So in the case of the problem regex above, we end up with

    '(' .*? [ ')' || <FAILGOAL> ]

which will match only "()", because there's no possibility of
backtracking into the .*? to find longer strings between the parens.

At one time I tried changing the definition of ~ so that it could
allow backtracking into the expression

    '(' [ <expression> ')' || <FAILGOAL> ]

but ISTR that I ran into some other issues there and gave up for
the time being.

So, short answer is that I think Rakudo is correctly following the
specification here, but we may need to tweak the specification a bit.

> The bug was introduced between June 30 and today, cause similar
> code is used in http://github.com/krunen/xml/tree/master, last updated
> June 30

AFAIK none of the related code has been changed between June 30 and
today, so I'm guessing something else must be happening there.
Looking at the grammar that is given at that address now, I see

    token comment { '<!--' ~ '-->' <content> }
    token pi { '<?' ~ '?>' <content> }
    token content { .*? }

Given that these are all "token" (no backtracking), that would mean
that the calls to the <content> subrule will only ever match an
empty string.

Thanks!

Pm
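
For anyone who wants to see the difference between the two rewritings
concretely, here is a rough sketch in Python rather than Perl 6. It is
only an illustration: the function names are made up for this example,
a ValueError stands in for <FAILGOAL>, and matching the lazy expression
in a separate step is just a way to imitate "no backtracking into
<expression>". It is not how PGE or Rakudo implement ~.

```python
import re


def goal_match_committed(text):
    """Imitate  '(' <expression> [ ')' || <FAILGOAL> ]  with <expression> = .*?

    The expression is matched on its own and never revisited, so the
    lazy .*? always settles for the empty string.
    """
    opener = re.compile(r"\(").match(text)
    if opener is None:
        return None
    # Match the lazy expression by itself, starting right after the '('.
    expr = re.compile(r".*?").match(text, opener.end())
    # The closing token must appear exactly where the expression stopped.
    closer = re.compile(r"\)").match(text, expr.end())
    if closer is None:
        # Stand-in for <FAILGOAL>.
        raise ValueError("couldn't find final ')'")
    return text[:closer.end()]


def goal_match_backtracking(text):
    """Imitate a variant where the engine may backtrack into .*?"""
    m = re.match(r"\(.*?\)", text)
    return m.group(0) if m else None
```

With these definitions, goal_match_committed("()") succeeds, while
goal_match_committed("(foo)") raises the "couldn't find final ')'"
error, and goal_match_backtracking("(foo)") returns the whole string.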