Hi,
On 05/11/2016 07:45 AM, Richard Hainsworth wrote:
> I have the following in a grammar
>
>     rule TOP { ^ <statement>+ $ };
>
>     rule statement { <id> '=' <endvalue>
>                    | { { self.panic($/, "Declaration syntax incorrect") } }
>                    };
>
>     rule endvalue { <keyword> '(' ~ ')' <pairlist>
>                   | { self.panic($/, "Invalid declaration.") }
>                   }
>
> The grammar parses a correct input file up until the end of the file. At
> that point even if there is no un-consumed input, there is an attempt to
> match <id>, which fails. The failure causes the panic with 'Declaration
> syntax'.
>
> Am I missing something simple here?
>
> I would have thought (though this is only a very newbie assumption)
> that if the end of the input being sent to the grammar has been reached
> after the last <statement> has been matched, then there should be no
> reason for the parse method to try to match <statement> again, and if it
> fails to test for the end of input.

This is not how regexes or grammars work.

The + quantifier tries to match the regex as many times as possible. It
doesn't look ahead to see whether more characters are available, and it
doesn't know about the end-of-string anchor that comes next in the grammar.

In fact, it doesn't know whether the rule it quantifies might have a way
to match zero characters. If the rule can do that, skipping the attempt
at a zero-width match at the end of the string would be wrong behavior.
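
To make that extra attempt visible, here is a minimal sketch. The grammar
name Demo, the token item and the counter $attempts are made up for
illustration and are not part of your grammar. The code block at the start
of item runs every time the quantifier tries the token, including the
final attempt that fails at the end of the input.

```raku
my $attempts = 0;   # hypothetical counter, only for this demonstration

grammar Demo {
    token TOP  { ^ <item>+ $ }
    # The leading code block runs on every attempt to match <item>,
    # whether or not the attempt succeeds afterwards.
    token item { { $attempts++ } \w+ \s* }
}

say Demo.parse("a b c").so;   # the parse itself succeeds
say $attempts;                # one more than the number of items matched
```

With three items in the input, the counter should end up at four: three
successful matches plus the one failed attempt that tells + to stop.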

As for improving the error reporting from within a grammar, there are
lots of ways to get creative, and I'd urge you to read Perl 6's own
grammar, which is a good inspiration for that.
See https://github.com/rakudo/rakudo/blob/nom/src/Perl6/Grammar.nqp
One thing you could do is structure the statement rule differently:

    rule statement {
        <id>
        [ '=' <endvalue>
        || { self.panic($/, "Invalid declaration.") }
        ]
    }

And maybe also TOP:

    rule TOP { ^ [ <statement> || . { self.panic($/, "Expected a statement") } ]+ $ };

That extra dot before the panic ensures it's not called at the end of
the string. If you don't want that, you could also do
[ <statement> || $ || { self.panic(...) } ]
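
Putting the two suggestions together, a self-contained sketch could look
like the following. Note the assumptions: the grammar name Decl, the
simplified id and endvalue tokens, and the panic method that simply dies
with a message are all placeholders I made up so that the example runs on
its own. Your real grammar has its own definitions for these.

```raku
grammar Decl {
    # Either a statement, or (if at least one character is left) a panic.
    rule TOP {
        ^ [ <statement> || . { self.panic($/, "Expected a statement") } ]+ $
    }

    # Once <id> has matched we are committed to a declaration, so a
    # missing '=' <endvalue> can be reported precisely.
    rule statement {
        <id>
        [ '=' <endvalue>
        || { self.panic($/, "Invalid declaration.") }
        ]
    }

    token id       { \w+ }   # placeholder definition
    token endvalue { \d+ }   # placeholder definition

    # Hypothetical panic: report the message and the current position.
    method panic($match, $msg) {
        die "$msg near position { $match.pos }";
    }
}

say Decl.parse("a = 1\nb = 2\n").so;   # well-formed input

{
    Decl.parse("a 1");                 # '=' is missing
    CATCH { default { say "parse failed: { .message }" } }
}
```

At the end of well-formed input, <statement> fails and the dot has nothing
left to match, so the group simply stops repeating and $ gets its chance.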
Cheers,
Moritz