"Neil Cerutti" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > On 2007-11-04, Just Another Victim of the Ambient Morality > <[EMAIL PROTECTED]> wrote: >>> Consider writing a recursive decent parser by hand to parse >>> the language '[ab]+b'. >>> >>> goal --> ab_list 'b' >>> ab_list --> 'a' list_tail >>> ab_list --> 'b' list_tail >>> list_tail --> 'a' list_tail >>> list_tail --> 'b' list_tail >>> list_tail --> null >>> >>> >>> The above has the exact same bug (and probably some others--I'm >>> sorry unable to test it just now) as the PyParsing solution. >>> >>> The error is in the grammar. It might be fixed by specifying that >>> 'b' must be followed by EOF, and then it could be coded by using >>> more than one character of lookahead. >> >> I don't exactly understand the syntax you used to describe the >> productions of your recursive descent parser so not only did I not follow >> it >> but I couldn't make out the rest of your post. Could you explain in a >> little more detail? The last part that points to 'null' is especially >> confusing... > > It's the BNF spelling of > > goal --> ab_list 'b' > ab_list --> ab { ab } > ab --> 'a' | 'b' > > The null is to say that list_tail can match nothing, i.e, an > empty string. > > Then, in the Parser class, every method (except for match, which > is used as a central place to consume characters) corresponds to > one of the productions in the BNF. Breaking things down into > BNF-based productions often makes implementation, debugging and > code generation easier. > > PyParsing saves me that stop, since I can often directly > implement the EBNF using PyParsing.
Okay, I see that now, thank you. Your statement from the previous post: >> Consider writing a recursive decent parser by hand to parse >> the language '[ab]+b'. >> >> goal --> ab_list 'b' >> ab_list --> 'a' list_tail >> ab_list --> 'b' list_tail >> list_tail --> 'a' list_tail >> list_tail --> 'b' list_tail >> list_tail --> null >> >> >> The above has the exact same bug (and probably some others--I'm >> sorry unable to test it just now) as the PyParsing solution. ...merely demonstrates that this grammar is similarly ambiguous. There are many ways to parse this correctly and pyparsing chooses none of these! Instead, it returns the same error it does when the string has no solutions... >> As demonstrated earlier, it's not just the grammar. There are >> situations that are unambiguous that pyparsing can't parse >> simply and there's no reason for it. > > Yes, many parser generators have many more limitations than just > the requirement of an unambiguous grammar. Yes, but a recursive descent parser? I expect such things from LALR and others, but not only do I expect a recursive descent parser to correctly parse grammars but I expect it to even parse ambiguous ones, in that it is the only technique prepared to find more than one solution... >> Besides, ambiguous grammars are a fact of life and some of us >> need to parse them. It's usually okay, too. Consider a >> previous example: >> >> grammar = OneOrMore(Word(alphas)) + Literal('end') >> >> While you may consider this inherently ambiguous, it's usually >> not. That is to say, as long as it is rare that 'end' is used >> not at the end of the string, this will simply parse and, yet, >> pyparsing will consistently fail to parse it... > > I believe there's no cure for the confusion you're having except > for implementing a parser for your proposed grammar. > Alternatively, try implementing your grammar in one of your other > favorite parser generators. I believe there is a cure and it's called recursive descent parsing. It's slow, obviously, but it's correct and, sometimes (arguably, often), that's more important the execution speed. I spent this morning whipping up a proof of concept parser whose interface greatly resembles pyparsing but, baring unknown bugs, works and works as I'd expect a recursive descent parser to work. I don't know Python very well so the parser is pretty simple. It only lexes single characters as tokens. It only supports And, Or, Optional, OneOrMore and ZeroOrMore rules but I already think this is a rich set of rules. I'm sure others can be added. Finally, I'm not sure it's safely copying all its parameter input the same way pyparsing does but surely those bugs can be worked out. It's merely a proof of concept to demonstrate a point. Everyone, please look it over and tell me what you think. Unfortunately, my news client is kind of poor, so I can't simply cut and paste the code into here. All the tabs get turned into single spacing, so I will post this link, instead: http://theorem.ca/~dlkong/new_pyparsing.zip I hope you can all deal with .zip files. Let me know if this is a problem. Thank you... -- http://mail.python.org/mailman/listinfo/python-list