On Nov 2, 5:47 am, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote: > > Pyparsing is no recursive descent parser. It doesn't go back in the input > stream. The ``OneOrMore(Word(alphas))`` part "eats" the 'end' and when it > can't get more, the parser moves to the ``Literal('end')`` part which > fails because the 'end' is already gone. > > > Is there a way to get pyparsing to parse a grammar like this? > > Negative lookahead maybe: > > grammar = (OneOrMore(NotAny(Literal('end')) + Word(alphas)) > + Literal('end')) > > Ciao, > Marc 'BlackJack' Rintsch- Hide quoted text - > > - Show quoted text -
Well I'll be darned! All this time, I thought "recursive descent" described the recursive behavior of the parser, which pyparsing definitely has. I never knew that backing up in the face of parse mismatches was a required part of the picture. In pyparsing, each construction gets composed from more fine-grained constructions, and they are called recursively to match or not match against the input string. For example, taking your working parser example: grammar = (OneOrMore(NotAny(Literal('end')) + Word(alphas)) + Literal('end')) This creates the following data structure: - And - OneOrMore - And - NotAny - Literal('end') - Word(alphas) - Literal('end') Every instance in this structure derives from the base class ParserElement, which defines a method parse(). parse() in turn calls preParse(), parseImpl(), and postParse(). If parseImpl succeeds, it returns a ParseResults object and the next location within the input string to continue parsing. The parseImpl methods are most often overridden in the subclasses (a few override postParse as well), such as: - And.parseImpl invokes parse() (note the recursion) on each of the expressions in its list. All must succeed or And.parseImpl throws an exception. - OneOrMore.parseImpl invokes parse() on its contained expression, which must succeed at least once; if not, the exception is rethrown. If the contained expression succeeds once, then its parse() method is called again and again until it fails, but no exceptions are rethrown, since only one match was actually required. - NotAny inverts the success/failure of its contained expression. If the expression's parse() method succeeds, NotAny.parseImpl throws an exception. If the contained expression's parse() method throws an exception, NotAny returns successfully. (NotAny does not advance the parse location, nor does it return any tokens.) - Literal and Word are terminals, in that they do not invoke any other expression's parse() method. Literal.parseImpl tests whether its string exists at the current parse location, and throws an exception if it doesn't. Word.parseImpl tests whether the current parse location contains a letter in the Word instance's set of valid initial characters - if so success; if not, throws an exception. It then advances through the input string, matching characters in the Word instance's set of valid body characters. The entire matched string is then returned, along with an updated string index at which to continue parsing. In my concept of "recursive descent" parsing, I was under the impression that pyparsing's use of this data structure, and *recursive* calls of parse() as it *descends* through the data structure, constituted a recursive descent parser. What the OP requested was a more regular expression-ish matcher, with backup (or backtracking). Your example did not include either of the alternation classes, MatchFirst or Or. These classes implement a form of backtracking on the stack, that if we descend into an expression on the list of alternatives, and that expression fails, then MatchFirst or Or will back up to the same starting location and try the next alternative. (MatchFirst is an "eager" matcher, in that it will pick the first matching expression in its list - Or is an "exhaustive" matcher, in that it will evaluate all of the alternatives, and select the longest match.) The Wikipedia entry for "Recursive Descent Parser" describes this parser model as a "predictive parser", and later goes on to say that some (uncited) authors equate "predictive parser" with "recursive descent parsers". The article makes a special distinction for a "recursive parser with backup", which is what I believe the OP was asking for. Yes, pyparsing is definitely *not* a "recursive descent parser with backup." The solution, as you correctly posted, is to add a negative lookahead. NotAny is also represented using the '~' operator. By the way, there are a couple of other places where pyparsing diverges from the standard model: - implicit whitespace skipping - user-definable comment skipping - attachment of parse actions to expressions These features, while atypical, provide some added capability and ease- of-use in creating quick and simple parsers. So I will take my stance with the uncited authors who lump predictive parsers in with recursive descent parsers. -- Paul -- http://mail.python.org/mailman/listinfo/python-list