On Jun 3, 8:43 am, "Filipe Fernandes" <[EMAIL PROTECTED]> wrote:
> I've briefly looked at PLY and pyparsing. There are several others,
> but too many to enumerate. My understanding is that PLY (although
> more difficult to use) has much more flexibility than pyparsing. I'm
> basically looking to make an informed choice. Not just for this
> project, but for the long haul. I'm not afraid of using a difficult
> (to use or learn) parser either if it buys me something like
> portability (with other languages) or flexibility.
Short answer: try them both. The learning curve on pyparsing is about a day,
maybe two. And if you are already familiar with regexes, PLY should not seem
too much of a stretch. PLY parsers will probably run faster than pyparsing
parsers, but I think pyparsing parsers will be quicker to work up and get
running.

Longer answer: PLY is of the lex/yacc school of parsing libraries
(PLY = Python Lex-Yacc). You use regular expressions to define terminal
token specifications (a la lex), and then use "t_XXX" and "p_XXX" methods
to build up the parsing logic - the docstrings in these methods capture the
regex or BNF grammar definitions. (I've put a rough PLY sketch at the very
end of this post, if you want to see the style.)

In contrast, pyparsing is of the combinator school of parsers. Within your
Python code, you compose your parser using '+' and '|' operations, building
it up from pyparsing classes such as Literal, Word, OneOrMore, Group, etc.
Also, pyparsing is 100% Python, so you won't have any portability issues
(I don't know about PLY on that score).

Here is a link to a page with both a PLY and a pyparsing example (although
not strictly a side-by-side comparison):
http://www.rexx.com/~dkuhlman/python_201/. For comparison, here is a
pyparsing version of the PLY parser on that page (this is a recursive
grammar, not necessarily a good beginner's example for pyparsing):

===============
from pyparsing import (Word, alphas, alphanums, Literal, Group,
                       Optional, Forward, OneOrMore, restOfLine)

term = Word(alphas, alphanums)

# forward-declare the mutually recursive elements
func_call = Forward()
func_call_list = Forward()

comma = Literal(",").suppress()
func_call_list << Group(func_call + Optional(comma + func_call_list))

lpar = Literal("(").suppress()
rpar = Literal(")").suppress()
func_call << Group(term + lpar + Optional(func_call_list, default=[""]) + rpar)

command = func_call
prog = OneOrMore(command)

# '#' comments are skipped wherever they occur
comment = "#" + restOfLine
prog.ignore(comment)
===============

With the data set given at Dave Kuhlman's web page, here is the output:

[['aaa', ['']], ['bbb', [['ccc', ['']]]], ['ddd', [['eee', ['']], [['fff',
[['ggg', ['']], [['hhh', ['']], [['iii', ['']]]]]]]]]]

Pyparsing makes some judicious assumptions about how you will want to parse,
the most significant being that whitespace can be ignored during parsing
(this *can* be overridden in the parser definition). Pyparsing also supports
token grouping (for building parse trees), parse-time callbacks (called
'parse actions'), and assigning names within subexpressions (called
'results names'), which really helps in working with the tokens returned
from the parsing process (there's a quick illustration of these in the
P.S. below).

If you learn both, you may find that pyparsing is a good way to quickly
prototype a particular parsing problem, which you can then convert to PLY
for performance if necessary. The pyparsing prototype will be an efficient
way to work the kinks out of the grammar, so that when you get around to
PLY-ifying it, you already have a clear picture of what the parser needs
to do.

But, really, "more flexible"? I wouldn't say that was the big difference
between the two.

Cheers,
-- Paul
(More pyparsing info at http://pyparsing.wikispaces.com.)
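P.S. - To actually run the pyparsing parser above, you would do something
like this (the sample input here is my own, not the exact data set from
Dave Kuhlman's page):

===============
# sample input of my own devising - nested calls plus a comment
data = """
    aaa()
    bbb(ccc())   # comments are skipped, thanks to prog.ignore(comment)
    ddd(eee(), fff(ggg(), hhh(iii())))
    """

# parseString returns a ParseResults; each command is one Group
for cmd in prog.parseString(data):
    print(cmd)
===============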
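And here is the quick illustration of results names and parse actions I
promised, using a toy assignment grammar (all the names in it are mine):

===============
from pyparsing import Word, alphas, alphanums, nums, Suppress

ident = Word(alphas, alphanums + "_")
integer = Word(nums)
# parse action: convert the matched text at parse time
integer.setParseAction(lambda tokens: int(tokens[0]))

# results names: label subexpressions, then fetch tokens by name
# instead of fishing them out by position
assignment = (ident.setResultsName("lhs") + Suppress("=") +
              integer.setResultsName("rhs"))

result = assignment.parseString("answer = 42")
print(result.lhs)      # -> answer
print(result.rhs + 1)  # -> 43, already an int thanks to the parse action
===============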
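And finally, the rough PLY sketch of the same function-call grammar. This is
untested and written from memory, with token and rule names of my own
choosing, so treat it as a flavor sample rather than a reference - the PLY
docs have the authoritative details:

===============
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NAME', 'LPAREN', 'RPAREN', 'COMMA')

# terminal tokens are plain regexes (the "t_XXX" names)
t_NAME   = r'[a-zA-Z_][a-zA-Z0-9_]*'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_COMMA  = r','
t_ignore = ' \t\n'
t_ignore_COMMENT = r'\#.*'

def t_error(t):
    t.lexer.skip(1)

# grammar rules live in the docstrings of "p_XXX" functions;
# each function builds its piece of the tree in p[0]
def p_prog(p):
    '''prog : prog func_call
            | func_call'''
    p[0] = p[1] + [p[2]] if len(p) == 3 else [p[1]]

def p_func_call(p):
    '''func_call : NAME LPAREN RPAREN
                 | NAME LPAREN func_call_list RPAREN'''
    p[0] = (p[1], p[3] if len(p) == 5 else [])

def p_func_call_list(p):
    '''func_call_list : func_call
                      | func_call COMMA func_call_list'''
    p[0] = [p[1]] if len(p) == 2 else [p[1]] + p[3]

def p_error(p):
    print("Syntax error at %r" % (p,))

lexer = lex.lex()
parser = yacc.yacc()
print(parser.parse("aaa() bbb(ccc())", lexer=lexer))
===============

Note how the tree-building code (the p[0] assignments) is spread across the
rule functions, where the pyparsing version gets its nesting from Group for
free.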