Interesting. Thanks, Paul and Tim. This looks very promising.

Ryan

On Nov 28, 2007 1:23 PM, Paul McGuire <[EMAIL PROTECTED]> wrote:
> On Nov 28, 11:32 am, "Ryan Krauss" <[EMAIL PROTECTED]> wrote:
> > I need to parse the following string:
> >
> > $$\pmatrix{{\it x_2}\cr 0\cr 1\cr }=\pmatrix{\left({{{\it m_2}\,s^2
> > }\over{k}}+1\right)\,{\it x_1}-{{F}\over{k}}\cr -{{{\it m_2}\,s^2\,F
> > }\over{k}}-F+\left({\it m_2}\,s^2\,\left({{{\it m_2}\,s^2}\over{k}}+1
> > \right)+{\it m_2}\,s^2\right)\,{\it x_1}\cr 1\cr }$$
> >
> > The first thing I need to do is extract the arguments to \pmatrix{ }
> > on both the left and right hand sides of the equal sign, so that the
> > first argument is extracted as
> >
> > {\it x_2}\cr 0\cr 1\cr
> >
> > and the second is
> >
> > \left({{{\it m_2}\,s^2
> > }\over{k}}+1\right)\,{\it x_1}-{{F}\over{k}}\cr -{{{\it m_2}\,s^2\,F
> > }\over{k}}-F+\left({\it m_2}\,s^2\,\left({{{\it m_2}\,s^2}\over{k}}+1
> > \right)+{\it m_2}\,s^2\right)\,{\it x_1}\cr 1\cr
> >
> > The trick is that there are extra curly braces inside the \pmatrix{ }
> > strings and I don't know how to write a regexp that would count the
> > number of open and close curly braces and make sure they match, so
> > that it can find the correct ending curly brace.
> >
>
> As Tim Grove points out, writing a grammar for this expression is
> really pretty simple, especially using the latest version of
> pyparsing, which includes a new helper method, nestedExpr.
> Here is the whole program to parse your example:
>
> from pyparsing import *
>
> data = r"""$$\pmatrix{{\it x_2}\cr 0\cr 1\cr }=
> \pmatrix{\left({{{\it m_2}\,s^2
> }\over{k}}+1\right)\,{\it x_1}-{{F}\over{k}}\cr -{{{\it
> m_2}\,s^2\,F
> }\over{k}}-F+\left({\it m_2}\,s^2\,\left({{{\it
> m_2}\,s^2}\over{k}}+1
> \right)+{\it m_2}\,s^2\right)\,{\it x_1}\cr 1\cr }$$"""
>
> PMATRIX = Literal(r"\pmatrix")
> nestedBraces = nestedExpr("{","}")
> grammar = "$$" + PMATRIX + nestedBraces + "=" + \
>           PMATRIX + nestedBraces + \
>           "$$"
> res = grammar.parseString(data)
> print res
>
> This prints the following:
>
> ['$$', '\\pmatrix', [['\\it', 'x_2'], '\\cr', '0\\cr', '1\\cr'], '=',
> '\\pmatrix', ['\\left(', [[['\\it', 'm_2'], '\\,s^2'], '\\over',
> ['k']], '+1\\right)\\,', ['\\it', 'x_1'], '-', [['F'], '\\over',
> ['k']], '\\cr', '-', [[['\\it', 'm_2'], '\\,s^2\\,F'], '\\over',
> ['k']], '-F+\\left(', ['\\it', 'm_2'], '\\,s^2\\,\\left(', [[['\\it',
> 'm_2'], '\\,s^2'], '\\over', ['k']], '+1', '\\right)+', ['\\it',
> 'm_2'], '\\,s^2\\right)\\,', ['\\it', 'x_1'], '\\cr', '1\\cr'], '$$']
>
> Okay, maybe this looks a bit messy. But believe it or not, the
> returned results give you access to each grammar element as:
>
> ['$$', '\\pmatrix', [nested arg list], '=', '\\pmatrix',
> [nestedArgList], '$$']
>
> Not only has the parser handled the {} nesting levels, but it has
> structured the returned tokens according to that nesting. (The '{}'s
> are gone now, since their delimiting function has been replaced by the
> nesting hierarchy in the results.)
>
> You could use tuple assignment to get at the individual fields:
> dummy,dummy,lhs_args,dummy,dummy,rhs_args,dummy = res
>
> Or you could access the fields in res using list indexing:
> lhs_args, rhs_args = res[2],res[5]
>
> But both of these methods will break if you decide to extend the
> grammar with additional or optional fields.
>
> A safer approach is to give the grammar elements results names, as in
> this slightly modified version of grammar:
>
> grammar = "$$" + PMATRIX + nestedBraces("lhs_args") + "=" + \
>           PMATRIX + nestedBraces("rhs_args") + \
>           "$$"
>
> Now you can access the parsed fields as if the results were a dict
> with keys "lhs_args" and "rhs_args", or as an object with attributes
> named "lhs_args" and "rhs_args":
>
> res = grammar.parseString(data)
> print res["lhs_args"]
> print res["rhs_args"]
> print res.lhs_args
> print res.rhs_args
>
> Note that the default behavior of nestedExpr is to give back a nested
> list of the elements according to how the original text was nested
> within braces.
>
> If you just want the original text, add a parse action to nestedBraces
> to do this for you (keepOriginalText is another pyparsing builtin).
> The parse action is executed at parse time so that there is no
> post-processing needed after the parsed results are returned:
>
> nestedBraces.setParseAction(keepOriginalText)
> grammar = "$$" + PMATRIX + nestedBraces("lhs_args") + "=" + \
>           PMATRIX + nestedBraces("rhs_args") + \
>           "$$"
>
> res = grammar.parseString(data)
> print res
> print res.lhs_args
> print res.rhs_args
>
> Now this program returns the original text for the nested brace
> expressions:
>
> ['$$', '\\pmatrix', '{{\\it x_2}\\cr 0\\cr 1\\cr }', '=', '\\pmatrix',
> '{\\left({{{\\it m_2}\\,s^2 \n }\\over{k}}+1\\right)\\,{\\it x_1}-{{F}\
> \over{k}}\\cr -{{{\\it m_2}\\,s^2\\,F \n }\\over{k}}-F+\\left({\\it
> m_2}\\,s^2\\,\\left({{{\\it m_2}\\,s^2}\\over{k}}+1 \n \\right)+{\\it
> m_2}\\,s^2\\right)\\,{\\it x_1}\\cr 1\\cr }', '$$']
> ['{{\\it x_2}\\cr 0\\cr 1\\cr }']
> ['{\\left({{{\\it m_2}\\,s^2 \n }\\over{k}}+1\\right)\\,{\\it x_1}-{{F}
> \\over{k}}\\cr -{{{\\it m_2}\\,s^2\\,F \n }\\over{k}}-F+\\left({\\it
> m_2}\\,s^2\\,\\left({{{\\it m_2}\\,s^2}\\over{k}}+1 \n \\right)+{\\it
> m_2}\\,s^2\\right)\\,{\\it x_1}\\cr 1\\cr }']
>
> You can find more info on pyparsing at
> http://pyparsing.wikispaces.com.
>
> Cheers!
> -- Paul
> --
> http://mail.python.org/mailman/listinfo/python-list
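
For comparison, the brace counting that the original question asks about can also be done by hand with a depth counter instead of a regexp. What follows is only a minimal sketch under stated assumptions, not the approach from the thread: extract_brace_args is a hypothetical helper (it is not part of pyparsing), the sample string is a shortened version of the expression above, and escaped braces such as \{ are not handled.

```python
def extract_brace_args(text, command=r"\pmatrix"):
    """Return the brace-delimited argument that follows each `command`.

    Hypothetical helper for illustration only. It tracks nesting depth
    and treats every "{" and "}" as a delimiter (no escape handling).
    """
    args = []
    pos = 0
    while True:
        start = text.find(command + "{", pos)
        if start == -1:
            break                          # no further occurrences
        open_idx = start + len(command)    # index of the opening "{"
        depth = 0
        for i in range(open_idx, len(text)):
            ch = text[i]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    # Matching close brace found: keep the text between
                    # the outermost pair, without the braces themselves.
                    args.append(text[open_idx + 1:i])
                    pos = i + 1
                    break
        else:
            # The for loop ended without reaching depth 0.
            raise ValueError("unbalanced braces after " + command)
    return args


# Shortened sample (an assumption, not the full string from the question).
sample = r"$$\pmatrix{{\it x_2}\cr 0\cr 1\cr }=\pmatrix{{{F}\over{k}}\cr 1\cr }$$"
lhs, rhs = extract_brace_args(sample)
print(lhs)
print(rhs)
```

Unlike nestedExpr, this returns only the raw text of each argument and builds no nested token structure, so it is closer to the keepOriginalText variant than to the default pyparsing result.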