Paul McGuire wrote:
> On Nov 8, 3:14 am, Donn Ingle <[EMAIL PROTECTED]> wrote:
>
>> float = nums + dot + nums
>
> Should be:
>
> float = Combine(Word(nums) + dot + Word(nums))
>
> nums is a string that defines the set of numeric digits for composing
> Word instances. nums is not an expression by itself.
>
> For that matter, I see in your later tests that some values have a
> leading minus sign, so you should really go with:
>
> float = Combine(Optional("-") + Word(nums) + dot + Word(nums))
>
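For anyone who wants to try the corrected expression quoted above on its own, here is a minimal sketch, assuming pyparsing is installed. The names float_number and parse_float are hypothetical (chosen only to avoid shadowing the builtin float), and the test strings are made up for illustration.

```python
from pyparsing import Combine, Literal, Optional, ParseException, Word, nums

dot = Literal(".")

# Same shape as the quoted expression; renamed (hypothetical name) so the
# builtin float() stays usable.
float_number = Combine(Optional("-") + Word(nums) + dot + Word(nums))


def parse_float(text):
    """Return the matched numeric text, or None if the whole string does not match."""
    try:
        return float_number.parseString(text, parseAll=True)[0]
    except ParseException:
        return None


# "42" has no decimal point, and "1. 5" contains whitespace, which Combine
# does not allow between the pieces it joins, so both are rejected.
results = [parse_float(s) for s in ("3.14", "-0.5", "42", "1. 5")]
print(results)
```

Combine returns the pieces as a single string token, so the result can be handed straight to Python's float() afterwards.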
I have a working path data parser (in pyparsing) at
http://code.google.com/p/wxpsvg. Parsing the numeric values initially
gave me a lot of trouble - I translated the BNF in the spec literally
and there was a *ton* of backtracking going on with every numeric value.
I ended up using a more generous grammar and letting Python's float()
reject invalid values. I couldn't get repeating path elements (like
L 100 100 200 200, which is the same as L 100 100 L 200 200) working
right in the grammar, so I expand those with post-processing.

The parser itself can be seen at
http://wxpsvg.googlecode.com/svn/trunk/svg/pathdata.py

> Some other comments:
>
> 1. Read up on the Word class, you are not using it quite right.
>
> command = Word("MLCZ")
>
> will work with your test set, but it is not the form I would choose.
> Word(characterstring) will match any "word" made up of the characters
> in the input string. So Word("MLCZ") will match
> M
> L
> C
> Z
> MM
> LC
> MCZL
> MMLCLLZCZLLM
>
> I would suggest instead using:
>
> command = Literal("M") | "L" | "C" | "Z"
>
> or
>
> command = oneOf("M L C Z")
>
> 2. Change comma to
>
> comma = Literal(",").suppress()
>
> The comma is important to the parsing process, but the ',' token is
> not much use in the returned set of matched tokens, get rid of it (by
> using suppress).
>
> 3. Group your expressions, such as
>
> couple = Group(float + comma + float)
>
> It will really simplify getting at the resulting parsed tokens.
>
> 4. What is the purpose of (couple + couple)? This is sufficient:
>
> phrase = OneOrMore(command + Group(OneOrMore(couple)))
>
> (Note use of Group to return the coord pairs as a sublist.)
>
> 5. Results names!
>
> phrase = OneOrMore(command("command") + Group(OneOrMore(couple))("coords"))
>
> will allow you to access these fields by name instead of by index.
> This will make your parser code *way* more readable.
>
> -- Paul
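Taken together, the five comments above might look roughly like the sketch below, assuming pyparsing is installed. This is not the wxpsvg parser. The names number and segment, the extra Group around each command (added so that each results name survives instead of being overwritten by the next command), and the sample input are my own assumptions. Note that the grammar as quoted requires at least one coordinate pair after every command, so a bare Z would not parse with it.

```python
from pyparsing import (Combine, Group, Literal, OneOrMore, Optional, Word,
                       nums, oneOf)

dot = Literal(".")
# "number" (hypothetical name) instead of "float", to keep the builtin usable.
number = Combine(Optional("-") + Word(nums) + dot + Word(nums))

comma = Literal(",").suppress()          # comment 2: parse the comma, drop the token
command = oneOf("M L C Z")               # comment 1: single letters only
couple = Group(number + comma + number)  # comment 3: each pair becomes a sublist

# Comments 4 and 5, with one assumption: each command is wrapped in its own
# Group so that "command" and "coords" can be read per segment.
segment = Group(command("command") + Group(OneOrMore(couple))("coords"))
phrase = OneOrMore(segment)

text = "M 100.0,100.0 L 200.0,200.0 -50.5,25.0"  # made-up sample input
parsed = phrase.parseString(text, parseAll=True)

# Convert the matched strings with float() and collect plain Python values.
segments = [
    (seg["command"], [(float(x), float(y)) for x, y in seg["coords"]])
    for seg in parsed
]
print(segments)
```

Reading seg["command"] and seg["coords"] by name is what comment 5 is after; the same values are also available by index as seg[0] and seg[1].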