On Mar 23, 1:48 pm, rh0dium <[EMAIL PROTECTED]> wrote:
> On Mar 23, 12:26 am, Paul McGuire <[EMAIL PROTECTED]> wrote:
>
> > There are a couple of bugs in our program so far.
>
> > First of all, our grammar isn't parsing the METAL2 entry at all. We
> > should change this line:
>
> >     md = mainDict.parseString(test1)
>
> > to
>
> >     md = (mainDict+stringEnd).parseString(test1)
>
> > The parser is reading as far as it can, but then stopping once
> > successful parsing is no longer possible. Since there is at least one
> > valid entry matching the OneOrMore expression, parseString raises
> > no errors. By adding "+stringEnd" to our expression to be parsed, we
> > are saying "once parsing is finished, we should be at the end of the
> > input string". By making this change, we now get this parse
> > exception:
>
> > pyparsing.ParseException: Expected stringEnd (at char 1948), (line:54,
> > col:1)
>
> > So what is the matter with the METAL2 entries? After using brute
> > force "divide and conquer" (I deleted half of the entries and got a
> > successful parse, then restored half of the entries I removed, until I
> > added back the entry that caused the parse to fail), I found these
> > lines in the input:
>
> >     fatTblThreshold = (0,0.39,10.005)
> >     fatTblParallelLength = (0,1,0)
>
> > Both of these violate the atflist definition, because they contain
> > integers, not just floatnums. So we need to expand the definition of
> > atflist:
>
> > floatnum = Combine(Word(nums) + "." + Word(nums) +
> >                    Optional('e'+oneOf("+ -")+Word(nums)))
> > floatnum.setParseAction(lambda t:float(t[0]))
> > integer = Word(nums).setParseAction(lambda t:int(t[0]))
> > atflist = Suppress("(") + delimitedList(floatnum|integer) + \
> >           Suppress(")")
>
> > Then we need to tackle the issue of adding nesting for those entries
> > that have sub-keys. This is actually kind of tricky for your data
> > example, because nesting within Dict expects input data to be nested.
> > That is, nesting Dicts is normally done with data that is input like:
>
> > main
> >   Technology
> >   Layer
> >     PRBOUNDARY
> >     METAL2
> >   Tile
> >     unit
>
> > But your data is structured slightly differently:
>
> > main
> >   Technology
> >   Layer PRBOUNDARY
> >   Layer METAL2
> >   Tile unit
>
> > Because Layer is repeated, the second entry creates a new node named
> > "Layer" at the second level, and the first "Layer" entry is lost. To
> > fix this, we need to combine Layer and the layer id into a
> > composite-type of key. I did this by using Group, and adding the
> > Optional alias (which I see now is a poor name, "layerId" would be
> > better) as a second element of the key:
>
> > mainDict = dictOf(
> >     Group(Word(alphas)+Optional(quotedString)),
> >     Suppress("{") + attrDict + Suppress("}")
> >     )
>
> > But now if we parse the input with this mainDict, we see that the keys
> > are no longer nice simple strings, but they are 1- or 2-element
> > ParseResults objects. Here is what I get from the command "print
> > md.keys()":
>
> > [(['Technology'], {}), (['Tile', 'unit'], {}), (['Layer',
> > 'PRBOUNDARY'], {}), (['Layer', 'METAL2'], {})]
>
> > So to finally clear this up, we need one more parse action, attached
> > to the mainDict expression, that rearranges the subdicts using the
> > elements in the keys.
> > The parse action looks like this, and it will
> > process the overall parse results for the entire data structure:
>
> > def rearrangeSubDicts(toks):
> >     # iterate over a snapshot of the key-value pairs,
> >     # since toks is modified inside the loop
> >     for key,value in list(toks.items()):
> >         # key is of the form ['name'] or ['name', 'name2']
> >         # and the value is the attrDict
>
> >         # if key has just one element, use it to define
> >         # a simple string key
> >         if len(key)==1:
> >             toks[key[0]] = value
> >         else:
> >             # if the key has two elements, create a
> >             # subnode with the first element
> >             if key[0] not in toks:
> >                 toks[key[0]] = ParseResults([])
>
> >             # add an entry for the second key element
> >             toks[key[0]][key[1]] = value
>
> >         # now delete the original key that is of the form
> >         # ['name'] or ['name', 'name2']
> >         del toks[key]
>
> > It looks a bit messy, but the point is to modify the tokens in place,
> > by rearranging the attrdicts to nodes with simple string keys, instead
> > of keys nested in structures.
>
> > Lastly, we attach the parse action in the usual way:
>
> > mainDict.setParseAction(rearrangeSubDicts)
>
> > Now you can access the fields of the different layers as:
>
> > print md.Layer.METAL2.lineStyle
>
> > I guess this all looks pretty convoluted. You might be better off
> > just doing your own Group'ing, and then navigating the nested lists to
> > build your own dict or other data structure.
>
> > -- Paul
>
> Hi Paul,
>
> Before I continue this I must thank you for your help. You really did
> do an outstanding job on this code and it is really straightforward
> to use and learn from. This was a fun weekend task and I really
> wanted to use pyparsing to do it, because this is one of several types
> of files I want to parse. I (as I'm sure you would agree) think
> rearrangeSubDicts is a bit of a hack, but nevertheless absolutely
> required due to the limitations of the data I am parsing. Once
> again, thanks for your great help. Now the problem..
>
> I attempted to use this code on another testcase. This testcase had
> tabs in it. I think 1.4.11 is missing the expandtabs attribute. I
> ran my code (which had tabs) and I got this..
>
> AttributeError: 'builtin_function_or_method' object has no attribute 'expandtabs'
>
> Ugh oh. Is this a pyparsing problem or am I just an idiot..
>
> Thanks again!
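
The post does not say what triggered this AttributeError. One plausible cause, offered as an assumption and not confirmed anywhere in the thread, is handing parseString a bound method such as `f.read` in place of the string returned by `f.read()`: by default pyparsing calls `expandtabs()` on its input, and a method object has no such attribute. The sketch below reproduces the message with the standard library alone; `fh` and `try_expandtabs` are hypothetical names.

```python
import io

# Hypothetical stand-in for an open file whose contents include tabs.
fh = io.StringIO("Layer\t{\n}\n")

data_wrong = fh.read    # missing parentheses: the method object itself
data_right = fh.read()  # the file contents as a str


def try_expandtabs(obj):
    """Call expandtabs() the way a parser would; return the result or the error text."""
    try:
        return obj.expandtabs()
    except AttributeError as err:
        return str(err)


msg = try_expandtabs(data_wrong)
expanded = try_expandtabs(data_right)

print(msg)             # the AttributeError text
print(repr(expanded))  # the string with tabs expanded to spaces
```

If this is the cause, the tabs in the file are a coincidence: the same error would appear for any input, and adding the parentheses is the whole fix.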
Doh!! Never mind, I am an idiot. Nope, I got it; what a bonehead.. I
needed to tweak it a bit to ignore the comments. Namely, this fixed it
up:

mainDict = dictOf(
    Group(Word(alphas)+Optional(quotedString)),
    Suppress("{") + attrDict + Suppress("}")
    ) | cStyleComment.suppress()

Thanks again. Now I just need to figure out how to use your dicts to
do some work..

--
http://mail.python.org/mailman/listinfo/python-list
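
For skipping comments, pyparsing also offers `ignore()`, which lets an ignorable expression appear anywhere between tokens. The following is a minimal sketch under stated assumptions: the `attrDict` here is a hypothetical stand-in for the one in the thread (plain `name = value` pairs), the keys are single words without the optional quoted id, and `sample` is invented input. It also shows one way to read values back out of the nested dicts.

```python
from pyparsing import (StringEnd, Suppress, Word, alphanums, alphas,
                       cStyleComment, dictOf)

# Hypothetical stand-in for the thread's attrDict: simple name = value pairs.
attrDict = dictOf(Word(alphas), Suppress("=") + Word(alphanums + "."))

# Simplified mainDict: single-word keys only (no Optional(quotedString)).
mainDict = dictOf(Word(alphas), Suppress("{") + attrDict + Suppress("}"))

# Require the whole input to be consumed, and skip /* ... */ comments
# wherever they fall between tokens.
parser = mainDict + StringEnd()
parser.ignore(cStyleComment)

# Invented sample input, not taken from the thread.
sample = """
/* leading comment */
Technology {
    name = demo   /* comment inside an entry */
    unit = micron
}
Tile {
    width = 0.39
}
/* trailing comment */
"""

md = parser.parseString(sample)

# Entries are reachable by key, like nested dicts.
print(sorted(md.keys()))
print(md["Technology"]["name"])
print(md["Tile"]["width"])
```

As far as I can tell, `| cStyleComment.suppress()` offers the comment as an alternative to the whole dict expression, while `ignore()` also skips comments that sit inside or between entries, so the two are not equivalent.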