On Mar 23, 5:30 pm, "Daniel Nogradi" <[EMAIL PROTECTED]> wrote:
> Hi list,
>
> I'm in the process of rewriting a bash/awk/sed script -- that grew too
> big -- in python. I can rewrite it in a simple line-by-line way but
> that results in ugly python code and I'm sure there is a simple
> pythonic way.
>
> The bash script processed text files of the form...
>
> Any elegant solution for this?
Is a parser overkill? Here's how you might use pyparsing for this problem.

I just wanted to show that pyparsing's returned results can be structured
as more than just lists of tokens. Using pyparsing's Dict class (or the
dictOf helper, which simplifies using Dict), you can return results that
can be accessed like a nested list, like a dict, or like an instance with
named attributes (see the last line of the example).

You can adjust the syntax definitions of keys and values to fit your
actual data. For instance, if the matrix entries are actually integers,
define matrixRow as (adding nums to the import list):

    matrixRow = Group( OneOrMore( Word(nums) ) ) + eol

-- Paul


from pyparsing import ParserElement, LineEnd, Word, alphas, alphanums, \
    Group, ZeroOrMore, OneOrMore, Optional, dictOf

data = """key1 value1
key2 value2
key3 value3
key4 value4
spec11 spec12 spec13 spec14
spec21 spec22 spec23 spec24
spec31 spec32 spec33 spec34

key5 value5
key6 value6
key7 value7
more11 more12 more13
more21 more22 more23

key8 value8
"""

# retain significant newlines (pyparsing reads over whitespace by default)
ParserElement.setDefaultWhitespaceChars(" \t")

eol = LineEnd().suppress()
elem = Word(alphas,alphanums)
key = elem
matrixRow = Group( elem + elem + OneOrMore(elem) ) + eol
matrix = Group( OneOrMore( matrixRow ) ) + eol
value = elem + eol + Optional( matrix ) + ZeroOrMore(eol)
parser = dictOf(key, value)

# parse the data
results = parser.parseString(data)

# access the results
# - like a dict
# - like a list
# - like an instance with keys for attributes
print results.keys()
print

for k in sorted(results.keys()):
    print k,
    if isinstance( results[k], basestring ):
        print results[k]
    else:
        print results[k][0]
        for row in results[k][1]:
            print " "," ".join(row)
print

print results.key3


Prints out:

['key8', 'key3', 'key2', 'key1', 'key7', 'key6', 'key5', 'key4']

key1 value1
key2 value2
key3 value3
key4 value4
  spec11 spec12 spec13 spec14
  spec21 spec22 spec23 spec24
  spec31 spec32 spec33 spec34
key5 value5
key6 value6
key7 value7
  more11 more12 more13
  more21 more22 more23
key8 value8

value3

--
http://mail.python.org/mailman/listinfo/python-list
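For comparison, here is roughly what the plain line-by-line approach from the original question might look like in Python 3 with no parser library. This is a minimal sketch and not part of the original reply: the function name parse_records and the record layout ({"value": ..., "matrix": [...]}) are hypothetical, and it assumes the same conventions the grammar above encodes, namely that a two-token line is a key/value pair, a line with three or more tokens is a matrix row belonging to the most recent key, and blank lines are only separators.

```python
# Hypothetical stdlib-only parser for the key/value + matrix format
# (Python 3). Not part of pyparsing or of the original reply.

data = """key1 value1
key2 value2
key3 value3
key4 value4
spec11 spec12 spec13 spec14
spec21 spec22 spec23 spec24
spec31 spec32 spec33 spec34

key5 value5
key6 value6
key7 value7
more11 more12 more13
more21 more22 more23

key8 value8
"""


def parse_records(text):
    """Return {key: {"value": str, "matrix": [[str, ...], ...]}}.

    Assumed rules: a 2-token line starts a new key/value record, a line
    with 3+ tokens is a matrix row for the most recent key, and blank
    lines are ignored.
    """
    records = {}
    last_key = None
    for lineno, line in enumerate(text.splitlines(), 1):
        tokens = line.split()
        if not tokens:
            continue  # blank line: separator only
        if len(tokens) == 2:
            last_key, value = tokens
            records[last_key] = {"value": value, "matrix": []}
        elif len(tokens) >= 3 and last_key is not None:
            records[last_key]["matrix"].append(tokens)
        else:
            # a 1-token line, or a matrix row that appears before any key
            raise ValueError("line %d: unexpected %r" % (lineno, line))
    return records


records = parse_records(data)
for key in sorted(records):
    print(key, records[key]["value"])
    for row in records[key]["matrix"]:
        print("  ", " ".join(row))
```

The trade-off is that the format rules live in if/elif branches instead of a declarative grammar, and the result is a plain dict of dicts rather than a ParseResults object that also supports list-style and attribute-style access.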