"rh0dium" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > > Paul McGuire wrote: > > > ident = Combine( Word(alpha,alphanums+"_") + LPAR + RPAR ) > > This will only work for a word with a parentheses ( ie. somefunction() > ) > > > If you *really* want everything on the first line to be the ident, try this: > > > > ident = Word(alpha,alphanums+"_") + restOfLine > > or > > ident = Combine( Word(alpha,alphanums+"_") + restOfLine ) > > This nicely grabs the "\r".. How can I get around it? > > > Now the next step is to assign field names to the results: > > > > dataFormat = ident.setResultsName("ident") + ( dblQuotedString | > > quoteList ).setResultsName("contents") > > This is super cool!! > > So let's take this for example > > test= 'fprintf( outFile "leSetInstSelectable( t )\n" )\r\n ("test" > "test1" "foo aasdfasdf"\r\n "newline" "test2")\r\n' > > Now I want the ident to pull out 'fprintf( outFile > "leSetInstSelectable( t )\n" )' so I tried to do this? > > ident = Forward() > ident << Group( Word(alphas,alphanums) + LPAR + ZeroOrMore( > dblQuotedString | ident | Word(alphas,alphanums) ) + RPAR) > > Borrowing from the example listed previously. But it bombs out cause > it wants a ")" but it has one.. Forward() ROCKS!! > > Also how does it know to do this for just the first line? It would > seem that this will work for every line - No? > This works for me:

# LPAR, RPAR and quoteList are as defined earlier in this thread
from pyparsing import *

test4 = r"""fprintf( outFile "leSetInstSelectable( t )\n" )
("test" "test1" "foo aasdfasdf"
"newline" "test2")
"""

ident = Forward()
ident << Group( Word(alphas,alphanums) + LPAR + ZeroOrMore(
    dblQuotedString | ident | Word(alphas,alphanums) ) + RPAR)

dataFormat = ident + ( dblQuotedString | quoteList )

print(dataFormat.parseString(test4))

Prints:
[['fprintf', '(', 'outFile', '"leSetInstSelectable( t )\\n"', ')'],
['"test"', '"test1"', '"foo aasdfasdf"', '"newline"', '"test2"']]

1. Is there supposed to be a real line break inside the string
"leSetInstSelectable( t )\n", or just a backslash followed by an n at the
end? pyparsing quoted strings do not accept quotes that span multiple
lines, but they do accept escaped characters such as "\t", "\n", etc.
That is, to pyparsing:

"\n this is a valid \t \n string"

"this is not
a valid string"

Part of the confusion is that your examples include explicit \r\n
characters. I'm assuming this reflects what you see when you list out the
Python variable containing the string. (Are you opening a text file with
"rb" to read in binary? Try opening with just "r", and this may resolve
your \r\n problems.)

2. If restOfLine is still giving you \r's at the end, you can redefine
restOfLine to not include them, or to include and suppress them. Or (this
is easier) define a parse action for restOfLine that strips a trailing \r:

def stripTrailingCRs(st,loc,toks):
    # Returning None (the fall-through case) leaves the tokens unchanged.
    try:
        if toks[0][-1] == '\r':
            return toks[0][:-1]
    except IndexError:
        # restOfLine matched an empty string, so there is nothing to strip
        pass

restOfLine.setParseAction( stripTrailingCRs )

3. How does it know to only do it for the first line? Presumably you told
it to do so. pyparsing's parseString method starts at the beginning of the
input string and matches expressions until it finds a mismatch or runs out
of expressions to match. Once the grammar is satisfied, pyparsing stops,
even if there is more input string left to process.
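
To make points 2 and 3 concrete, here is a minimal sketch. The two-line input and the name strip_trailing_cr are made up for illustration, and the parse action is attached to a copy of restOfLine so the shared module-level object stays untouched. Treat it as one possible way to do this, not the only one.

```python
from pyparsing import Word, alphas, alphanums, restOfLine

# Hypothetical two-line input with Windows-style line endings.
text = "alpha one two\r\nbeta three four\r\n"

def strip_trailing_cr(tokens):
    # Replace the matched text with a copy that has no trailing "\r".
    return tokens[0].rstrip("\r")

# copy() so that the module-level restOfLine is not modified.
rest = restOfLine.copy().setParseAction(strip_trailing_cr)
line = Word(alphas, alphanums + "_") + rest

# parseString matches the grammar once, starting at the beginning of the
# input, and then stops; the "beta" line is never looked at.
first_line_tokens = line.parseString(text).asList()
print(first_line_tokens)
```

Only the first line shows up in the result, and its second token no longer ends in "\r".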

To search through the whole file looking for idents, try using scanString,
which returns a generator; for each match, the generator gives a tuple
containing:
- tokens - the matched tokens
- start - the start location of the match
- end - the end location of the match

If your input file consists *only* of these constructs, you can also just
expand dataFormat.parseString to OneOrMore(dataFormat).parseString.

-- Paul
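
As an illustration of the scanString and OneOrMore suggestions above, here is a minimal sketch. LPAR, RPAR and quoteList were defined earlier in the thread and are not shown in this message, so the definitions below are assumptions. The sample inputs are also made up, and wrapping dataFormat in Group is an addition to keep each record separate.

```python
from pyparsing import (Forward, Group, Literal, OneOrMore, Suppress, Word,
                       ZeroOrMore, alphanums, alphas, dblQuotedString)

# Assumed definitions (not shown in this message; guesses for illustration).
LPAR = Literal("(")
RPAR = Literal(")")
quoteList = Group(Suppress("(") + OneOrMore(dblQuotedString) + Suppress(")"))

ident = Forward()
ident <<= Group(Word(alphas, alphanums) + LPAR
                + ZeroOrMore(dblQuotedString | ident | Word(alphas, alphanums))
                + RPAR)
dataFormat = ident + (dblQuotedString | quoteList)

# Hypothetical input: two records with unrelated text in between.
data = r'''fprintf( outFile "first( t )\n" )
("a" "b")
-- unrelated text --
fprintf( outFile "second( t )\n" )
("c")
'''

# scanString skips over text that does not match and yields one
# (tokens, start, end) tuple per match.
matches = []
for tokens, start, end in dataFormat.scanString(data):
    matches.append((tokens.asList(), start, end))
    print(start, end, tokens.asList())

# If the input contains *only* records, OneOrMore can parse them all at once.
clean = r'''fprintf( outFile "first( t )\n" )
("a" "b")
fprintf( outFile "second( t )\n" )
("c")
'''
# Group(...) is an addition here so that each record is its own sublist.
records = OneOrMore(Group(dataFormat)).parseString(clean).asList()
print(len(records))
```

scanString suits input where the records are mixed with other text, while the OneOrMore form only gets through the whole input if nothing else is present.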