<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> hi, all. I need to process a file with the following format:
> $ cat sample
> [(some text)2.3(more text)4.5(more text here)]
> [(aa bb ccc)-1.2(kdk)12.0(xxxyyy)]
> [(xxx)11.0(bbb\))8.9(end here)]
> .......
>
> my goal here is for each line, extract every '(.*)' (including the round
> brackets, put them in a list, and extract every float on the same line
> and put them in a list..
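For comparison, the goal stated in the question can also be sketched with the standard `re` module alone. This is a minimal sketch under a few assumptions: each record sits on its own line, the only escape inside the parentheses is a backslash followed by one character (as in `\)`), and every float is written with digits on both sides of the decimal point. `PAREN_RE`, `FLOAT_RE` and `split_line` are hypothetical names chosen for illustration. The parenthesized chunks are returned with their brackets and backslashes intact, as the question asks.

```python
import re

# A parenthesized chunk: "(" followed by any run of backslash-escaped
# characters (\x) or characters other than "\" and ")", then ")".
# (Assumption: backslash is the only escape mechanism.)
PAREN_RE = re.compile(r"\((?:\\.|[^\\)])*\)")

# A signed decimal float such as 2.3, -1.2 or +12.0.
# (Assumption: digits are always present on both sides of the point.)
FLOAT_RE = re.compile(r"[+-]?\d+\.\d+")


def split_line(line):
    """Return (paren_strings, floats) for one record line."""
    # Whole matches, brackets included, e.g. "(some text)".
    parens = PAREN_RE.findall(line)
    # Blank out the paren chunks first so that any digits inside the
    # text are not mistaken for floats.
    rest = PAREN_RE.sub(" ", line)
    floats = [float(f) for f in FLOAT_RE.findall(rest)]
    return parens, floats


sample = r"""[(some text)2.3(more text)4.5(more text here)]
[(aa bb ccc)-1.2(kdk)12.0(xxxyyy)]
[(xxx)11.0(bbb\))8.9(end here)]"""

for line in sample.splitlines():
    print(split_line(line))
```

Removing the parenthesized chunks before searching for floats is a deliberate choice: a line such as `(version 1.5)2.0` would otherwise report `1.5` as one of the floats.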
Are you wedded to re's? Here's a pyparsing approach for your perusal. It
uses the new QuotedString class, treating your ()-enclosed elements as
custom quoted strings (including backslash escape support).

Some other things the parser does for you during parsing:
- converts the numeric strings to floats
- processes the \) escaped paren, returning just the )

Why not? While parsing, the parser already "knows" it has just parsed a
floating point number (or an escaped character), so it may as well do the
conversion too.

-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net.)

--------------------
test = r"""
[(some text)2.3(more text)4.5(more text here)]
[(aa bb ccc)-1.2(kdk)12.0(xxxyyy)]
[(xxx)11.0(bbb\))8.9(end here)]
"""
from pyparsing import oneOf,Combine,Optional,Word,nums,QuotedString,Suppress

# define a floating point number
sign = oneOf("+ -")
floatNum = Combine( Optional(sign) + Word(nums) + "." + Word(nums) )

# have parser convert to actual floats while parsing
floatNum.setParseAction(lambda s,l,t: float(t[0]))

# define a "quoted string" where ()'s are the opening and closing quotes
parenString = QuotedString("(",endQuoteChar=")",escChar="\\")

# define the overall entry structure
entry = Suppress("[") + parenString + floatNum + parenString + floatNum + parenString + Suppress("]")

# scan for floats
for toks,start,end in floatNum.scanString(test):
    print toks[0]
print

# scan for paren strings
for toks,start,end in parenString.scanString(test):
    print toks[0]
print

# scan for entries
for toks,start,end in entry.scanString(test):
    print toks
print
--------------------

Gives:
2.3
4.5
-1.2
12.0
11.0
8.9

some text
more text
more text here
aa bb ccc
kdk
xxxyyy
xxx
bbb)
end here

['some text', 2.2999999999999998, 'more text', 4.5, 'more text here']
['aa bb ccc', -1.2, 'kdk', 12.0, 'xxxyyy']
['xxx', 11.0, 'bbb)', 8.9000000000000004, 'end here']
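The `entry` grammar in the reply, with its "convert while parsing" behaviour, can be roughly approximated with the standard library too. In this sketch, `ENTRY_RE`, `unescape` and `parse_entry` are hypothetical names, not pyparsing API. One regular expression encodes the `[` string float string float string `]` structure, and each captured group is converted as it is collected: floats become `float` objects and `\x` escapes are reduced to `x`, loosely mirroring what `setParseAction` and `QuotedString(..., escChar="\\")` do above. Unlike pyparsing, it does not skip whitespace between tokens, and it assumes exactly three strings and two floats per entry.

```python
import re

# Stand-ins for the pyparsing elements (assumed equivalents, not exact):
#   PAREN ~ QuotedString("(", endQuoteChar=")", escChar="\\"), contents captured
#   FLOAT ~ floatNum: optional sign, digits, ".", digits
PAREN = r"\(((?:\\.|[^\\)])*)\)"
FLOAT = r"([+-]?\d+\.\d+)"

# ~ entry: brackets are matched but not captured, like Suppress("[")
ENTRY_RE = re.compile(r"\[" + PAREN + FLOAT + PAREN + FLOAT + PAREN + r"\]")

# A backslash followed by any single character.
ESCAPE_RE = re.compile(r"\\(.)")


def unescape(text):
    """Replace each backslash-escaped character with the character itself."""
    return ESCAPE_RE.sub(r"\1", text)


def parse_entry(line):
    """Return [str, float, str, float, str] for a well-formed entry, else None."""
    m = ENTRY_RE.fullmatch(line.strip())
    if m is None:
        return None
    # The five groups alternate string, float, string, float, string,
    # so odd positions hold floats and even positions hold strings.
    return [float(g) if i % 2 else unescape(g) for i, g in enumerate(m.groups())]


sample = r"""[(some text)2.3(more text)4.5(more text here)]
[(aa bb ccc)-1.2(kdk)12.0(xxxyyy)]
[(xxx)11.0(bbb\))8.9(end here)]"""

for line in sample.splitlines():
    print(parse_entry(line))
```

Because `fullmatch` has to consume the whole line, a malformed record yields `None` rather than a partial result. That is stricter than `scanString`, which simply skips over text it cannot match.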