On Feb 17, 8:09 pm, Christopher Barrington-Leigh <[EMAIL PROTECTED]> wrote:
> Here is a file "test.csv"
> number,name,description,value
> 1,"wer","tape 2"",5
> 1,vvv,"hoohaa",2
>
> I want to convert it to tab-separated without those silly quotes. Note
> in the second line that a field is 'tape 2"' , ie two inches: there is
> a double quote in the string.
>

To disambiguate this data, accept a closing quote only if it is followed
by a comma or the end of the line. Pyparsing lets you define your own
quoted string format, so here is one solution that does that. At the end,
you can extract the data by field name and print it out however you
choose:

data = """\
number,name,description,value
1,"wer","tape 2"",5
1,vvv,"hoohaa",2"""

from pyparsing import *

# very special definition of a quoted string, that ends with a " only if
# followed by a , or the end of line
quotedString = ('"' +
    ZeroOrMore(CharsNotIn('"')|('"' + ~FollowedBy(','|lineEnd))) +
    '"')
quotedString.setParseAction(keepOriginalText, removeQuotes)
integer = Word(nums).setParseAction(lambda toks:int(toks[0]))
value = integer | quotedString | Word(printables.replace(",",""))

# first pass, just parse the comma-separated values
for line in data.splitlines():
    print delimitedList(value).parseString(line)
print

# now second pass, assign field names using names from first line
names = data.splitlines()[0].split(',')
def setValueNames(tokens):
    for k,v in zip(names,tokens):
        tokens[k] = v
lineDef = delimitedList(value).setParseAction(setValueNames)

# parse each line, and extract data by field name
for line in data.splitlines()[1:]:
    results = lineDef.parseString(line)
    print "Desc:", results.description
    print results.dump()

Prints:

['number', 'name', 'description', 'value']
[1, 'wer', 'tape 2"', 5]
[1, 'vvv', 'hoohaa', 2]

Desc: tape 2"
[1, 'wer', 'tape 2"', 5]
- description: tape 2"
- name: wer
- number: 1
- value : 5
Desc: hoohaa
[1, 'vvv', 'hoohaa', 2]
- description: hoohaa
- name: vvv
- number: 1
- value : 2

-- Paul

--
http://mail.python.org/mailman/listinfo/python-list
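
For comparison, the same rule (a closing quote counts only when it is followed by a comma or the end of the line) can also be sketched with nothing but the standard library's re module, producing the tab-separated output the original question asked for. This is a rough Python 3 sketch and not part of the pyparsing solution; the names FIELD and split_line are made up for illustration, and every value is left as a string rather than converted to an integer.

```python
import re

# A field is either a quoted string whose closing quote is immediately
# followed by a comma or the end of the line, or a run of non-comma characters.
# FIELD and split_line are illustrative names, not from the original post.
FIELD = re.compile(r'"(.*?)"(?=,|$)|([^,]*)')

def split_line(line):
    """Split one line into fields, stripping the enclosing quotes."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)
        # group(1) is set for a quoted field, group(2) for an unquoted one
        fields.append(m.group(1) if m.group(1) is not None else m.group(2))
        pos = m.end()
        if pos >= len(line):
            break
        pos += 1  # step over the comma that separates two fields
    return fields

data = '''number,name,description,value
1,"wer","tape 2"",5
1,vvv,"hoohaa",2'''

# Convert each comma-separated line to a tab-separated one.
for line in data.splitlines():
    print("\t".join(split_line(line)))

# split_line('1,"wer","tape 2"",5') -> ['1', 'wer', 'tape 2"', '5']
```

Like the pyparsing version, this stays ambiguous if a quoted field itself contains a double quote directly followed by a comma; no rule of this kind can recover that case.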