On Feb 17, 8:09 pm, Christopher Barrington-Leigh
<[EMAIL PROTECTED]> wrote:
> Here is a file "test.csv"
> number,name,description,value
> 1,"wer","tape 2"",5
> 1,vvv,"hoohaa",2
>
> I want to convert it to tab-separated without those silly quotes. Note
> in the second line that a field is 'tape 2"' , ie two inches: there is
> a double quote in the string.
>

To disambiguate this data, accept a closing quote only if it is
followed by a comma or the end of the line.  In pyparsing, you can
define your own quoted string format, and here is one solution that
does so.  At the end, you can extract the data by field name, and
print it out however you choose:

data = """\
number,name,description,value
1,"wer","tape 2"",5
1,vvv,"hoohaa",2"""


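from pyparsing import (CharsNotIn, FollowedBy, Word, ZeroOrMore,
    delimitedList, keepOriginalText, lineEnd, nums, printables,
    removeQuotes)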

# very special definition of a quoted string, that ends with a " only
# if followed by a , or the end of line
quotedString = ('"' +
    ZeroOrMore(CharsNotIn('"')|('"' + ~FollowedBy(','|lineEnd))) +
    '"')
quotedString.setParseAction(keepOriginalText, removeQuotes)
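# integers are converted to int at parse time
integer = Word(nums).setParseAction(lambda toks: int(toks[0]))
# a value is an integer, a quoted string, or any run of non-comma
# printable characters
value = integer | quotedString | Word(printables.replace(",",""))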

# first pass, just parse the comma-separated values
for line in data.splitlines():
    print delimitedList(value).parseString(line)
print

# now second pass, assign field names using names from first line
names = data.splitlines()[0].split(',')
def setValueNames(tokens):
    for k,v in zip(names,tokens):
        tokens[k] = v
lineDef = delimitedList(value).setParseAction(setValueNames)

# parse each line, and extract data by field name
for line in data.splitlines()[1:]:
    results = lineDef.parseString(line)
    print "Desc:", results.description
    print results.dump()


Prints:
['number', 'name', 'description', 'value']
[1, 'wer', 'tape 2"', 5]
[1, 'vvv', 'hoohaa', 2]

Desc: tape 2"
[1, 'wer', 'tape 2"', 5]
- description: tape 2"
- name: wer
- number: 1
- value: 5
Desc: hoohaa
[1, 'vvv', 'hoohaa', 2]
- description: hoohaa
- name: vvv
- number: 1
- value: 2
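
If you would rather not install anything, the same rule (a closing
quote only counts when a comma or the end of the line follows it) can
probably be sketched with just the standard re module.  This is not the
pyparsing solution above, only a rough equivalent: the names FIELD and
split_line are made up for this illustration, all values stay strings,
and it assumes no field contains a quote immediately followed by a
comma.  It also writes the tab-separated output the original question
asked for.

```python
import re

data = '''\
number,name,description,value
1,"wer","tape 2"",5
1,vvv,"hoohaa",2'''

# Hypothetical helper pattern: a field is either a quoted string whose
# closing quote is followed by a comma or the end of the line, or a
# (possibly empty) run of non-comma characters.
FIELD = re.compile(r'"(?P<quoted>.*?)"(?=,|$)|(?P<bare>[^,]*)')

def split_line(line):
    """Split one line into fields, stripping the enclosing quotes."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)
        quoted = m.group('quoted')
        fields.append(quoted if quoted is not None else m.group('bare'))
        pos = m.end()
        if pos >= len(line):
            break
        pos += 1  # step over the comma separator
    return fields

rows = [split_line(line) for line in data.splitlines()]

# join each row with tabs to get the tab-separated version
tsv = "\n".join("\t".join(row) for row in rows)
print(tsv)
```

Field names can be attached the same way as in the second pass above,
for example with dict(zip(rows[0], rows[1])).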

-- Paul

-- 
http://mail.python.org/mailman/listinfo/python-list
