On 04/09/2013 16:04, Tim Chase wrote:
> I've got some old 2.4 code (requires an external lib that hasn't been
> upgraded) that needs to process a CSV file where some of the values
> contain \r characters. It appears that in more recent versions (just
> tested in 2.7; docs suggest this was changed in 2.5), Python does the
> Right Thing™ and just creates values in the row containing that \r.
> However, in 2.4, the csv module chokes on it with
>
>   _csv.Error: newline inside string
>
> as demoed by the example code at the bottom of this email. What's the
> best way to deal with this? At the moment, I'm just using something
> like
>
>   def unCR(f):
>       for line in f:
>           yield line.replace('\r', '')
>
>   f = file('input.csv', 'rb')
>   for row in csv.reader(unCR(f)):
>       code_to_process(row)
>
> but this throws away data that I'd really prefer to keep if possible.
> I know 2.4 isn't exactly popular, and in an ideal world, I'd just
> upgrade to a later 2.x version that does what I need. Any old-time
> 2.4 pythonistas have sage advice for me?
[snip]
You could try replacing the '\r' with another character that doesn't
appear elsewhere and then change it back afterwards.

MARKER = '\x01'

def cr_to_marker(f):
    for line in f:
        yield line.replace('\r', MARKER)

def marker_to_cr(item):
    return item.replace(MARKER, '\r')

f = file('out.txt', 'rb')
r = csv.reader(cr_to_marker(f))
for i, row in enumerate(r): # works in 2.7, fails in 2.4
    row = [marker_to_cr(item) for item in row]
    print repr(row)
f.close()

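
If you can't be sure that '\x01' never turns up in the data, you could
pick the marker by inspecting the input first. This is only a rough
sketch: pick_marker and its candidates argument are names I made up for
illustration, and it assumes the whole file fits in memory as one string.

```python
def pick_marker(data, candidates=None):
    """Return a single character from candidates that is absent from data."""
    if candidates is None:
        # Default guess: control characters other than tab, LF and CR.
        candidates = [chr(n) for n in range(1, 32) if chr(n) not in '\t\n\r']
    for ch in candidates:
        if ch not in data:
            return ch
    raise ValueError('no unused marker character found')
```

You would call it once on the file contents and use the result in place
of the hard-coded MARKER.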
Which OS are you using? On Windows the lines (rows) end with '\r\n', so
the last item of each row will end with '\r', which you'll need to
strip off. (That would be a problem only if the last item of a row
could end with '\r'.)
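
One way to handle the '\r\n' case might be to split off the line ending
before doing the replacement, so that only a '\r' inside the line body
becomes a marker. Again just a sketch, not tested on 2.4: read_rows is a
hypothetical helper name, and it assumes every row ends in '\r\n' or
'\n'. On a file with plain '\n' endings nothing is stripped; on a file
with '\r\n' endings a quoted field containing an embedded '\r\n' would
lose its '\r', and a '\n'-terminated line whose last item really ends
with '\r' can't be told apart from a '\r\n' ending.

```python
import csv

MARKER = '\x01'  # assumed never to occur in the real data


def cr_to_marker(lines):
    """Replace '\\r' in the body of each line, leaving the line ending alone."""
    for line in lines:
        if line.endswith('\r\n'):
            body, ending = line[:-2], '\n'
        elif line.endswith('\n'):
            body, ending = line[:-1], '\n'
        else:
            body, ending = line, ''
        yield body.replace('\r', MARKER) + ending


def read_rows(lines):
    """Yield CSV rows with the markers turned back into '\\r'."""
    for row in csv.reader(cr_to_marker(lines)):
        yield [item.replace(MARKER, '\r') for item in row]
```

With that, rows = list(read_rows(f)) should give you the fields with
their embedded '\r' intact and no stray '\r' on the last item.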