Re: Dealing with \r in CSV fields in Python2.4

MRAB Wed, 04 Sep 2013 08:34:00 -0700

On 04/09/2013 16:04, Tim Chase wrote:

I've got some old 2.4 code (requires an external lib that hasn't been
upgraded) that needs to process a CSV file where some of the values
contain \r characters.  It appears that in more recent versions (just
tested in 2.7; docs suggest this was changed in 2.5), Python does the
Right Thing™ and just creates values in the row containing that \r.
However, in 2.4, the csv module chokes on it with


   _csv.Error: newline inside string

as demoed by the example code at the bottom of this email.  What's the
best way to deal with this?  At the moment, I'm just using something
like

   def unCR(f):
     for line in f:
       yield line.replace('\r', '')

   f = file('input.csv', 'rb')
   for row in csv.reader(unCR(f)):
     code_to_process(row)

but this throws away data that I'd really prefer to keep if possible.

I know 2.4 isn't exactly popular, and in an ideal world, I'd just
upgrade to a later 2.x version that does what I need.  Any old-time
2.4 pythonistas have sage advice for me?

[snip]
You could try replacing the '\r' with another character that doesn't
appear elsewhere and then change it back afterwards.

import csv

MARKER = '\x01'  # any character that never occurs in the data

def cr_to_marker(f):
    for line in f:
        yield line.replace('\r', MARKER)

def marker_to_cr(item):
    return item.replace(MARKER, '\r')

f = file('out.txt', 'rb')
r = csv.reader(cr_to_marker(f))
for i, row in enumerate(r): # without the marker step: works in 2.7, fails in 2.4
    row = [marker_to_cr(item) for item in row]
    print repr(row)
f.close()
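
The marker only works if it really is absent from the data. A minimal
sketch of one way to check that, assuming the whole file fits in memory;
pick_marker is a hypothetical helper name, not something the csv module
provides:

```python
# Hypothetical helper (not part of the csv module): choose a marker
# character that is guaranteed not to occur in the data being parsed.
def pick_marker(data, exclude='\t\n\r'):
    # Try the control characters 1..31 in order, skipping tab, LF and CR,
    # and return the first one that the data does not contain.
    for code in range(1, 32):
        candidate = chr(code)
        if candidate not in exclude and candidate not in data:
            return candidate
    raise ValueError('no unused control character available')

sample = 'a,"b\rc",d\r\n'
MARKER = pick_marker(sample)
```

If the file is too large to read in one go, the same check could be run
chunk by chunk, at the cost of a second pass over the file.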

Which OS are you using? On Windows the lines (rows) end with '\r\n', and
because the file is opened in binary mode that '\r' is replaced by the
marker too, so the last item of each row will end with '\r', which
you'll need to strip off. (That would be a problem only if the last item
of a row could legitimately end with '\r'.)
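
One way to avoid the stripping step is to leave a trailing '\r\n' alone
and replace only the other '\r' characters. This is a rough sketch under
the same assumption that MARKER never occurs in the data; read_rows is a
hypothetical name, and I have not run it on 2.4 itself. It still cannot
tell a Windows line ending apart from a last item that genuinely ends
with '\r'.

```python
import csv

MARKER = '\x01'  # assumed never to occur in the real data

def cr_to_marker(lines):
    # Replace embedded carriage returns with MARKER, but keep a trailing
    # CR LF pair intact so it is treated as an ordinary line ending.
    for line in lines:
        if line.endswith('\r\n'):
            body, ending = line[:-2], '\r\n'
        else:
            body, ending = line, ''
        yield body.replace('\r', MARKER) + ending

def marker_to_cr(item):
    # Restore the carriage returns once the row has been parsed.
    return item.replace(MARKER, '\r')

def read_rows(lines):
    # Hypothetical convenience wrapper: parse the lines and undo the
    # marker substitution in every item.
    for row in csv.reader(cr_to_marker(lines)):
        yield [marker_to_cr(item) for item in row]

rows = list(read_rows(['a,"b\rc",d\r\n', 'e,f,g\r\n']))
```

Here lines can be any iterable of strings, for example the file object
opened in 'rb' mode as above.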

--
https://mail.python.org/mailman/listinfo/python-list
