New submission from Vaibhav Mallya:

I'm writing Python `csv`-based parsers as part of a data processing pipeline
that includes Redshift and other data stores upstream and downstream. In all
of these data stores
(http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html) it is easy, and
expected, to generate CSV-style data with ESCAPE'd newlines, with or without
quotes around the columns.

Problem: the 2.x `csv` module has a bug where ESCAPE'd newlines in unquoted
CSVs are not treated as escaped newlines, but as the start of entirely new
records. This is at odds with the expected behavior in most common data
warehouses (see the Redshift docs linked above, for example) and is a subtle
source of bugs in data processing pipelines. After some debugging, we worked
around this bug by changing our Redshift parameters to use ADDQUOTES.
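
A minimal reproduction sketch of the behavior described above. The sample
lines are made up for illustration and only assumed to resemble ESCAPE'd
output with and without ADDQUOTES; `parse_unquoted` and `parse_quoted` are
hypothetical helper names, not part of the `csv` module.

```python
import csv

# Hypothetical sample data: one logical record whose second column contains
# a newline escaped with a backslash, split across two physical lines.
UNQUOTED_LINES = ["1,first line\\\n", "second line\n"]
QUOTED_LINES = ['"1","first line\\\n', 'second line"\n']


def parse_unquoted(lines):
    """Parse unquoted, backslash-escaped lines into a list of rows."""
    reader = csv.reader(lines, quoting=csv.QUOTE_NONE, escapechar="\\")
    return list(reader)


def parse_quoted(lines):
    """Parse quoted, backslash-escaped lines (the ADDQUOTES-style workaround)."""
    reader = csv.reader(lines, quotechar='"', escapechar="\\")
    return list(reader)


if __name__ == "__main__":
    # The quoted form should come back as a single record.
    print(parse_quoted(QUOTED_LINES))
    # On 2.7 the unquoted form is what gets split into two records; a
    # Python 3 release with the issue15927 fix should return one record.
    print(parse_unquoted(UNQUOTED_LINES))
```

Running the same script under 2.7 and under a recent 3.x should make the
difference visible: the quoted input is expected to parse as one record on
both, while the unquoted input is the case that 2.7 splits.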

Note: this seems to be a continuation of https://bugs.python.org/issue15927,
which was closed as WONTFIX for 2.x. I think this is a legitimate bug and
should be fixed in 2.x. If someone is relying on the old, incorrect behavior,
that probably means something else is wrong. In my view, the current behavior
effectively adds an implicit, undocumented dialect to the `csv` module.

----------
components: Library (Lib)
messages: 303025
nosy: mallyvai
priority: normal
severity: normal
status: open
title: CSV module incorrectly treats escaped newlines as new records if unquoted
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31590>
_______________________________________