Bugs item #1465014, was opened at 2006-04-05 10:14
Message generated for change (Comment added) made by montanaro
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1465014&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Documentation
Group: Python 2.5
Status: Pending
Resolution: Fixed
Priority: 5
Submitted By: David Goodger (goodger)
Assigned to: Skip Montanaro (montanaro)
Summary: CSV regression in 2.5a1: multi-line cells

Initial Comment:
Running the attached csv_test.py under Python 2.4.2 (Windows XP SP1) produces:

    >c:\apps\python24\python.exe ./csv_test.py
    ['one', '2', 'three (line 1)\n(line 2)']

Note that the third item in the row contains a newline between "(line 1)" and "(line 2)".

With Python 2.5a1, I get:

    >c:\apps\python25\python.exe ./csv_test.py
    ['one', '2', 'three (line 1)(line 2)']

Notice the missing newline, which is significant. The CSV module under 2.5a1 seems to lose data.

----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2006-07-30 22:13

Message:
Logged In: YES
user_id=44345

I'll see your 50993 and raise you a 50998. Just minor tweaks. Hopefully we can close this puppy, though a small example to make the idea concrete might be worthwhile.

----------------------------------------------------------------------

Comment By: Andrew McNamara (andrewmcnamara)
Date: 2006-07-30 21:41

Message:
Logged In: YES
user_id=698599

I've changed the comment again in changeset 50993 - hopefully this attempt describes the difference more fully. Let me know what you think.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2006-07-29 15:07

Message:
Logged In: YES
user_id=44345

I checked in a change to libcsv.tex (revision 50953). It adds a versionchanged bit to the reader doc that explains why the behavior changed in 2.5. Andrew & Andrew, please check my work. Sorry for the delay taking care of this.
Skip

----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2006-07-29 12:24

Message:
Logged In: YES
user_id=11375

I looked at this bug report, but I have no idea of exactly what behaviour has changed or what needs to be described.

----------------------------------------------------------------------

Comment By: Andrew McNamara (andrewmcnamara)
Date: 2006-06-22 22:34

Message:
Logged In: YES
user_id=698599

Yep, your point about adding a comment to the documentation is fair. Skip, do you want to take my words and massage them into a form suitable for the docs?

----------------------------------------------------------------------

Comment By: David Goodger (goodger)
Date: 2006-06-22 22:13

Message:
Logged In: YES
user_id=7733

I didn't realize that the previous behavior was buggy; I thought that the current behavior was a side-effect. The 2.5 behavior did cause a small problem in Docutils, but it's already been fixed. I just wanted to ensure that no regression was creeping in to 2.5.

Thanks for the explanation! Perhaps it could be added to the docs in some form?

Marking the bug report closed.

----------------------------------------------------------------------

Comment By: Andrew McNamara (andrewmcnamara)
Date: 2006-06-22 19:27

Message:
Logged In: YES
user_id=698599

The previous behaviour caused considerable problems, particularly on platforms that did not use the unix line-ending conventions, or with files that originated on those platforms - users were finding mysterious newlines where they didn't expect them.

Quoted fields exist to allow characters that would otherwise be considered part of the syntax to appear within the field. So yes, quoted fields are a special case, and necessarily so.

The current behaviour puts the control back in the hands of the user of the module: if literal newlines are important within a field, they need to read their file in a way that preserves the newlines.
The old behaviour would introduce spurious characters into quoted fields, with no way for the user to control that behaviour.

I'm sorry that the change causes you problems. With a format that's as loosely defined as CSV, it's an unfortunate fact of life that there are going to be conflicting requirements.

----------------------------------------------------------------------

Comment By: David Goodger (goodger)
Date: 2006-06-22 13:17

Message:
Logged In: YES
user_id=7733

I see what you're saying, but I disagree. In Python 2.4, csv.reader did not require newlines, but in Python 2.5 it does. That's a significant behavioral change.

In the stdlib csv "Module Contents" docs for csv.reader, it says: "csvfile can be any object which supports the iterator protocol and returns a string each time its next method is called." It doesn't mention newline-terminated strings.

In any case, the behavior is inconsistent: newlines are not required to terminate row-ending strings, but only strings which end inside cells split across rows. Why the discrepancy?

----------------------------------------------------------------------

Comment By: Andrew McNamara (andrewmcnamara)
Date: 2006-06-20 18:17

Message:
Logged In: YES
user_id=698599

I think your problem is with str.splitlines(), rather than the csv.reader: splitlines ate the newline. If you pass it True as an argument, it will retain the end-of-line character in the resulting strings.

----------------------------------------------------------------------

Comment By: David Goodger (goodger)
Date: 2006-05-02 16:04

Message:
Logged In: YES
user_id=7733

Assigned to Andrew McNamara, since his change appears to have caused this regression (revision 38290 on Modules/_csv.c).

----------------------------------------------------------------------

Comment By: David Goodger (goodger)
Date: 2006-05-02 15:58

Message:
Logged In: YES
user_id=7733

Further investigation has revealed that the regression only affects iterator I/O, not file I/O.
The attached csv_test.py demonstrates. Run with Python 2.5 to get:

    results from file I/O:
    [['one', '2', 'three (line 1)\n(line 2)']]

    results from iterator I/O:
    [['one', '2', 'three (line 1)(line 2)']]

----------------------------------------------------------------------

Comment By: David Goodger (goodger)
Date: 2006-04-05 10:44

Message:
Logged In: YES
user_id=7733

This bug seems to be a side effect of revision 38290 on Modules/_csv.c, which was prompted by bug 967934 (http://www.python.org/sf/967934).

----------------------------------------------------------------------
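
The file-versus-iterator difference discussed in this thread can be reproduced with a short script along the following lines. This is only a sketch, not the attached csv_test.py (which is not shown here): the sample string and variable names are made up for illustration, and it is written for a current Python 3 interpreter rather than 2.4/2.5.

```python
import csv
import io

# Hypothetical sample data: one row whose third cell spans two lines.
data = 'one,2,"three (line 1)\n(line 2)"\n'

# File-like input: iterating over the object yields lines with their
# line endings intact, so the newline inside the quoted cell survives.
rows_from_file = list(csv.reader(io.StringIO(data, newline="")))

# Iterator input built with splitlines(): the line endings are stripped
# before csv.reader ever sees them, so the two halves of the cell are
# joined with nothing in between.
rows_stripped = list(csv.reader(data.splitlines()))

# Passing True to splitlines() keeps the line endings, as suggested in
# the thread, and the embedded newline is preserved again.
rows_kept = list(csv.reader(data.splitlines(True)))

print("file I/O:          ", rows_from_file)
print("splitlines():      ", rows_stripped)
print("splitlines(True):  ", rows_kept)
```

The first and third results keep the newline between "(line 1)" and "(line 2)"; the second drops it, matching the "iterator I/O" result reported above.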