Bugs item #1072404, was opened at 2004-11-24 23:00
Message generated for change (Comment added) made by andrewmcnamara
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1072404&group_id=5470
Category: Python Library
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Chris Withers (fresh)
>Assigned to: Andrew McNamara (andrewmcnamara)
Summary: Bugs in _csv module - lineterminator

Initial Comment:
On trying to parse a '\r'-terminated CSV file generated on a Mac, I get a
"newline inside string" error from the csv module.

Two things sprang to mind after reading
http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Modules/_csv.c?rev=1.15&view=markup
for a bit:

1. The Dialect's lineterminator doesn't appear to be used when parsing a
CSV file. This feels like a bug to me, 'cos I could specify the terminator
if Reader_iternext(ReaderObj *self) used it :-S

2. The processing in Reader_iternext(ReaderObj *self) assumes that a '\r'
will be followed by '\0' for Macs or '\n' for Windows, and that anything
else is an error. But:

>>> c = open('var\data\metadata.csv').read()
>>> c[:100]
'BENEFIT,,Subjects relating to all benefits,AB \rBENEFIT,PARTNERDIED,Bereavement

Should I be expecting to see a '\0' there?

Anyway, the real bug seems to be the reader's ignorance of the
lineterminator. However, even if my analysis is off the mark, the problem
still exists :-S

cheers,

Chris

----------------------------------------------------------------------

>Comment By: Andrew McNamara (andrewmcnamara)
Date: 2005-01-13 15:14

Message:
Logged In: YES
user_id=698599

The reader expects to be supplied an iterator that returns lines. In this
case, the file iterator has not recognised \r as end-of-line, so it has
read the whole file in and yielded that as a single "line". If you open
your source file in universal-newline mode, you should have more luck.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2004-11-25 15:23

Message:
Logged In: YES
user_id=44345

This is a known problem. See the April archives of the csv mailing list:
http://manatee.mojam.com/pipermail/csv/2004-April/thread.html

Solutions are welcome.
I suspect any solution will involve either discarding PyIter_Next
altogether or further subdividing what it returns. A couple of things to
note in the way of workarounds:

1. Reader_iternext() defers to PyIter_Next() to grab the next line, so
there's really no opportunity to interject the lineterminator into the
operation with the current code. This means reading from StringIO objects
that use \r line terminators will always fail.

2. If you have a real file as input and open it in universal newline mode,
you will get the correct behavior.
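The behaviour discussed above can be sketched in modern Python 3. This is illustrative only and is not the Python 2.3/2.4 `_csv` code the thread refers to. It reproduces the failure (an iterator that yields a '\r'-terminated document as one "line"), shows that the reader accepts but ignores `lineterminator` (the current csv docs state the reader is hard-coded to recognise '\r' or '\n'), and tries both workarounds: universal newlines (spelled `newline=''` in Python 3's `io` module) and subdividing what the underlying iterator returns. The sample data and the helper names `parse_naively`, `parse_universal` and `split_mixed_lines` are hypothetical.

```python
import csv
import io

# Hypothetical sample: a classic-Mac-style CSV whose records end in a bare '\r'.
MAC_DATA = 'a,b\rc,d\r'


def parse_naively(text):
    r"""Iterate a StringIO in its default mode (newline='\n').

    Lines are only split on '\n', so the whole document arrives as one
    "line" and csv.reader raises csv.Error when it finds data after '\r'.
    """
    return list(csv.reader(io.StringIO(text)))


def parse_universal(text):
    r"""Universal newlines: newline='' splits on '\r', '\n' and '\r\n'
    but leaves the line endings untranslated for csv.reader."""
    return list(csv.reader(io.StringIO(text, newline='')))


def split_mixed_lines(chunks):
    r"""Hypothetical helper: subdivide whatever the underlying iterator yields.

    Each chunk is re-split with universal newlines; keeping the endings lets
    csv.reader still rebuild quoted fields that span several lines.
    Assumes a '\r\n' pair is never split across two chunks.
    """
    for chunk in chunks:
        yield from io.StringIO(chunk, newline='')


if __name__ == '__main__':
    try:
        parse_naively(MAC_DATA)
    except csv.Error as exc:
        print('default mode fails:', exc)

    print(parse_universal(MAC_DATA))  # [['a', 'b'], ['c', 'd']]

    # lineterminator is accepted, but '|' is parsed as ordinary field data.
    print(list(csv.reader(['a,b|c,d'], lineterminator='|')))  # [['a', 'b|c', 'd']]

    # Subdividing also helps with in-memory iterators, not just real files.
    rows = csv.reader(split_mixed_lines(io.StringIO('x,"multi\rline"\ry,z\r')))
    print(list(rows))  # [['x', 'multi\rline'], ['y', 'z']]
```

The subdividing helper re-splits with `io.StringIO(chunk, newline='')` rather than `str.splitlines()`, because `splitlines()` also breaks on characters such as '\x0c' and '\u2028' that a CSV field may legitimately contain.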