John Machin <sjmac...@users.sourceforge.net> added the comment:

Sorry, folks, we've got an understanding problem here. CSV files are typically NOT created by text editors. They are created e.g. by "save as csv" from a spreadsheet program, or as an output option by some database query program. They can have just about any character in a field, including \r and \n. Fields containing those characters should be quoted (just like a comma) by the csv file producer. A csv reader should be capable of reproducing the original field division. Here for example is a dump of a little file I just created using Excel 2003:

    C:\devel\csv>\python26\python -c "print repr(open('book1.csv','rb').read())"
    'Field1,"Field 2 has a\nvery long\nheading",Field3\r\n1.11,2.22,3.33\r\n'

Inserting \n into a text field in Excel (using Alt-Enter) is a well-known user trick. Here's what we get from Python 2.6.1:

    C:\devel\csv>\python26\python -c "import csv; print repr(list(csv.reader(open('book1.csv','rb'))))"
    [['Field1', 'Field 2 has a\nvery long\nheading', 'Field3'], ['1.11', '2.22', '3.33']]

and the same by design all the way back to Python 2.3's csv module and its ancestor, the ObjectCraft csv module. However, with Python 3.0.1 we get:

    C:\devel\csv>\python30\python -c "import csv; print(repr(list(csv.reader(open('book1.csv','rb')))))"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

This sentence in the documentation is NOT an error:

"""If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference."""

The problem *IS* a "biggie".

This paragraph in the documentation (evidently introduced in 2.5) is rather confusing:

"""The parser is quite strict with respect to multi-line quoted fields. Previously, if a line ended within a quoted field without a terminating newline character, a newline would be inserted into the returned field. This behavior caused problems when reading files which contained carriage return characters within fields. The behavior was changed to return the field without inserting newlines. As a consequence, if newlines embedded within fields are important, the input should be split into lines in a manner which preserves the newline characters."""

Some examples of what it is talking about would be a very good idea.
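A minimal Python 3 sketch of the behaviour discussed above, assuming a Python 3 release whose csv docs recommend opening the file in text mode with newline="" (the temporary directory and the reuse of the name book1.csv are only for illustration). It shows three things: the default "excel" writer quoting a field with embedded newlines, producing exactly the bytes of the Excel dump; the 'rb' failure (the exact error wording may vary between versions); and what the 2.5 paragraph means in practice, namely that lines handed to the reader without their newline characters lose the newlines inside a quoted field.

```python
import csv
import io
import os
import tempfile

# Producer side: the default "excel" dialect (QUOTE_MINIMAL, "\r\n" line
# terminator) quotes any field that contains a newline, as Excel did above.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Field1", "Field 2 has a\nvery long\nheading", "Field3"])
writer.writerow(["1.11", "2.22", "3.33"])
text = buf.getvalue()
print(repr(text))
# 'Field1,"Field 2 has a\nvery long\nheading",Field3\r\n1.11,2.22,3.33\r\n'

binary_error = None
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "book1.csv")  # name borrowed from the dump above
    # newline="" writes "\r\n" and the embedded "\n" through unchanged.
    with open(path, "w", newline="", encoding="ascii") as f:
        f.write(text)

    # Python 3: a binary-mode file yields bytes, which the reader rejects.
    with open(path, "rb") as f:
        try:
            list(csv.reader(f))
        except csv.Error as exc:
            binary_error = exc
    print("binary mode:", binary_error)

    # Text mode with newline="" reproduces the original field division.
    with open(path, newline="", encoding="ascii") as f:
        rows = list(csv.reader(f))
print(rows)

# The 2.5 doc paragraph in practice: if the lines handed to the reader have
# lost their newline characters, the newlines inside the quoted field are lost.
lossy = list(csv.reader(text.splitlines()))
# Keeping the line terminators preserves them.
kept = list(csv.reader(text.splitlines(keepends=True)))
print(lossy[0][1])  # Field 2 has avery longheading
print(kept == rows)  # True
```

The splitlines() comparison stands in for any line source that strips terminators; reading the file in text mode with newline="" is the simplest way to keep them.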
----------
nosy: +sjmachin

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4847>
_______________________________________