[issue30034] csv reader chokes on bad quoting in large files

2017-04-15 Thread Keith Erskine
Keith Erskine added the comment: OK Terry. Thank you everybody for your thoughts and suggestions.

[issue30034] csv reader chokes on bad quoting in large files

2017-04-14 Thread Terry J. Reedy
Terry J. Reedy added the comment: Keith, while I sympathize with the request, I am going to agree with the others and close this. As I see it, csv exists to solve a particular problem. We want a printable file of records and text fields with visible field and record separators, but we may wa

[issue30034] csv reader chokes on bad quoting in large files

2017-04-11 Thread Keith Erskine
Keith Erskine added the comment: I should have said, Peter, an odd number of quotes does not necessarily mean the quoting is bad. For example, a line of: a,b",c will parse fine as ['a', 'b"', 'c']. Figuring out bad quoting is not easy, but if we know that there are no multiline fields in the
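The odd-quote case mentioned here can be checked directly. This is a minimal sketch, assuming the default dialect and feeding the line through io.StringIO; the variable names are illustrative only:

```python
import csv
import io

# A quote that appears in the middle of an unquoted field is kept
# as a literal character by the default dialect.
line = 'a,b",c\n'
row = next(csv.reader(io.StringIO(line)))
print(row)

# The quote count is odd, yet the line parsed without trouble.
quote_count = line.count('"')
print(quote_count % 2)
```

So a simple parity check on quote characters can flag lines that the reader would have handled on its own.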

[issue30034] csv reader chokes on bad quoting in large files

2017-04-11 Thread Keith Erskine
Keith Erskine added the comment: The csv reader already supports bad CSV - that's what I believe "strict" is for - but only in one specific scenario. My request is to make that "strict" attribute a bit more useful. Thank you for your suggestion, Peter. I have toyed with the idea of looking

[issue30034] csv reader chokes on bad quoting in large files

2017-04-11 Thread Peter Otten
Peter Otten added the comment: While I don't think that the csv module should second-guess broken input you might consider "fixing" your data on the fly:

    def close_quote(line):
        if line.count('"') % 2:
            line = line.rstrip("\n") + '"\n'
        return line

    with open("data.csv") as f:
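The snippet is cut off in this listing, so how the helper was meant to be wired up is not shown. One plausible way to apply it is sketched below, assuming the repaired lines are passed straight to csv.reader; the in-memory sample stands in for data.csv and is hypothetical:

```python
import csv
import io

def close_quote(line):
    # Append a closing quote when the line holds an odd number of quote characters.
    if line.count('"') % 2:
        line = line.rstrip("\n") + '"\n'
    return line

# Hypothetical sample standing in for data.csv; the second line has an unclosed quote.
sample = io.StringIO('a,b,c\nd,"e,f\ng,h,i\n')

# csv.reader accepts any iterable of lines, so the repair can be applied lazily.
rows = list(csv.reader(map(close_quote, sample)))
for row in rows:
    print(row)
```

This only makes sense when the file is known to contain no multiline fields, and it also rewrites lines whose odd quote count was harmless, such as a,b",c.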

[issue30034] csv reader chokes on bad quoting in large files

2017-04-10 Thread Raymond Hettinger
Raymond Hettinger added the comment:

> In my experience CSV files with fields with embedded newlines
> are pretty common. I don't really think we want to support
> invalid CSV files.

I concur with David on both points. Also, whether common or not, we don't want to break existing code that a

[issue30034] csv reader chokes on bad quoting in large files

2017-04-10 Thread Keith Erskine
Keith Erskine added the comment: As you say, David, however much we would like the world to stick to a given CSV standard, the reality is that people don't, which is all the more reason for making the csv reader flexible and forgiving. The csv module can and should be used for more than just

[issue30034] csv reader chokes on bad quoting in large files

2017-04-10 Thread R. David Murray
R. David Murray added the comment: Well, ETL is semi-standardized. Try dealing with csv files exported from Excel spreadsheets written by non-programmers :) "e"X is not a quoting the csv module will produce, but I don't think it is a csv error. Insofar as csv has a standard, it is a de facto

[issue30034] csv reader chokes on bad quoting in large files

2017-04-10 Thread Keith Erskine
Keith Erskine added the comment: The csv reader already handles a certain amount of bad formatting. For example, using default behavior, the following file:

a,b,c
d,"e"X,f
g,h,i

is read as:

['a', 'b', 'c']
['d', 'eX', 'f']
['g', 'h', 'i']

It seems reasonable that csv should be able to handle
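The lenient behavior described here is easy to reproduce. The sketch below also contrasts it with strict=True, which the csv documentation describes as raising Error on bad CSV input; the variable names are illustrative:

```python
import csv
import io

# The three-line file from the comment, with stray text after a closing quote.
data = 'a,b,c\nd,"e"X,f\ng,h,i\n'

# Default dialect: the stray X is appended to the field.
lenient_rows = list(csv.reader(io.StringIO(data)))
for row in lenient_rows:
    print(row)

# strict=True: the same input is rejected with csv.Error.
try:
    list(csv.reader(io.StringIO(data), strict=True))
    strict_error = None
except csv.Error as exc:
    strict_error = exc
    print("strict mode raised:", exc)
```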

[issue30034] csv reader chokes on bad quoting in large files

2017-04-10 Thread R. David Murray
R. David Murray added the comment: In my experience CSV files with fields with embedded newlines are pretty common. I don't really think we want to support invalid CSV files. -- nosy: +r.david.murray

[issue30034] csv reader chokes on bad quoting in large files

2017-04-10 Thread Keith Erskine
Keith Erskine added the comment: Perhaps I should add what I would prefer the csv reader to return in my example above. That would be:

['a', 'b', 'c']
['d', 'e,f']
['g', 'h', 'i']

Yes, the second line is still mangled but at least the csv reader would carry on and read the third line correct

[issue30034] csv reader chokes on bad quoting in large files

2017-04-10 Thread Mariatta Wijaya
Changes by Mariatta Wijaya: -- components: +Library (Lib) nosy: +Mariatta

[issue30034] csv reader chokes on bad quoting in large files

2017-04-10 Thread Keith Erskine
New submission from Keith Erskine: If a csv file has a quote character at the beginning of a field but no closing quote, the csv module will keep reading the file until the very end in an attempt to close out the field. It's true this situation occurs only when the quoting in a csv file is in
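The behavior reported here can be reproduced on a small scale. The submission's own example is cut off in this listing, so the three-line input below is an assumed stand-in in which the second line opens a quote and never closes it:

```python
import csv
import io

# Assumed stand-in for the reporter's file: line 2 has an opening quote only.
data = 'a,b,c\nd,"e,f\ng,h,i\n'

# Default dialect: the open quoted field absorbs everything up to end of file,
# so the third line never becomes a row of its own.
rows = list(csv.reader(io.StringIO(data)))
print(len(rows))
print(rows[-1])

# strict=True turns the unterminated field into a csv.Error at end of input.
try:
    list(csv.reader(io.StringIO(data), strict=True))
    strict_raised = False
except csv.Error as exc:
    strict_raised = True
    print("strict mode raised:", exc)
```

On a large file the same effect means every remaining line is pulled into a single field, which is the cost the report is concerned with.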