On Mar 28, 8:40 am, [EMAIL PROTECTED] wrote:
> On Mar 27, 1:53 pm, "Gabriel Genellina" <[EMAIL PROTECTED]>
> wrote:
>
> > On Thu, 27 Mar 2008 17:37:33 -0300, Aaron Watters
> > <[EMAIL PROTECTED]> wrote:
>
> > >> "this";"is";"a";"test"
>
> > >> Resulting in an output of:
>
> > >> ['this', 'is', 'a', 'test']
>
> > >> However, if I modify the csv to:
>
> > >> "t"h"is";"is";"a";"test"
>
> > >> The output changes to:
>
> > >> ['th"is"', 'is', 'a', 'test']
>
> > > I'd be tempted to say that this is a bug,
> > > except that I think the definition of "csv" is
> > > informal, so the "bug/feature" distinction
> > > cannot be exactly defined, unless I'm mistaken.
>
> > AFAIK, the csv module tries to mimic Excel behavior as close as possible.
> > It has some test cases that look horrible, but that's what Excel does...
> > I'd try actually using Excel to see what happens.
> > Perhaps the behavior could be more configurable, like the codecs are.
>
> > --
> > Gabriel Genellina
>
> Thank you Aaron and Gabriel. I was also hesitant to use the term
> "bug" since as you said CSV isn't a standard. Yet in the same right I
> couldn't readily think of an instance where the quote should be
> removed if it's not sitting right next to the delimiter (or at the
> very beginning/end of the line).
>
> I'm not even sure if it should be patched since there could be cases
> where this is how people want it to behave and I wouldn't want their
> code to break.
>
> I think rolling out a custom class seems like the only solution but if
> anyone else has any other advice I'd like to hear it.
>
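For reference, the behaviour quoted above can be reproduced in a few
lines. This is only a sketch in Python 3 syntax; the names SAMPLE and
parse are made up for illustration. It also shows the csv module's
strict dialect option, which makes the reader raise csv.Error on this
input instead of silently keeping part of the quotes.

```python
import csv
import io

# The malformed line from the quoted post: quotes inside the first
# field are not doubled.
SAMPLE = '"t"h"is";"is";"a";"test"\n'


def parse(text, strict=False):
    """Parse semicolon-delimited text with the standard csv module."""
    return list(csv.reader(io.StringIO(text), delimiter=";", strict=strict))


# Default (lenient) mode: the opening quote is consumed, the rest of
# the field is kept as literal text.
lenient = parse(SAMPLE)
print(lenient)

# Strict mode: the same input is rejected with csv.Error.
try:
    parse(SAMPLE, strict=True)
    strict_error = None
except csv.Error as exc:
    strict_error = str(exc)
print(strict_error)
```

Strict mode only reports the problem; it does not recover the field
as t"h"is, so it helps with detection rather than with reading the
data.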
I have code in awk, C, and Python for reading bad-CSV data under these
assumptions:

(1) there are no embedded newlines;
(2) embedded quotes are not doubled as they should be;
(3) there is an even number of quotes in each original field;
(4) the caller prefers an exception or error return when there is
    anomalous data.

-- 
http://mail.python.org/mailman/listinfo/python-list
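To make the four assumptions in the reply concrete, here is a minimal
Python sketch of one way such a reader could work. It is not the awk,
C, or Python code mentioned in the reply; the names BadCSVError and
split_bad_csv_line are hypothetical. It treats a quote as closing a
field only when it is followed by the delimiter or the end of the
line and an even number of embedded quotes has been seen, and it
raises an exception otherwise. Rejecting quotes inside unquoted
fields is an extra assumption of this sketch.

```python
class BadCSVError(ValueError):
    """Raised when a line does not fit the stated assumptions."""


def split_bad_csv_line(line, delimiter=";", quote='"'):
    """Split one line whose embedded quotes were not doubled.

    Hypothetical helper, assuming: no embedded newlines, an even
    number of quotes inside each original field, and that the caller
    wants an exception on anything else.
    """
    line = line.rstrip("\r\n")
    if "\n" in line or "\r" in line:
        raise BadCSVError("embedded newline")  # assumption (1)
    fields = []
    pos = 0
    n = len(line)
    while True:
        if pos < n and line[pos] == quote:
            # Quoted field: look for the first quote that sits at a
            # field boundary with an even count of embedded quotes.
            embedded = 0
            i = pos + 1
            while i < n:
                if line[i] == quote:
                    at_boundary = i + 1 == n or line[i + 1] == delimiter
                    if at_boundary and embedded % 2 == 0:
                        break
                    embedded += 1  # assumptions (2) and (3)
                i += 1
            else:
                # Ran off the end without finding a valid closing quote.
                raise BadCSVError(
                    "no closing quote for field at column %d" % pos
                )
            fields.append(line[pos + 1:i])
            pos = i + 1
        else:
            # Unquoted field: runs up to the next delimiter.
            end = line.find(delimiter, pos)
            if end == -1:
                end = n
            field = line[pos:end]
            if quote in field:
                raise BadCSVError(
                    "quote inside unquoted field at column %d" % pos
                )
            fields.append(field)
            pos = end
        if pos == n:
            return fields
        pos += 1  # skip the delimiter between fields


print(split_bad_csv_line('"t"h"is";"is";"a";"test"'))
```

Some inputs remain ambiguous under these assumptions: "a";"b" could
be two fields or a single field containing a";"b. The sketch always
takes the earliest valid closing quote, so it reads that as two
fields.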