On Jul 7, 4:58 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> On Sat, 07 Jul 2007 08:32:52 +0200, Hendrik van Rooyen wrote:
>
> >> erik,viking,"ham, spam and eggs","He said ""Ni!""","line one
> >> line two"
>
> >> That's 5 elements:
>
> >> 1: eric
> >> 2: viking
> >> 3: ham, spam and eggs
> >> 4: He said "Ni!"
> >> 5: line one
> >>    line two
>
> > Also true - What can I say - I can only wriggle and mutter...
>
> > I see that you escaped the quotes by doubling them up -
>
> That's how Excel and the `csv` module do it.
>
> > What would the following parse to?:
>
> > erik,viking,ham, spam and eggs,He said "Ni!",line one
> > line two
>
> Why don't you try yourself?  The `csv` module returns two records, the
> first has six items:
>
> 1: erik
> 2: viking
> 3: ham
> 4: spam and eggs
> 5: He said "Ni!"
> 6: line one
>
> 'line two' is the only item in the next record then.

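For anyone who wants to try both cases from the quoted exchange, here is a minimal sketch. It assumes Python 3 syntax and uses io.StringIO as a stand-in for a real file; the helper name parse is made up for illustration.

```python
import csv
import io

# The properly quoted line and the unquoted variant from the exchange above.
quoted = 'erik,viking,"ham, spam and eggs","He said ""Ni!""","line one\nline two"\n'
unquoted = 'erik,viking,ham, spam and eggs,He said "Ni!",line one\nline two\n'

def parse(text):
    # newline='' leaves line endings alone so the csv module itself
    # decides whether a newline is inside a quoted field.
    return list(csv.reader(io.StringIO(text, newline='')))

quoted_rows = parse(quoted)      # one record with five fields
unquoted_rows = parse(unquoted)  # two records: six fields, then one

for row in quoted_rows + unquoted_rows:
    print(len(row), row)
```

Note that the fourth item of the unquoted record comes back as ' spam and eggs' with its leading space, because skipinitialspace is off by default.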
The rules for quoting when writing can be expressed as:

    def outrow(inrow, quotechar='"', delimiter=','):
        out = []
        for field in inrow:
            if quotechar in field:
                field = quotechar + field.replace(quotechar, quotechar*2) + quotechar
            elif delimiter in field or '\n' in field:  # See note below.
                field = quotechar + field + quotechar
            out.append(field)
        return delimiter.join(out)

Note: characters other than delimiter and \n can be included in the
"to be quoted" list.

What readers do with data that can *not* have been produced by a writer
following the rules can get worse than BlackJack's example. Consider
this: file nihao1.csv contains the following single line:

    'Is the "," a mistake in "Ni, hao!"?\r\n'

Openoffice.org's Calc 2.1 shows the equivalent of

    ['Is the "', ' a mistake in Ni', ' hao!"?\n']

in a Text Import window, but then silently produces nothing. A file
with two such lines causes 5 fields to be shown in the window -- it
apparently thinks the newlines are inside quoted fields!

Gnumeric 1.7.6 silently produces the equivalent of

    result = ['Is the "', ' a mistake in ', 'hao!"?']
    map(len, result) -> [8, 14, 6]

What happened to Ni? Multiple such lines produce multiple rows.

Excel 11.0 (2003) silently produces in effect

    result = ['Is the "', ' a mistake in Ni', ' hao!"?']
    map(len, result) -> [8, 16, 7]

Multiple such lines produce multiple rows.

The csv module does what Excel does.

Consumers of csv files are exhorted to apply whatever sanity checks
they can. Examples:

(1) If the csv file was produced as a result of a database query, the
    number of columns should be known and used as a check on the length
    of each row received.
(2) A field containing an odd number of " characters (or more
    generally, not meeting whatever quoting convention might be
    expected in the underlying data) should be treated with suspicion.

Cheers,
John

-- 
http://mail.python.org/mailman/listinfo/python-list
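
A follow-up sketch of sanity checks (1) and (2) from the post, applied to the nihao1.csv line. This is only one way to do it, under some assumptions: Python 3 syntax, io.StringIO standing in for the file, and a made-up helper name (suspicious_rows) with a made-up expected_columns parameter.

```python
import csv
import io

def suspicious_rows(text, expected_columns, quotechar='"'):
    """Yield (row_number, reason, row) for each row failing a check."""
    reader = csv.reader(io.StringIO(text, newline=''))
    for number, row in enumerate(reader, 1):
        if len(row) != expected_columns:
            # Check (1): the column count is known in advance.
            reason = 'expected %d fields, got %d' % (expected_columns, len(row))
            yield number, reason, row
        elif any(field.count(quotechar) % 2 for field in row):
            # Check (2): an odd number of quote characters in a parsed field.
            yield number, 'odd number of quote characters in a field', row

# The single line from nihao1.csv, meant to be one field.
data = 'Is the "," a mistake in "Ni, hao!"?\r\n'

problems = list(suspicious_rows(data, expected_columns=1))
for number, reason, row in problems:
    print(number, reason, row)
```

Check (2) is only a hint: a perfectly legitimate field such as a measurement in inches will trip it, so flagged rows want a human look, not automatic rejection.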