On Fri, 06 Jul 2007 08:34:55 +0200, Hendrik van Rooyen wrote:

> "John Machin" <sj,,,[EMAIL PROTECTED]> wrote:
>
>>
>> I don't know what you mean by "requires more than one
>> character of lookahead" -- any non-Mickey-Mouse implementation of a
>> csv reader will use a finite state machine with about half-a-dozen
>> states, and data structures no more complicated than (1) completed
>> rows received so far (2) completed fields in current row (3) bytes in
>> current field. When a new input byte arrives, what to do can be
>> determined based on only that byte and the current state; no
>> look-ahead into the input stream is required, nor is any look-back
>> into those data structures.
>>
>
> True.
>
> You can even do it more simply - by writing a GetField() that
> scans for either the delimiter or end of line or end of file, and
> returns the "field" found, along with the delimiter that caused
> it to exit, and then writing a GetRecord() that repetitively calls
> the GetField and assembles the row record until the delimiter
> returned is either the end of line or the end of file, remembering
> that the returned field may be empty, and handling the cases based
> on the delimiter returned when it is.
>
> This also makes all the decisions based on the current character
> read, no lookahead as far as I can see.
>
> Also no state variables, no switch statements...
>
> Is this the method that you would call "Mickey Mouse"?

Maybe, because you've left out all handling of quoting and escape
characters here.  Consider this:

erik,viking,"ham, spam and eggs","He said ""Ni!""","line one
line two"

That's 5 elements:

1: erik
2: viking
3: ham, spam and eggs
4: He said "Ni!"
5: line one
   line two

Ciao,
        Marc 'BlackJack' Rintsch
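
For what it's worth, here is a minimal sketch of the kind of state
machine John describes above, with the quoting handled. It is not code
from anyone in this thread: `parse_csv`, the state names and `SAMPLE`
are made up for illustration. It assumes a comma delimiter, double
quotes for quoting, a doubled quote as the escape, and "\n" as the only
line ending (no "\r\n" handling). Each decision uses only the current
character and the current state.

```python
def parse_csv(text, delimiter=",", quote='"'):
    """Hypothetical helper: split CSV text into rows of fields,
    one character at a time, with no lookahead."""
    START, UNQUOTED, QUOTED, QUOTE_SEEN = range(4)
    rows, row, field = [], [], []
    state = START
    for ch in text:
        if state == QUOTED:
            if ch == quote:
                # Either the closing quote or the first half of "".
                state = QUOTE_SEEN
            else:
                # Delimiters and newlines are plain data inside quotes.
                field.append(ch)
            continue
        if state == QUOTE_SEEN and ch == quote:
            # Doubled quote: emit one literal quote, stay in the field.
            field.append(quote)
            state = QUOTED
            continue
        # START, UNQUOTED, or QUOTE_SEEN followed by a non-quote.
        if ch == delimiter:
            row.append("".join(field))
            field = []
            state = START
        elif ch == "\n":
            row.append("".join(field))
            field = []
            rows.append(row)
            row = []
            state = START
        elif ch == quote and state == START:
            state = QUOTED
        else:
            field.append(ch)
            state = UNQUOTED
    # Flush the last record if the input did not end with a newline.
    if state != START or field or row:
        row.append("".join(field))
        rows.append(row)
    return rows


# The record from the post, with the embedded newline in field 5.
SAMPLE = 'erik,viking,"ham, spam and eggs","He said ""Ni!""","line one\nline two"\n'
fields = parse_csv(SAMPLE)[0]
for number, value in enumerate(fields, 1):
    print(number, repr(value))
```

Four states are enough for this dialect; a GetField() that only scans
for the delimiter or end of line would split field 3 at its comma and
field 5 at its newline. In real code the standard library's csv module
already handles all of this.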