On Fri, 06 Jul 2007 08:34:55 +0200, Hendrik van Rooyen wrote:

> "John Machin" <sj,,,[EMAIL PROTECTED]> wrote:
> 
>> 
>> I don't know what you mean by "requires more than one
>> character of lookahead" -- any non-Mickey-Mouse implementation of a
>> csv reader will use a finite state machine with about half-a-dozen
>> states, and data structures no more complicated than (1) completed
>> rows received so far (2) completed fields in current row (3) bytes in
>> current field. When a new input byte arrives, what to do can be
>> determined based on only that byte and the current state; no look-
>> ahead into the input stream is required, nor is any look-back into
>> those data structures.
>> 
> 
> True.
> 
> You can even do it more simply - by writing a GetField() that
> scans for either the delimiter or end of line or end of file, and 
> returns the "field" found, along with the delimiter that caused 
> it to exit, and then writing a GetRecord() that repetitively calls
> the GetField and assembles the row record until the delimiter 
> returned is either the end of line or the end of file, remembering 
> that the returned field may be empty, and handling the cases based 
> on the delimiter returned when it is.
> 
> This also makes all the decisions based on the current character
> read, no lookahead as far as I can see.
> 
> Also no state variables, no switch statements...
> 
> Is this the method that you would call "Mickey Mouse"?

Maybe, because you've left out all handling of quoting and escape
characters here.  Consider this:

erik,viking,"ham, spam and eggs","He said ""Ni!""","line one
line two"

That's 5 elements:

1: erik
2: viking
3: ham, spam and eggs
4: He said "Ni!"
5: line one
   line two
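
To make the point concrete, here is a minimal sketch of the kind of
state machine John describes, extended to cope with the quoting above.
The function name `parse_csv`, the four state names, and the choice of
a doubled quote as the only escape are my assumptions for illustration,
not anybody's actual implementation. Every decision uses only the
current character and the current state, so there is no lookahead:

```python
def parse_csv(text, delimiter=",", quote='"'):
    """Split text into rows of fields, one input character at a time."""
    # Hypothetical state names, chosen for this sketch only.
    START, UNQUOTED, QUOTED, QUOTE_SEEN = range(4)
    rows, row, field = [], [], []
    state = START
    for ch in text:
        if state == QUOTED:
            # Inside quotes, delimiters and newlines are ordinary data.
            if ch == quote:
                state = QUOTE_SEEN
            else:
                field.append(ch)
        elif state == QUOTE_SEEN and ch == quote:
            # A doubled quote inside a quoted field is one literal quote.
            field.append(quote)
            state = QUOTED
        elif ch == delimiter:
            row.append("".join(field))
            field = []
            state = START
        elif ch == "\n":
            row.append("".join(field))
            field = []
            rows.append(row)
            row = []
            state = START
        elif ch == quote and state == START:
            # An opening quote is only recognised at the start of a field.
            state = QUOTED
        else:
            field.append(ch)
            state = UNQUOTED
    # Flush a final record that is not terminated by a newline.
    if state != START or field or row:
        row.append("".join(field))
        rows.append(row)
    return rows


sample = ('erik,viking,"ham, spam and eggs","He said ""Ni!""","line one\n'
          'line two"\n')
for number, value in enumerate(parse_csv(sample)[0], 1):
    print(number, repr(value))
```

This is deliberately lenient: a stray quote in the middle of an
unquoted field is kept as data rather than reported as an error, and
other dialects (backslash escapes, "\r\n" line ends) are not handled.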

Ciao,
        Marc 'BlackJack' Rintsch
-- 
http://mail.python.org/mailman/listinfo/python-list
