Yeah, the "trailing empty item" problem has been much discussed, and there are good reasons for keeping it as it is (even apart from the need to not break existing code).

In similar circumstances, I've done

   replace (comma & CR) with (comma & space & CR) in tVariable

but in your case, even a space may not be exactly the same as totally empty.

Could you replace the empty trailing item with a quoted item, i.e.
   replace (comma & CR) with (comma & quote & quote & CR) in tVariable
without any unpleasant side-effects?
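
As a rough sketch of that workaround (the handler name padTrailingEmptyItems is made up for illustration, and it assumes line endings are already normalized to CR): the replace only catches a trailing comma that is followed by a CR, so a final line with no CR after it needs its own check.

```livecode
-- Hypothetical helper: make a trailing empty item explicit on every line,
-- so that "the number of items" counts it.
function padTrailingEmptyItems pData
   -- a comma directly before a line break marks a trailing empty item
   replace (comma & cr) with (comma & quote & quote & cr) in pData
   -- the last line may have no CR after it, so handle it separately
   if char -1 of pData is comma then
      put quote & quote after pData
   end if
   return pData
end padTrailingEmptyItems
```

With this, line 1 of padTrailingEmptyItems("1,2,3," & cr & "4,5,6,") becomes 1,2,3,"" and has 4 items instead of 3. Whether a quoted empty item is acceptable downstream depends on what consumes the data.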

-- Alex.

On 14/05/2012 21:00, Peter Haworth wrote:
I've just been checking out Alex's new csv parser and it is indeed much
faster than the original, closer to 50% than 40% in my test case.

However, I've also run into a Livecode issue while doing all this.  This
has come up before in the context of what LC thinks is a line; there's a
similar issue/confusion/whatever with items.

Let's say you have a string "1,2,3,4,5,6" - LC thinks there are 6 items in
it, no problem

Now change the string to "1,2,3,4,5,6," (note the trailing comma) - LC
still thinks there are 6 items in that string.

So to LC, "1,2,3,4,5,6" and "1,2,3,4,5,6," are equivalent in terms of the
number of items in them.  In the context of parsing csv files, they
definitely are not.
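
The difference can be checked from the message box, and one possible way to get a CSV-style count is sketched here. The function name csvFieldCount is hypothetical, and the sketch assumes the itemDelimiter is still the default comma and that the line has no quoted commas; an empty line returns 0.

```livecode
-- Hypothetical helper: count the CSV fields on one unquoted line,
-- treating a trailing comma as introducing one more (empty) field.
function csvFieldCount pLine
   local tCount
   put the number of items in pLine into tCount
   -- LC ignores one trailing delimiter, so add that field back
   if char -1 of pLine is comma then add 1 to tCount
   return tCount
end csvFieldCount
```

For example, the number of items in "1,2,3,4,5,6," is 6, while csvFieldCount("1,2,3,4,5,6,") returns 7.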

Pete
lcSQL Software <http://www.lcsql.com>



On Mon, May 7, 2012 at 4:30 PM, Alex Tweedly<a...@tweedly.net>  wrote:

Some years ago, this list discussed the difficulties of parsing
comma-separated-value file format; Richard Gaskin has a great article about
it at
http://www.fourthworld.com/embassy/articles/csv-must-die.html

Following that discussion, I came up with some code to parse CSV in
Livecode which was significantly faster than the straightforward methods
(quoted in the above article). At the time, I put that speed gain down to
two factors:

1. a way of looking at the problem "sideways" that enables a different
approach
2. a 'clever' use of split + array access

Recently the topic came up again, and I looked at the code again; I now
realize that in fact the speed gain came entirely from the first of those
two factors, and using split + arrays was not helpful. Livecode's chunk
handling is (in this case) faster than using arrays (my only excuse is that
I was new to Livecode, and so I was using techniques I was familiar with
from other languages). So I revised the code to use chunk handling rather
than split+arrays, and the resulting code runs about 40% faster, with the
added benefit of being slightly easier to read and understand.  The only
slightly mind-bending feature of the new code is the use of

    set the lineDelimiter to quote
    repeat for each line k in pData ....

I find it hard to think about "lines" that aren't actually lines :-)
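
To make the "lines that aren't lines" idea a bit more concrete, here is a small sketch (describeSegments is a made-up name, not part of the parser). With the lineDelimiter set to quote, the segments of a record alternate between text outside quotes and text inside quotes, so a flag that flips on every segment tells you which is which.

```livecode
-- Sketch: list the quote-delimited segments of one CSV record and
-- label each as inside or outside quotes.
function describeSegments pRecord
   local tInsideQuoted, tReport, k
   put false into tInsideQuoted
   set the lineDelimiter to quote
   repeat for each line k in pRecord
      if tInsideQuoted then
         put "inside:  " & k & cr after tReport
      else
         put "outside: " & k & cr after tReport
      end if
      -- every quote character toggles the context
      put not tInsideQuoted into tInsideQuoted
   end repeat
   return tReport
end describeSegments
```

Calling describeSegments("a," & quote & "b,c" & quote & ",d") should report a, as outside, b,c as inside and ,d as outside, so only the commas in the outside segments are real column delimiters.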

So - for anyone who needs or wants more speed, here's the code

function CSV3Tab pData,pcoldelim
  local tNuData -- contains tabbed copy of data
  local tReturnPlaceholder -- replaces cr in field data to avoid line
  --                       breaks which would be misread as records;
  --                       replaced later during display
  local tEscapedQuotePlaceholder -- used for keeping track of quotes
  --                       in data
  local tInsideQuoted -- flag set while reading data between quotes
  local k -- current quote-delimited segment of pData
  --
  put numtochar(11) into tReturnPlaceholder -- vertical tab as
  --                       placeholder
  put numtochar(2)  into tEscapedQuotePlaceholder -- used to simplify
  --                       distinction between quotes in data and those
  --                       used in delimiters
  --
  if pcoldelim is empty then put comma into pcoldelim
  -- Normalize line endings:
  replace crlf with cr in pData          -- Win to UNIX
  replace numtochar(13) with cr in pData -- Mac to UNIX
  --
  -- Put placeholder in escaped quote (non-delimiter) chars:
  replace ("\"&quote) with tEscapedQuotePlaceholder in pData
  replace quote&quote with tEscapedQuotePlaceholder in pData
  --
  put space before pData   -- to avoid ambiguity of starting context
  put False into tInsideQuoted
  set the linedel to quote
  repeat for each line k in pData
    if (tInsideQuoted) then
      replace cr with tReturnPlaceholder in k
      put k after tNuData
      put False into tInsideQuoted
    else
      replace pcoldelim with numtochar(29) in k
      put k after tNuData
      put true into tInsideQuoted
    end if
  end repeat
  --
  delete char 1 of tNuData -- remove the leading space
  replace tEscapedQuotePlaceholder with quote in tNuData
  return tNuData
end CSV3Tab
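
A possible way to use the result, going only by the code as posted above: fields come back separated by numtochar(29), and returns inside quoted fields come back as numtochar(11). The function name csvField is hypothetical, and re-parsing the whole text on every call is for illustration only.

```livecode
-- Hypothetical caller: fetch one field from CSV text via CSV3Tab.
function csvField pCSVText, pRow, pCol
   local tData, tField
   put CSV3Tab(pCSVText, comma) into tData
   -- per the code above, columns are separated by numtochar(29)
   set the itemDelimiter to numtochar(29)
   put item pCol of line pRow of tData into tField
   -- restore returns that were embedded in quoted fields
   replace numtochar(11) with cr in tField
   return tField
end csvField
```

In real use you would parse once and keep tData around, rather than calling CSV3Tab for each field.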


-- Alex.

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
