Yeah, the "trailing empty item" problem has been much discussed, and
there are good reasons for keeping it as it is (even apart from the need
to not break existing code).
In similar circumstances, I've done
replace (comma & CR) with (comma & space & CR) in tVariable
but in your case, even a space may not be exactly the same as totally empty.
Could you replace the empty trailing item with a quoted empty item, i.e.
replace (comma & CR) with (comma & quote & quote & CR) in tVariable
without any unpleasant side-effects?
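A minimal sketch of the quoted-item workaround, assuming line endings are already normalized to cr and the delimiter is a comma. PadTrailingEmptyItems is a hypothetical helper name, not a built-in. It also pads the last record, which has no cr after it for the replace to match unless one is appended first.

```livecode
-- Hypothetical helper: pad every trailing empty item with "" (assumes cr line endings, comma delimiter)
function PadTrailingEmptyItems pData
   local tHadFinalCR
   -- Remember whether the data already ended with a line break
   put (char -1 of pData is cr) into tHadFinalCR
   -- Temporarily add one so the last record is matched by the replace too
   if not tHadFinalCR then put cr after pData
   replace (comma & cr) with (comma & quote & quote & cr) in pData
   -- Remove the temporary line break again
   if not tHadFinalCR then delete char -1 of pData
   return pData
end PadTrailingEmptyItems
```

One caveat: if the padded data is later run through the CSV3Tab function quoted further down, that function's doubled-quote replace treats `""` as an escaped quote. The padded item would therefore come out as a single quote character rather than empty.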
-- Alex.
On 14/05/2012 21:00, Peter Haworth wrote:
I've just been checking out Alex's new csv parser and it is indeed much
faster than the original, closer to 50% than 40% in my test case.
However, I've also run into a Livecode issue while doing all this. This
has come up before in the context of what LC thinks is a line; there's a
similar issue/confusion/whatever with items.
Let's say you have a string "1,2,3,4,5,6" - LC thinks there are 6 items in
it, no problem.
Now change the string to "1,2,3,4,5,6," (note the trailing comma) - LC
still thinks there are 6 items in that string.
So to LC, "1,2,3,4,5,6" and "1,2,3,4,5,6," are equivalent in terms of the
number of items in them. In the context of parsing csv files, they
definitely are not.
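To make the difference concrete, here is a hedged sketch of a helper that counts a single trailing delimiter as one more, empty, item. StrictItemCount is a made-up name, not a LiveCode built-in. It relies on the behaviour described above, where LiveCode ignores one trailing delimiter. It also assumes a single-character delimiter and a record whose quoted delimiters have already been dealt with, for example one line of CSV3Tab output with numtochar(29) passed as pDelim.

```livecode
-- Hypothetical helper: count items CSV-style, so a trailing delimiter adds an empty item
function StrictItemCount pString, pDelim
   if pDelim is empty then put comma into pDelim
   set the itemDelimiter to pDelim
   -- LiveCode does not count the empty item after a final delimiter, so add it back
   if char -1 of pString is pDelim then
      return (the number of items in pString) + 1
   end if
   return the number of items in pString
end StrictItemCount
```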
Pete
lcSQL Software <http://www.lcsql.com>
On Mon, May 7, 2012 at 4:30 PM, Alex Tweedly <a...@tweedly.net> wrote:
Some years ago, this list discussed the difficulties of parsing
comma-separated-value file format; Richard Gaskin has a great article about
it at
http://www.fourthworld.com/embassy/articles/csv-must-die.html
Following that discussion, I came up with some code to parse CSV in
Livecode which was significantly faster than the straightforward methods
(quoted in the above article). At the time, I put that speed gain down to
two factors:
1. a way of looking at the problem "sideways" that enables a different
approach
2. a 'clever' use of split + array access
Recently the topic came up again, and I looked at the code again; I now
realize that in fact the speed gain came entirely from the first of those
two factors, and using split + arrays was not helpful. Livecode's chunk
handling is (in this case) faster than using arrays (my only excuse is that
I was new to Livecode, and so I was using techniques I was familiar with
from other languages). So I revised the code to use chunk handling rather
than split+arrays, and the resulting code runs about 40% faster, with the
added benefit of being slightly easier to read and understand. The only
slightly mind-bending feature of the new code is the use of
set the lineDelimiter to quote
repeat for each line k in pData ....
I find it hard to think about "lines" that aren't actually lines :-)
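Here is a small illustration of that "sideways" view, offered as a sketch. With the lineDelimiter set to quote, the segments alternate between text outside quotes and text inside them, so a flag that flips on every segment tells you which is which. LabelQuoteSegments is a hypothetical name. It assumes the input contains no escaped quotes; CSV3Tab below replaces those with a placeholder before looping.

```livecode
-- Hypothetical demo: label each quote-delimited "line" as outside or inside quotes
function LabelQuoteSegments pData
   local tSegment, tInside, tReport
   put space before pData -- same trick as CSV3Tab: the first segment is always "outside"
   set the lineDelimiter to quote
   put false into tInside
   repeat for each line tSegment in pData
      if tInside then
         put "inside:[" & tSegment & "]" & cr after tReport
      else
         put "outside:[" & tSegment & "]" & cr after tReport
      end if
      put not tInside into tInside -- every quote flips the state
   end repeat
   return tReport
end LabelQuoteSegments
```

For `a,"b,c",d` this yields `outside:[ a,]`, `inside:[b,c]` and `outside:[,d]`, one per line.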
So, for anyone who needs or wants more speed, here's the code:
function CSV3Tab pData,pcoldelim
   local tNuData -- converted copy of the data; columns end up
   -- separated by numtochar(29)
   local tReturnPlaceholder -- replaces cr in field data to avoid line
   -- breaks which would be misread as records;
   -- replaced later during display
   local tEscapedQuotePlaceholder -- used for keeping track of quotes
   -- in data
   local tInsideQuoted -- flag set while reading data between quotes
   local k
   --
   put numtochar(11) into tReturnPlaceholder -- vertical tab as
   -- placeholder
   put numtochar(2) into tEscapedQuotePlaceholder -- used to simplify
   -- distinction between quotes in data and those
   -- used in delimiters
   --
   if pcoldelim is empty then put comma into pcoldelim
   -- Normalize line endings:
   replace crlf with cr in pData -- Win to UNIX
   replace numtochar(13) with cr in pData -- Mac to UNIX
   --
   -- Put placeholder in escaped quote (non-delimiter) chars:
   replace ("\" & quote) with tEscapedQuotePlaceholder in pData -- backslash-escaped quote
   replace (quote & quote) with tEscapedQuotePlaceholder in pData -- doubled quote
   --
   put space before pData -- to avoid ambiguity of starting context
   put false into tInsideQuoted
   set the linedel to quote
   repeat for each line k in pData
      if tInsideQuoted then
         -- inside quotes: protect embedded line breaks
         replace cr with tReturnPlaceholder in k
         put k after tNuData
         put false into tInsideQuoted
      else
         -- outside quotes: column delimiters become numtochar(29)
         replace pcoldelim with numtochar(29) in k
         put k after tNuData
         put true into tInsideQuoted
      end if
   end repeat
   --
   delete char 1 of tNuData -- remove the leading space
   replace tEscapedQuotePlaceholder with quote in tNuData
   return tNuData
end CSV3Tab
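The following is a hedged usage sketch for CSV3Tab as posted. Records stay one per line, fields are separated by numtochar(29), and line breaks inside quoted fields come back as numtochar(11). DemoCSV3TabField is a hypothetical name, and it assumes CSV3Tab is in the same script or otherwise on the message path.

```livecode
-- Hypothetical usage: parse a small CSV string and read back one multi-line field
function DemoCSV3TabField
   local tCSV, tParsed, tField
   put "name,comment" & cr & "Pete," & quote & "line one" & cr & "line two" & quote into tCSV
   put CSV3Tab(tCSV) into tParsed -- empty pcoldelim defaults to comma
   -- Fields in the result are separated by numtochar(29)
   set the itemDelimiter to numtochar(29)
   put item 2 of line 2 of tParsed into tField
   -- Restore the embedded line break that CSV3Tab stored as numtochar(11)
   replace numtochar(11) with cr in tField
   return tField
end DemoCSV3TabField
```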
-- Alex.
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode