On 16/05/2012 00:35, Peter Haworth wrote:
Thanks Alex.
I ran the same data through your new handler and it seems to have worked
fine.
There was a recent discussion on some of these corner case issues on the
sqlite list so I'll go grab their test cases and see what happens.
As for performance, the new handler took approx 2 1/2 times longer than
the CSV3 version on my 48k rows/17 columns dataset, but that's still only
about 1 second, so definitely not a concern, as mentioned previously.
I tried it out with this new test data. It has the odd characteristic of
having partially quoted strings within the cell content; I've adjusted
the script to allow for that (by removing one logic check). I've also
added a line to add an extra empty item at the end of a line whenever
the last item is already empty (i.e. to deal with Livecode's method of
ignoring blank trailing items).
With these changes, csv4Tab() gets the same results as the original
csv2Tab() did, and they fit with what I think is correct for this
strange data set :-)
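For anyone who hasn't run into the trailing-item behaviour mentioned above, here is a minimal sketch of it. The handler name showTrailingItem is made up for illustration and is not part of the csv4Tab script.

```livecode
on showTrailingItem
   local tCounts
   set the itemDelimiter to comma
   -- a single trailing delimiter does not create an empty last item
   put the number of items in "a,b," into tCounts              -- 2
   -- doubling the delimiter is what makes the empty last cell visible
   put comma & (the number of items in "a,b,,") after tCounts  -- 3
   put tCounts
end showTrailingItem
```

This is why the handler doubles the delimiter before a CR: without it, a record whose last cell is empty would appear to have one column fewer than the others.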
Performance is still better than csv2Tab was, but sadly not as quick as
(the incorrect) csv3Tab was.
function CSV4Tab pData,pcoldelim
local tNuData -- contains tabbed copy of data
local tReturnPlaceholder -- replaces cr in field data to avoid line
-- breaks which would be misread as records;
local tNuDelim -- new character to replace the delimiter
local tStatus, theInsideStringSoFar
--
put numtochar(11) into tReturnPlaceholder -- vertical tab as
-- placeholder
put numtochar(29) into tNuDelim
--
if pcoldelim is empty then put comma into pcoldelim
-- Normalize line endings:
replace crlf with cr in pData -- Win to UNIX
replace numtochar(13) with cr in pData -- Mac to UNIX
put "outside" into tStatus
set the itemdel to quote
repeat for each item k in pData
-- put tStatus && k & CR after msg
switch tStatus
case "inside"
put k after theInsideStringSoFar
put "passedquote" into tStatus
next repeat
case "passedquote"
-- decide if it was a duplicated escapedQuote or a
-- closing quote
if k is empty then -- it's a duplicated quote
put quote after theInsideStringSoFar
put "inside" into tStatus
next repeat
end if
-- not empty - so we remain inside the cell, though we
-- have left the quoted section
-- NB this allows for quoted sub-strings within the
-- cell content !!
replace cr with tReturnPlaceholder in theInsideStringSoFar
put theInsideStringSoFar after tNuData
-- no break here: fall through so k is handled as "outside" text
case "outside"
replace pcoldelim with tNuDelim in k
-- and deal with the "empty trailing item" issue in
-- Livecode
replace (tNuDelim & CR) with tNuDelim & tNuDelim & CR in k
put k after tNuData
put "inside" into tStatus
put empty into theInsideStringSoFar
next repeat
default
put "defaulted"
break
end switch
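end repeat
-- if the data ends straight after a closing quote, the last quoted
-- cell has not been copied across yet, so flush it here
if tStatus is "passedquote" then
replace cr with tReturnPlaceholder in theInsideStringSoFar
put theInsideStringSoFar after tNuData
end if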
return tNuData
end CSV4Tab
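
A rough usage sketch, in case it helps. The handler name showCSV4Tab and the sample data are made up for illustration. It assumes the caller relies on the two characters chosen inside CSV4Tab: numtochar(29) as the cell delimiter and numtochar(11) as the placeholder for returns inside a cell.

```livecode
on showCSV4Tab
   local tCSV, tConverted, tLine, tCell, tText, tReport
   -- made-up sample: the second record has a return inside a quoted cell
   put "id,note" & cr into tCSV
   put "1," & quote & "line one" & cr & "line two" & quote & cr after tCSV
   put CSV4Tab(tCSV, comma) into tConverted
   -- cells are separated by numtochar(29), as set inside CSV4Tab
   set the itemDelimiter to numToChar(29)
   repeat for each line tLine in tConverted
      repeat for each item tCell in tLine
         -- work on a copy, and restore the returns that CSV4Tab
         -- swapped for numtochar(11)
         put tCell into tText
         replace numToChar(11) with cr in tText
         put "[" & tText & "]" after tReport
      end repeat
      put cr after tReport
   end repeat
   put tReport
end showCSV4Tab
```

Because returns inside cells are replaced before the data is returned, "repeat for each line" sees exactly one line per record, and the original returns only come back once each cell has been pulled out.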
-- Alex.