Many years ago (2004?) I posted code to do something like this, using
split/combine to differentiate between 'inside' and 'outside' field
delimiters. It was very fast, but pretty hard to follow, and I don't
remember now which obscure cases it handled (we haven't even mentioned
doubled encapsulation characters and backslash-escaped field delimiters yet :-)
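To make the 'inside'/'outside' idea and the doubled-character caveat concrete, here is a small Python sketch. It is not the 2004 code, which isn't reproduced here, and parity_fields is a hypothetical helper. It splits one line on the quote character, treats every second piece as inside quotes, and compares the result with Python's standard csv module, which follows the usual convention that a doubled quote inside a quoted field stands for one literal quote.

```python
import csv


def parity_fields(line, field_delim=",", quote='"'):
    # Hypothetical helper: split one CSV line with the odd/even quote trick.
    pieces = line.split(quote)
    for k in range(1, len(pieces), 2):   # odd indices lie inside quotes
        pieces[k] = pieces[k].replace(field_delim, chr(2))
    joined = "".join(pieces)             # every quote character is dropped
    return [f.replace(chr(2), field_delim) for f in joined.split(field_delim)]


print(parity_fields('"x,y",z'))              # ['x,y', 'z']
print(parity_fields('"a""b",c'))             # ['ab', 'c']: the doubled quote vanishes
print(next(csv.reader(['"a""b",c'])))        # ['a"b', 'c']: read as one literal quote
```

The quoted comma survives, but the doubled quote is silently discarded, which is the kind of case a simple parity rule does not cover (backslash escapes are not handled either).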
The code Paul posted is very clear and easy to follow, but it will
suffer pretty severe performance problems on large data sets. If you need
to worry about that, you can get a good speed-up just by
replacing each "repeat with ..." loop with "repeat for each ..." (see below).
The time to run this version grows roughly linearly with the number of
lines in the input data, whereas the previous version grew somewhat faster
than N-squared.
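A rough cost model for the difference, assuming (as is commonly observed in LiveCode) that a reference like "item i of pData" or "line i of pData" has to count delimiters from the start of the variable on every access, that there are N chunks of similar length, and that c is the cost of scanning past one chunk:

```latex
T_{\text{repeat with}} \approx \sum_{i=1}^{N} c\,i = c\,\frac{N(N+1)}{2} \in O(N^2)
\qquad
T_{\text{repeat for each}} \approx \sum_{i=1}^{N} c = c\,N \in O(N)
```

Paul's Step 3 also nests "item j of line i of pData", so each field access pays for a line scan plus an item scan, which is consistent with growth somewhat worse than N-squared.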
function csvToArray pData, pRecordDelimiter, pFieldDelimiter, pEncapsulationDelimiter
local tReservedRecordDelimiter, tReservedFieldDelimiter, tArray
# Initialize the temporary record and field delimiters. Change these
# if your CSV file may contain them.
put numToChar(1) into tReservedRecordDelimiter
put numToChar(2) into tReservedFieldDelimiter
# Step 1: Replace any record or field delimiters that are
# encapsulated with temporary characters
set itemdel to pEncapsulationDelimiter
put false into tIsEven
repeat for each item itm in pData
if tIsEven then
replace pFieldDelimiter with tReservedFieldDelimiter in itm
replace pRecordDelimiter with tReservedRecordDelimiter in itm
end if
put itm & the itemDelimiter after tData
put not tIsEven into tIsEven
end repeat
delete the last char of tData
put tData into pData
# Step 2: Remove all occurrences of the encapsulation delimiter
replace pEncapsulationDelimiter with empty in pData
# Step 3: Parse records and fields into the array, replacing any
# occurrences of the reserved record and field delimiters in each element
set itemdel to pFieldDelimiter
set lineDel to pRecordDelimiter
put 0 into i
repeat for each line L in pData
add 1 to i
put 0 into j
repeat for each item itm in L
add 1 to j
replace tReservedRecordDelimiter with pRecordDelimiter in itm
replace tReservedFieldDelimiter with pFieldDelimiter in itm
put itm into tArray[i][j]
end repeat
end repeat
# Step 4: return the array
return tArray
end csvToArray
On 17/02/2011 20:01, Paul Dupuis wrote:
First, thanks to everyone who replied, but especially to Nosanity.
Your code reminded me that you can effectively tell when you are
inside an encapsulated bit of data by an odd/even count of the
encapsulation character. So, for anyone who wants it, here is a
generalized function that I just wrote to parse a CSV file, regardless
of the field or record delimiters (commas, tabs or whatever) and to
deal with encapsulation appropriately.
This assumes you read the entire CSV file into a variable you pass
into pData, so a call would look like:
put csvToArray(myEntireCSVData,return,comma,quote) into myDataAsArray
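Paul's odd/even observation is easiest to see in isolation. The Python sketch below (hypothetical name protect_encapsulated) performs only Step 1: it splits on the encapsulation character, numbers the pieces from 1 as LiveCode numbers items, and protects delimiters in the even-numbered pieces, which is what "trunc(i/2) = (i/2)" tests for. The defaults correspond to the call above with return, comma and quote, taking return to be a linefeed.

```python
def protect_encapsulated(data, record_delim="\n", field_delim=",", encap='"'):
    # Hypothetical sketch of Step 1 only: hide delimiters that sit between
    # encapsulation characters, using chr(1) and chr(2) as placeholders.
    pieces = data.split(encap)
    for i, piece in enumerate(pieces, start=1):
        if i % 2 == 0:  # even-numbered piece: inside the encapsulation
            pieces[i - 1] = (piece.replace(field_delim, chr(2))
                                  .replace(record_delim, chr(1)))
    # The quotes are kept here; Step 2 of the original removes them afterwards.
    return encap.join(pieces)


print(repr(protect_encapsulated('a,"b,c"\n')))
```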
I have tested it a bit in the last 30 minutes and it is working in the
cases I tried, but I did not test exhaustively and have not checked
performance on large datasets. If anyone uses this and runs into an
issue, please let me know.
function csvToArray pData, pRecordDelimiter, pFieldDelimiter, pEncapsulationDelimiter
local tReservedRecordDelimiter, tReservedFieldDelimiter, tArray
# Initialize the temporary record and field delimiters. Change these
# if your CSV file may contain them.
put numToChar(1) into tReservedRecordDelimiter
put numToChar(2) into tReservedFieldDelimiter
# Step 1: Replace any record or field delimiters that are
# encapsulated with temporary characters
set itemdel to pEncapsulationDelimiter
repeat with i = 1 to the number of items in pData
if trunc(i/2) = (i/2) then
replace pFieldDelimiter with tReservedFieldDelimiter in item i of pData
replace pRecordDelimiter with tReservedRecordDelimiter in item i of pData
end if
end repeat
# Step 2: Remove all occurrences of the encapsulation delimiter
replace pEncapsulationDelimiter with empty in pData
# Step 3: Parse records and fields into the array, replacing any
# occurrences of the reserved record and field delimiters in each element
set itemdel to pFieldDelimiter
set lineDel to pRecordDelimiter
repeat with i = 1 to the number of lines in pData
repeat with j = 1 to the number of items in line i of pData
get item j of line i of pData
replace tReservedRecordDelimiter with pRecordDelimiter in it
replace tReservedFieldDelimiter with pFieldDelimiter in it
put it into tArray[i][j]
end repeat
end repeat
# Step 4: return the array
return tArray
end csvToArray
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com