On Feb 17, 2011, at 3:01 PM, Paul Dupuis wrote:

First, thanks to everyone who replied, but especially to Nosanity. Your code reminded me that you can effectively tell when you are inside an encapsulated bit of data by an odd/even count of the encapsulation character. So, for anyone who wants it, here is a generalized function that I just wrote to parse a CSV file, regardless of the field or record delimiters (commas, tabs or whatever) and to deal with encapsulation appropriately.

This assumes you read the entire CSV file into a variable you pass into pData, so a call would look like:

put csvToArray(myEntireCSVData,return,comma,quote) into myDataAsArray

I have tested it a bit in the last 30 minutes and it working in the cases I tried, but did not test exhaustively and have not checked performance on large datasets. If any one uses this and run into an issue, please let me know.

function csvToArray pData, pRecordDelimiter, pFieldDelimiter, pEncapsulationDelimiter
 local tReservedRecordDelimiter, tReservedFieldDelimiter, tArray

# Initialize the temporary record and field delimiters. Change these if your CSV file may contain them. put charToNum(1) into tReservedRecordDelimiter; put charToNum(2) into tReservedFieldDelimiter;

# Step 1: Replace any Record or Field delimiters that are encapsulated with temporary characters
 set itemdel to pEncapsulationDelimiter
 repeat with i = 1 to the number of items in pData
   if trunc(i/2) = (i/2) then
replace pFieldDelimiter with tReservedFieldDelimiter in item i of pData replace pRecordDelimiter with tReservedRecordDelimiter in item i of pData
   end if
 end repeat

 # Step 2: Replace all occurances of the encapsulation delimiter
 replace pEncapsulationDelimiter with empty in pData

# Step 3: Parse records and fields into the array, replace any occurances of the reserved record and field delimiters for each element
 set itemdel to pFieldDelimiter
 set lineDel to pRecordDelimiter
 repeat with i = 1 to the number of lines in pData
     repeat with j = 1 to the number of items in line i of pData
        get item j of line i of pData
        replace tReservedRecordDelimiter with pRecordDelimiter in it
        replace tReservedFieldDelimiter with pFieldDelimiter in it
        put it into tArray[i][j]
     end repeat
  end repeat

  # Step 4: return the array
  return tArray
end csvToArray

Here's an extension of your handler that will find appropriate reservedDelimiters that do not appear in the csv text, so you don't have to hope that numtochar(1) and numtochar(2) are not in the data:

function csvToArray pData, pRecordDelimiter, pFieldDelimiter, pEncapsulationDelimiter
   local tReservedRecordDelimiter, tReservedFieldDelimiter, tArray
# Initialize the temporary record and field delimiters. Change these if your CSV file may contain them.

   put getDelimiters(pData) into tDelim -- ## see below
   put numtochar(line 1 of tDelim) into tReservedRecordDelimiter
   put numtochar(line 2 of tDelim) into tReservedFieldDelimiter

# Step 1: Replace any Record or Field delimiters that are encapsulated with temporary characters
    set itemdel to pEncapsulationDelimiter
    repeat with i = 1 to the number of items in pData
         if trunc(i/2) = (i/2) then
replace pFieldDelimiter with tReservedFieldDelimiter in item i of pData replace pRecordDelimiter with tReservedRecordDelimiter in item i of pData
         end if
    end repeat

    # Step 2: Replace all occurances of the encapsulation delimiter
    replace pEncapsulationDelimiter with empty in pData

# Step 3: Parse records and fields into the array, replace any occurances of the reserved record and field delimiters for each element
    set itemdel to pFieldDelimiter
    set lineDel to pRecordDelimiter
    repeat with i = 1 to the number of lines in pData
           repeat with j = 1 to the number of items in line i of pData
                 get item j of line i of pData
replace tReservedRecordDelimiter with pRecordDelimiter in it replace tReservedFieldDelimiter with pFieldDelimiter in it
                 put it into tArray[i][j]
           end repeat
     end repeat

     # Step 4: return the array
     return tArray
end csvToArray

function getDelimiters tCSV
-- return 2 non-printing ASCII characters not found in the variable tCSV
   put "1,2,3,4,5,6,7,8" into dList
   -- could use other non-printing ASCII values
   put false into foundOne
   repeat with i = 1 to the number of items of dList
      put item i of dList into testNbr
      if numtochar(testNbr) is not in tCSV then
         put true into foundOne
         put testNbr into delimOne
         exit repeat
      end if
   end repeat
   if not foundOne then
      doAlert
      exit to top
   end if
   delete item i of dList
   put false into foundOne
   repeat with i = 1 to the number of items of dList
      put item i of dList into testNbr
      if numtochar(testNbr) is not in tCSV then
         put true into foundOne
         put testNbr into delimTwo
         exit repeat
      end if
   end repeat
   if not foundOne then
      doAlert
      exit to top
   end if
   return delimOne & cr & delimTwo
end getDelimiters

on doAlert
   answer "Cannot find delimiters!"
end doAlert

----------

Not tested but should work -- I used something like this in an old HC utility stack I put together years ago that I can't find at this point. Double-check me for bugs, since I'm working off the top of my head, but the principle is sound.

(The HC utility stack allowed you to open two stacks, one as the source of data, one as the recipient of data, then click on one field after another in the source stack, click on fields in order in the second stack (with prompts to clarify which corresponded to which), then depress the commandkey to exit the loading loop, and the handlers would load the data into the desired fields in the second stack. This of course depended on storing the data in CSV format in a variable using record and field delimiters found nowhere in the data being transferred. No arrays in those days.)

-- Peter

Peter M. Brigham
pmb...@gmail.com
http://home.comcast.net/~pmbrig



_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Reply via email to