On 16/05/2012 00:35, Peter Haworth wrote:
Thanks Alex.

I ran the same data through your new handler and it seems to have worked
fine.

There was a recent discussion of some of these corner-case issues on the
SQLite list, so I'll go grab their test cases and see what happens.

As far as performance goes, the new handler took approximately 2.5 times as
long as the CSV3 version on my 48k-row, 17-column dataset, but that's still
only about 1 second, so definitely not a concern, as mentioned previously.

I tried it out with this new test data. It has the odd characteristic of containing partially quoted strings within the cell content; I've adjusted the script to allow for that (by removing one logic check). I've also added a line that adds an extra empty item at the end of a line whenever the last item is already empty (i.e. to deal with LiveCode ignoring a blank trailing item).
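The trailing-item workaround can be illustrated per line in Python. This is a hedged sketch, not part of the handler: `livecode_items` only approximates how LiveCode counts items (assumed: a single trailing delimiter does not add an empty item), and `pad_trailing_empty` is a hypothetical per-line version of the extra `replace` in CSV4Tab, which does the same thing to every delimiter that directly precedes a line break.

```python
def livecode_items(line: str, delim: str = ",") -> list:
    """Hypothetical mimic of LiveCode item counting: one trailing empty
    item (a line ending in the delimiter) is not counted."""
    parts = line.split(delim)
    if parts[-1] == "":
        parts.pop()
    return parts


def pad_trailing_empty(line: str, delim: str = ",") -> str:
    """Add one more delimiter when the last field is already empty,
    so that the empty field survives LiveCode-style counting."""
    return line + delim if line.endswith(delim) else line


print(livecode_items("a,b,"))                      # ['a', 'b']
print(livecode_items(pad_trailing_empty("a,b,")))  # ['a', 'b', '']
```

Doubling the delimiter means the dropped trailing item is a throwaway, so the real (empty) last field is still counted.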

With these changes, csv4Tab() gets the same results as the original csv2Tab() did, and they fit with what I think is correct for this strange data set :-)

Performance is still better than csv2Tab was, but sadly not as quick as (the incorrect) csv3Tab was.

function CSV4Tab pData,pcoldelim
    local tNuData -- converted copy of the data, fields separated by tNuDelim
    local tReturnPlaceholder -- replaces cr inside quoted field data so that
    --                          embedded line breaks are not misread as records
    local tNuDelim  -- new character to replace the column delimiter
    local tStatus, theInsideStringSoFar
    --
    put numtochar(11) into tReturnPlaceholder -- vertical tab as placeholder
    put numtochar(29) into tNuDelim -- ASCII 29 (group separator) as field delimiter
    --
    if pcoldelim is empty then put comma into pcoldelim
    -- Normalize line endings:
    replace crlf with cr in pData          -- Win to UNIX
    replace numtochar(13) with cr in pData -- Mac to UNIX

    put "outside" into tStatus
    set the itemdel to quote
    repeat for each item k in pData
        -- put tStatus && k & CR after msg
        switch tStatus

            case "inside"
                put k after theInsideStringSoFar
                put "passedquote" into tStatus
                next repeat

            case "passedquote"
                -- decide if it was a duplicated (escaped) quote or a closing quote
                if k is empty then   -- it's a duplicated quote
                    put quote after theInsideStringSoFar
                    put "inside" into tStatus
                    next repeat
                end if
                -- not empty - so we have left the quoted section, though we may
                -- still be inside the same cell
                -- NB this allows for quoted sub-strings within the cell content !!
                replace cr with tReturnPlaceholder in theInsideStringSoFar
                put theInsideStringSoFar after tNuData
                -- no "next repeat" here: deliberately fall through to "outside"
                -- so that k is processed as unquoted text

            case "outside"
                replace pcoldelim with tNuDelim in k
                -- and deal with the "empty trailing item" issue in LiveCode
                replace (tNuDelim & CR) with (tNuDelim & tNuDelim & CR) in k
                put k after tNuData
                put "inside" into tStatus
                put empty into theInsideStringSoFar
                next repeat
            default
                -- should never happen: report the unexpected state
                put "defaulted"
                break
        end switch
    end repeat
    -- if the data ends with a closing quote, the last quoted section has not
    -- been written out yet (LiveCode ignores the empty item after that quote)
    if tStatus is "passedquote" then
        replace cr with tReturnPlaceholder in theInsideStringSoFar
        put theInsideStringSoFar after tNuData
    end if
    return tNuData
end CSV4Tab
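For anyone who wants to experiment with the quote-splitting state machine outside LiveCode, here is a rough Python translation. It is a hedged sketch, not the handler itself: the name `csv4tab`, the constants `GS`/`VT`, and the dropping of one trailing empty piece (assumed to mimic LiveCode's `repeat for each item` ignoring a trailing empty item) are assumptions of the translation.

```python
GS = "\x1d"  # field delimiter, as numtochar(29) in CSV4Tab
VT = "\x0b"  # placeholder for line breaks inside quoted fields, as numtochar(11)


def csv4tab(data: str, col_delim: str = ",") -> str:
    """Sketch of the CSV4Tab state machine: split on quotes, then walk the pieces."""
    col_delim = col_delim or ","  # empty delimiter defaults to comma
    data = data.replace("\r\n", "\n").replace("\r", "\n")  # normalize line endings
    pieces = data.split('"')
    # Assumption: mimic LiveCode ignoring one trailing empty item, so a closing
    # quote at the very end is not mistaken for half of a doubled quote.
    if pieces[-1] == "":
        pieces.pop()

    out = []
    inside_so_far = ""
    status = "outside"
    for piece in pieces:
        if status == "inside":
            inside_so_far += piece
            status = "passedquote"
            continue
        if status == "passedquote":
            if piece == "":
                # two quotes in a row: an escaped quote within the field
                inside_so_far += '"'
                status = "inside"
                continue
            # closing quote followed by more text: emit the quoted part, then
            # fall through and treat this piece as unquoted text
            out.append(inside_so_far.replace("\n", VT))
        # "outside" handling (reached directly or via the fall-through above)
        piece = piece.replace(col_delim, GS)
        piece = piece.replace(GS + "\n", GS + GS + "\n")  # keep empty last items
        out.append(piece)
        status = "inside"
        inside_so_far = ""

    if status == "passedquote":  # data ended right after a closing quote
        out.append(inside_so_far.replace("\n", VT))
    return "".join(out)


print(csv4tab('a,"b ""c""",d').split(GS))  # ['a', 'b "c"', 'd']
```

Splitting on the quote character means pieces alternate between unquoted and quoted text, and an empty piece right after a quoted one is the signature of a doubled (escaped) quote.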

-- Alex.

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com