On 15/05/2012 18:26, Bob Sneidar wrote:
> <sigh>  Another good developer lost to the csv parsing chasm of hell. We won't
> be hearing from Alex again. ;-)

Don't worry Bob, I'm just a tourist here in the chasm, I'm not moving in :-)

Pete - please try this out on your data. AFAICT it should handle all the cases discussed here (delimiters and line breaks inside quoted fields, and doubled "" quotes within them), and it has the added benefit of being simpler and (slightly) easier to understand. Also, apart from the line-ending normalization at the start, it uses no "global replace"s, so it would be much easier to modify it to handle very large files by reading bufferfulls at a time.
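A minimal, untested sketch of that buffered approach, assuming CSV4Tab as posted. CSV4TabFile is a hypothetical name, and the 64 KB buffer size is arbitrary. The idea: keep reading, and only hand CSV4Tab the part of the buffer that ends at the last line break lying outside any quoted field, so every piece starts in the "outside" state CSV4Tab expects. It assumes LF (or CRLF) line breaks; a file with bare CR line endings would simply be held in the buffer until EOF.

```livecode
-- Hypothetical handler (not from the original post): converts a large
-- CSV file a bufferful at a time, calling CSV4Tab on complete records.
function CSV4TabFile pPath, pColDelim
   local tBuffer, tResult, tReadResult, tPos, tCut, tInside, tChar
   open file pPath for read
   repeat
      -- read the next bufferful (the size is an arbitrary choice)
      read from file pPath for 65536 chars
      put the result into tReadResult -- empty, "eof" or an error message
      put it after tBuffer
      -- find the last line break outside any quoted field; a doubled ""
      -- toggles the flag twice, so it leaves the state unchanged
      put 0 into tPos
      put 0 into tCut
      put false into tInside
      repeat for each char tChar in tBuffer
         add 1 to tPos
         if tChar is quote then
            put not tInside into tInside
         else if tChar is cr and not tInside then
            put tPos into tCut
         end if
      end repeat
      -- convert complete records only; keep the rest for the next pass
      if tCut > 0 then
         put CSV4Tab(char 1 to tCut of tBuffer, pColDelim) after tResult
         delete char 1 to tCut of tBuffer
      end if
      if tReadResult is not empty then exit repeat
   end repeat
   close file pPath
   -- anything left is a final record with no trailing line break
   if tBuffer is not empty then
      put CSV4Tab(tBuffer, pColDelim) after tResult
   end if
   return tResult
end CSV4TabFile
```

Because the cut always falls on the LF, a CRLF pair never gets split between two pieces.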

-- Alex.

function CSV4Tab pData,pcoldelim
    local tNuData -- converted copy of data; columns separated by numtochar(29)
    local tReturnPlaceholder -- replaces cr in field data to avoid line
    --                       breaks which would be misread as records;
    local tStatus, theInsideStringSoFar
    --
    put numtochar(11) into tReturnPlaceholder -- vertical tab as placeholder
    --
    if pcoldelim is empty then put comma into pcoldelim
    -- Normalize line endings:
    replace crlf with cr in pData          -- Win to UNIX
    replace numtochar(13) with cr in pData -- Mac to UNIX

    put "outside" into tStatus
    set the itemdel to quote
    repeat for each item k in pData
        switch tStatus

            case "inside"
                put k after theInsideStringSoFar
                put "passedquote" into tStatus
                next repeat

            case "passedquote"
                -- decide if it was a duplicated escapedQuote or a closing quote
                if k is empty then   -- it's a duplicated quote
                    put quote after theInsideStringSoFar
                    put "inside" into tStatus
                    next repeat
                end if
                -- not empty - so we should have a delimiter here
                if char 1 of k = pcoldelim or char 1 of k = cr then
                    -- as we expect - we have just left the quoted string
                    replace cr with tReturnPlaceholder in theInsideStringSoFar
                    put theInsideStringSoFar after tNuData
                    -- and then deal with this outside item
                    -- by falling through into the 'outside' case
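                else
                    -- malformed input: text follows a closing quote.
                    -- Note that break only leaves the switch, not the loop.
                    put "bad logic"
                    break
                end if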

            case "outside"
                replace pcoldelim with numtochar(29) in k
                put k after tNuData
                put "inside" into tStatus
                put empty into theInsideStringSoFar
                next repeat
            default
                put "defaulted"
                break
        end switch
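    end repeat
    -- if the data ended right after a closing quote (no trailing line
    -- break), the last quoted field is still pending: flush it
    if tStatus is "passedquote" then
        replace cr with tReturnPlaceholder in theInsideStringSoFar
        put theInsideStringSoFar after tNuData
    end if
    return tNuData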
end CSV4Tab
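A quick way to check the output: a hypothetical helper, CSV4TabVisible, and a test handler (neither is in the original post) that make the two placeholder characters visible. The sample data is made up and covers a comma inside quotes, a doubled quote, and a line break inside a quoted field, with every record ending in a line break.

```livecode
-- Hypothetical helper: converts CSV with CSV4Tab, then makes the
-- separators visible for inspection.
function CSV4TabVisible pCSV
   local tShown
   put CSV4Tab(pCSV) into tShown
   replace numtochar(29) with " | " in tShown -- column delimiter
   replace numtochar(11) with "\n" in tShown  -- in-field line break, shown as literal \n
   return tShown
end CSV4TabVisible

on testCSV4Tab
   local tCSV
   put "a," & quote & "b,c" & quote & cr into tCSV
   put "d," & quote & "e" & quote & quote & "f" & quote & cr after tCSV
   put "g," & quote & "h" & cr & "i" & quote & cr after tCSV
   put CSV4TabVisible(tCSV) -- shows the result in the message box
end testCSV4Tab
```

With this sample the message box should show three records: a | b,c then d | e"f then g | h\ni (LiveCode does not interpret backslash escapes, so "\n" really is a backslash and an n).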



_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
