Hi Peter,

it also requires offsets() - I can guess what it does, but it would be safer to get the actual code you use :-)

Thanks
-- Alex.

On 18/10/2015 00:41, Peter M. Brigham wrote:
So here's my attempt. It converts a CVS text to an array. Let's see if there's 
csv data that can break it.

-- Peter

Peter M. Brigham
pmb...@gmail.com
http://home.comcast.net/~pmbrig

-------

function CSVtoArray pData
    -- by Peter M. Brigham, pmb...@gmail.com
    -- requires getDelimiters(), howmany()
    put getDelimiters(pData,5) into tDelims
    put line 1 of tDelims into crChar
    put line 2 of tDelims into tabChar
    put line 3 of tDelims into commaChar
    put line 4 of tDelims into openQuoteChar
    put line 5 of tDelims into closeQuoteChar
replace crlf with cr in pData -- Win to UNIX
    replace numtochar(13) with cr in pData -- Mac to UNIX
if howmany(quote,pData) mod 2 = 1 then
       return "This CSV data is not parsable (unclosed quotes in data)."
    end if
put offsets(quote,pData) into qOffsets
    if qOffsets > 0 then
       put 1 into counter
       repeat for each item q in qOffsets
          if counter mod 2 = 1 then put openQuoteChar into char q of pData
          else put closeQuoteChar into char q of pData
          add 1 to counter
       end repeat
    end if
put offsets(cr,pData) into crOffsets
    repeat for each item r in crOffsets
       put char 1 to r of pData into upToHere
       if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) 
then
          -- the cr is within a quoted string
          put crChar into char r of pData
       end if
    end repeat
    put offsets(tab,pData) into tabOffsets
    repeat for each item t in tabOffsets
       put char 1 to t of pData into upToHere
       if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) 
then
          -- the tab is within a quoted string
          put tabChar into char t of pData
       end if
    end repeat
    put offsets(comma,pData) into commaOffsets
    repeat for each item c in commaOffsets
       put char 1 to c of pData into upToHere
       if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) 
then
          -- the comma is within a quoted string
          put commaChar into char c of pData
       end if
    end repeat
    put 0 into lineCounter
    repeat for each line L in pData
       add 1 to lineCounter
       put 0 into itemCounter
       repeat for each item i in L
          add 1 to itemCounter
          put i into thisItem
          if howmany(quote,thisItem) mod 2 = 1 then
             return "This CSV data is not parsable (unclosed quotes in item)."
          end if
          replace crChar with cr in thisItem
          replace tabChar with tab in thisItem
          replace commaChar with comma in thisItem
          replace openQuoteChar with quote in thisItem
          replace closeQuoteChar with quote in thisItem
          put thisItem into A[lineCounter][itemCounter]
       end repeat
    end repeat
    return A
end CSVtoArray

function getDelimiters pText, nbr
    -- returns a cr-delimited list of <nbr> characters
    --    not found in the variable pText
    -- use for delimiters for, eg, parsing text files, manipulating arrays, etc.
    -- usage: put getDelimiters(pText,2) into tDelims
    --        if tDelims begins with "Error" then exit to top -- or whatever
    --        put line 1 of tDelims into lineDivider
    --        put line 2 of tDelims into itemDivider
    --             etc.
    -- by Peter M. Brigham, pmb...@gmail.com — freeware
if pText = empty then return "Error: no text specified."
    if nbr = empty then put 1 into nbr -- default 1 delimiter
    put "2,3,4,5,6,7,8,16,17,18,19,20,21,22,23,24,25,26" into baseList
    -- low ASCII values, excluding CR, LF, tab, etc.
    put the number of items of baseList into maxNbr
    if nbr > maxNbr then return "Error: max" && maxNbr && "delimiters."
    repeat with tCount = 1 to nbr
       put true into failed
       repeat with i = 1 to the number of items of baseList
          put item i of baseList into testNbr
          put numtochar(testNbr) into testChar
          if testChar is not in pText then
             -- found one, store and get next delim
             put false into failed
             put testChar into line tCount of delimList
             exit repeat
          end if
       end repeat
       if failed then
          if tCount = 0 then
             return "Error: cannot get any delimiters."
          else if tCount = 1 then
             return "Error: can only get one delimiter."
          else
             return "Error: can only get" && tCount && "delimiters."
          end if
       end if
       delete item i of baseList
    end repeat
    return delimList
end getDelimiters

function howmany pStr, pContainer, pCaseSens
    -- how many times pStr occurs in pContainer
    -- note that howmany("00","000000") returns 3, not 5
    --     ie,  overlapping matches are not counted
    -- by Peter M. Brigham, pmb...@gmail.com — freeware
if pCaseSens = empty then put false into pCaseSens
    set the casesensitive to pCaseSens
    if pStr is not in pContainer then return 0
    put len(pContainer) into origLength
    replace pStr with char 2 to -1 of pStr in pContainer
    return origLength - len(pContainer)
end howmany


On Oct 17, 2015, at 5:03 AM, Alex Tweedly wrote:

Naturally it must be removed.

But I have a more philosophical issue / question.


TSV (in and of itself) doesn't have any quotes, and so doesn't handle quoted 
CRs or TABs.

Currently, the 'old' version - as in Richard's published article, doesn't handle TAB 
characters enclosed within a quoted cell. The 'new' version does - but only by 
returning the data delimited by <GS> instead of TAB, and leaving enclosed TABs 
alone - a mistake, IMHO.

I believe that what the converter should do is :
- return TSV - i.e. delimited by TABs
- replace quoted CR by <VT> within quoted cells (as it does now)
- replace quoted TABs by <GS> within quoted cells

Any comments or suggestions ?

Thanks
Alex.

On 17/10/2015 02:34, Mike Kerner wrote:
It's safe as long as you remember to remove it at the end of the function

On Fri, Oct 16, 2015 at 7:12 PM, Alex Tweedly <a...@tweedly.net> wrote:

Duh - replying to myself again :-)

It looks as though that's exactly what you do mean - it certainly
generates the problems you described earlier. And my one-line additional
test would (does in my testing) solve it properly - without it, we don't
get a chance to flush "theInsideStringSoFar" to tNuData, with the extra
line we do. And adding it is always safe (AFAICI).

-- Alex.


On 17/10/2015 00:03, Alex Tweedly wrote:

Sorry, Mike, but can you describe what you mean by a "naked" line ?
Is it simply one with no line delimiter after it ?
i.e. could only happen on the very last line of a file of input ?

Could that be solved by a simple test (after the various 'replace'
statements)
     if the last char of pData <> CR then put CR after pData
before the parsing happens ?

-- Alex.


On 16/10/2015 17:19, Mike Kerner wrote:

No, the problem isn't that LC use LF and CR for ascii(10) and ignores
ascii(13).  That's just a personal problem.

The problem, here, is that the csv parser handles a naked line and a
terminated line differently.  If the line is terminated, it parses it one
way, and if it is not, it parses it (incorrectly) a different way, which
makes me wonder if this is the latest version.

On Fri, Oct 16, 2015 at 11:28 AM, Bob Sneidar <
bobsnei...@iotecdigital.com>
wrote:

But what if the cr or lf or crlf is inside quoted text, meaning it is not
a delimiter? Oh, I'm afraid the deflector shield will be quite
operational
when your friends arrive.

Bob S


On Oct 16, 2015, at 08:04 , Alex Tweedly <a...@tweedly.net> wrote:
Hi Mike,

thanks for that additional info.

I *think* (it's been 3 years) I left them as <GS> (i.e. numtochar(29))

because I had some data including normal TAB characters within the cells
(!!( and thought <GS> was a safer bet - though of course nothing is
completely safe. It's then up to the caller to decide whether to do
"replace numtochar(29) with TAB in ...", or do TAB escaping, or whatever
they want.

As for the other bigger problem .... Oh dear = CR vs LF vs CRLF ....

Are you on Mac or Windows or Linux ?
How is the LF delimited data getting into your app ?
Maybe we should just add a "replace chartonum(13) with CR in pData" ?

(I confess to being confused by this - I know that LC does

auto-translation of line delimiters at various places, but I'm not sure
when it is, or isn't, completely safe. Maybe the easiest thing is to
jst do
all the translations ....

   replace CRLF with CR in pData
   replace numtochar(10) with CR in pData
   replace numtochar(13) with CR in pData

-- Alex.

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode



_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Reply via email to