Hi Mike,

thanks for that additional info.

I *think* (it's been 3 years) I left them as <GS> (i.e. numtochar(29)) because I had some data including normal TAB characters within the cells (!!) and thought <GS> was a safer bet - though of course nothing is completely safe. It's then up to the caller to decide whether to do "replace numtochar(29) with TAB in ...", or do TAB escaping, or whatever they want.
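As a rough sketch of the "TAB escaping" option mentioned above: the handler name gsToTab and the choice of a literal backslash-t as the escape are assumptions made for illustration only, not something from the original code.

```livecode
-- Hypothetical helper: name and escaping scheme are assumptions.
function gsToTab pData
   local tGS
   put numToChar(29) into tGS
   -- First escape any real TABs that were inside cells.
   -- LiveCode string literals have no escape sequences, so "\t"
   -- here is just the two characters backslash and t.
   replace tab with "\t" in pData
   -- Now the <GS> cell delimiters can safely become real TABs.
   replace tGS with tab in pData
   return pData
end gsToTab
```

A caller would then do something like "put gsToTab(CSVToTab1(tCSV)) into tTabbed", and would need to undo the backslash-t escaping when reading individual cells.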

As for the other bigger problem ....   Oh dear - CR vs LF vs CRLF ....

Are you on Mac or Windows or Linux ?
How is the LF delimited data getting into your app ?
Maybe we should just add a "replace numtochar(13) with CR in pData" ?

I confess to being confused by this - I know that LC does auto-translation of line delimiters at various places, but I'm not sure when it is, or isn't, completely safe. Maybe the easiest thing is to just do all the translations:

  replace CRLF with CR in pData
  replace numtochar(10) with CR in pData
  replace numtochar(13) with CR in pData
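For reference, the same normalization written as a small handler. The name normalizeLineEndings is an assumption. Note that in LiveCode the constant CR is numToChar(10), i.e. LF, so replacing numToChar(10) with CR changes nothing, and the order of the other two replacements matters.

```livecode
-- Hypothetical helper: the handler name is an assumption.
function normalizeLineEndings pData
   -- CRLF must be handled first; otherwise each CRLF would turn
   -- into two line breaks when the lone numToChar(13) is replaced.
   replace crlf with cr in pData            -- Windows CRLF -> LF
   replace numToChar(13) with cr in pData   -- remaining lone CR -> LF
   return pData
end normalizeLineEndings
```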

-- Alex.

On 16/10/2015 12:48, Mike Kerner wrote:
Richard,
Yes, I understand it was a Pascal Pun, and then in 2012, when this thread
originally happened, it became something more, sort of a version pun on a
pascal pun, if you will.

Rather than posting fixes to the one on your blog, let's go through the
"state of the art" and work on that, instead, if it needs it.


Alex,
I see at least two issues with this version:
First of all, you never substitute tab for tNuDelim, so the string you
return is numtochar(29) delimited, not tab-delimited.
The last line of your function, before the "return tNuData" line should be
"replace tNuDelim with tab"

Second of all, I get two different results in my sample, depending on
whether or not the string is <CR>...ERRRRRRRRRRR <LF>-terminated or not
After fixing the problem, above,

When I run
"A","","C"
I get
A <HT> <HT>
i.e. the "C" is missing

NOW, if I send
"A","","C"<LF>
I get
A <HT> <HT> C <LF>

I haven't looked for that bug, yet.

On Thu, Oct 15, 2015 at 10:55 PM, Alex Tweedly <a...@tweedly.net> wrote:

Hmmmm ... my quick test of what was csv4Tab, but is now called csvToTab1 -
see below - gives me
(showing results with a colon  ':' for the cell delimiter, i.e. replacing
numtochar(29) from the code in the previous use-list code)

a,b,c   ---> a:b:c
"a","","c" ---> a::c

Now to me, that's what it should give - so I think it gets it right :-)

Question is
a. do you get the same result ?
     if not, what do you get ?  OR can you try with the code below
     if you do, but disagree that this is right, what do you think it
should give ?

-- Alex

function CSVToTab1 pData,pcoldelim
    local tNuData -- contains tabbed copy of data
    local tReturnPlaceholder -- replaces cr in field data to avoid line
    --                       breaks which would be misread as records;
    local tNuDelim  -- new character to replace the delimiter
    local tStatus, theInsideStringSoFar
    --
    put numtochar(11) into tReturnPlaceholder -- vertical tab as placeholder
    put numtochar(29) into tNuDelim
    --
    if pcoldelim is empty then put comma into pcoldelim
    -- Normalize line endings:
    replace crlf with cr in pData          -- Win to UNIX
    replace numtochar(13) with cr in pData -- Mac to UNIX

    put "outside" into tStatus
    set the itemdel to quote
    repeat for each item k in pData
       -- put tStatus && k & CR after msg
       switch tStatus

          case "inside"
             put k after theInsideStringSoFar
             put "passedquote" into tStatus
             next repeat

          case "passedquote"
             -- decide if it was a duplicated escapedQuote or a closing quote
             if k is empty then   -- it's a duplicated quote
                put quote after theInsideStringSoFar
                put "inside" into tStatus
                next repeat
             end if
             -- not empty - so we remain inside the cell, though we have left the quoted section
             -- NB this allows for quoted sub-strings within the cell content !!
             replace cr with tReturnPlaceholder in theInsideStringSoFar
             put theInsideStringSoFar after tNuData

          case "outside"
             replace pcoldelim with tNuDelim in k
             -- and deal with the "empty trailing item" issue in Livecode
             replace (tNuDelim & CR) with tNuDelim & tNuDelim & CR in k
             put k after tNuData
             put "inside" into tStatus
             put empty into theInsideStringSoFar
             next repeat
          default
             put "defaulted"
             break
       end switch
    end repeat
    return tNuData
end CSVToTab1
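
A possible explanation for the missing final cell reported elsewhere in this thread: with the item delimiter set to quote, input that ends in a closing quote leaves the loop in the "passedquote" state, so theInsideStringSoFar is never appended to tNuData. When the data ends with a line break, that break arrives as one more item and flushes the cell. The wrapper below is only a sketch of a workaround under that reading; the name CSVToTabWithFlush is an assumption, and the alternative of flushing theInsideStringSoFar after the loop inside CSVToTab1 itself may be cleaner.

```livecode
-- Hypothetical wrapper: the name is an assumption, not from the thread.
function CSVToTabWithFlush pData, pColDelim
   local tAddedBreak, tLast, tResult
   put false into tAddedBreak
   put char -1 of pData into tLast
   -- CR in LiveCode is numToChar(10); numToChar(13) is the old Mac ending.
   if tLast is not cr and tLast is not numToChar(13) then
      -- Give the parser a final line break so a trailing quoted
      -- cell is written out before the loop ends.
      put cr after pData
      put true into tAddedBreak
   end if
   put CSVToTab1(pData, pColDelim) into tResult
   -- Remove the line break again, but only if we added it.
   if tAddedBreak and char -1 of tResult is cr then
      delete char -1 of tResult
   end if
   return tResult
end CSVToTabWithFlush
```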


On 16/10/2015 01:34, Mike Kerner wrote:

csv4 does not handle it, and it comes up with a different result from csv2
(which is also wrong).  I sent Richard proposed changes to csv2 which
address that issue, but I'll wait while we collectively try to remember
what the latest and greatest csv parser algorithm is before I try to come
up with more ways to break or fix it.

On Thu, Oct 15, 2015 at 8:24 PM, Alex Tweedly <a...@tweedly.net> wrote:

Richard et al.,
sometime after that article, there was a further thread on the use-list.
Pete Haworth found a case not properly covered by the version on the
article, and I came up with a revised version (cutely called csv4Tab !! -
csv3Tab was an interim, deeply buggy attempt)

(It's in
http://lists.runrev.com/pipermail/use-livecode/2012-May/172275.html )

It *looks* from that thread (
http://lists.runrev.com/pipermail/use-livecode/2012-May/172191.html ) as
though this case had been discussed, and the re-write should properly
handle it - but I haven't yet had time to try it. My laptop has been
replaced in the meantime, and I can't find my test stack, and recreating it
and finding the test data is a bit too much for after 1am :-)

So I'll try it tomorrow; hopefully csv4Tab() will already work for this
case. If it doesn't, we can try again :-)

-- Alex.


On 16/10/2015 00:34, Richard Gaskin wrote:

Mike Kerner wrote:
Alex, Richard, etc.

What do we consider the latest version of the csv parser?  I think I
found a bug in Richard's CSV2Text code, and proposed changes, but he
wanted the discussion to go down over here, first.  Then I noticed
that csv4Text is out over here, which makes 2, I guess, a bit long in
the tooth.

The version referred to here as "Richard's" is the famous Tweedly algo,
in the middle of this page:
<http://www.fourthworld.com/embassy/articles/csv-must-die.html>

Alex came up with that after a bunch of us here had a long discussion
about the many variants of CSV running around, and how stupidly complex
they are to parse (see the details in that article).

Mike wrote me this afternoon letting me know that there's yet another
exception that doesn't seem to be accounted for there:

     "value","","value"

I had thought we'd covered that in the earlier discussion, but perhaps
not.

So this seems like a good time to once again bring together the best
minds in our community (are you listening Alex Tweedly, Geoff Canyon, Mark
Weider, Dick Kreisel, and others?) to see if we can revisit CSV parsing and
come up with a function that can parse it into tabs efficiently, while
taking into account all of the really stupid exceptions that have crept
into the world since that really stupid format was first popularized.

When we're done I'll update the article, and add even more sarcastic
comments about what a really dumb idea it was to have encouraged people to
delimit text with a character so frequently appearing in text.

--
   Richard Gaskin
   Fourth World Systems
   Software Design and Development for the Desktop, Mobile, and the Web
   ____________________________________________________________________
   ambassa...@fourthworld.com http://www.FourthWorld.com


_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

