I added it to my repository on GitHub if anyone wants to try to do this in Git.
On Sat, Oct 17, 2015 at 10:53 AM, Mike Kerner <mikeker...@roadrunner.com> wrote:

> I am going to put 4 on Git and have at it.
>
> 1) There are other assumptions being made, like assuming that the <VT> and
> <GS> don't appear in the incoming text. Instead of hardcoding the interim
> substitutions, determine what the interim substitutions are going to be
> (can also allow the user to specify them). Characters that we need to deal
> with are quote, <HT>, <LF>, and comma.
>
> 2) In this version, you can specify the incoming column delimiter. Add
> the ability for the caller to specify the record delimiter before, the
> column and record delimiters after, and what substitutions are going to be
> used, after. For example, for embedded <LF>'s, perhaps the user wants <13>
> or even a string like a semicolon and a space
>
>
> On Sat, Oct 17, 2015 at 5:03 AM, Alex Tweedly <a...@tweedly.net> wrote:
>
>> Naturally it must be removed.
>>
>> But I have a more philosophical issue / question.
>>
>> TSV (in and of itself) doesn't have any quotes, and so doesn't handle
>> quoted CRs or TABs.
>>
>> Currently, the 'old' version - as in Richard's published article, doesn't
>> handle TAB characters enclosed within a quoted cell. The 'new' version does
>> - but only by returning the data delimited by <GS> instead of TAB, and
>> leaving enclosed TABs alone - a mistake, IMHO.
>>
>> I believe that what the converter should do is :
>> - return TSV - i.e. delimited by TABs
>> - replace quoted CR by <VT> within quoted cells (as it does now)
>> - replace quoted TABs by <GS> within quoted cells
>>
>> Any comments or suggestions ?
>>
>> Thanks
>> Alex.
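A rough sketch of the behaviour Alex proposes above, in case it helps the discussion. This is not the function from Richard's article: sketchCSVToTSV and its parameter names are made up for illustration. It assumes the line endings are already normalised to CR, that the column delimiter is a single character, and that a doubled quote inside a quoted cell stands for one literal quote.

```livecode
-- Hypothetical sketch, not the published converter.
-- Assumes pData uses CR (numToChar(10)) as its only line ending and that
-- a doubled quote inside a quoted cell stands for one literal quote.
function sketchCSVToTSV pData, pColDelim
   local tResult, tInside, tJustClosed, tChar
   if pColDelim is empty then put comma into pColDelim
   put false into tInside
   put false into tJustClosed
   repeat for each char tChar in pData
      if tChar is quote then
         -- A quote that reopens a cell straight after closing it is an escaped quote
         if (not tInside) and tJustClosed then put quote after tResult
         put tInside into tJustClosed -- true only when this quote closes a cell
         put not tInside into tInside
         next repeat
      end if
      put false into tJustClosed
      if tInside then
         if tChar is CR then
            put numToChar(11) after tResult -- quoted CR becomes <VT>
         else if tChar is tab then
            put numToChar(29) after tResult -- quoted TAB becomes <GS>
         else
            put tChar after tResult
         end if
      else if tChar is pColDelim then
         put tab after tResult -- column delimiter becomes TAB
      else
         put tChar after tResult
      end if
   end repeat
   return tResult
end sketchCSVToTSV
```

For example, sketchCSVToTSV("a," & quote & "b" & tab & "c" & quote) should give a, TAB, b, <GS>, c. It still hardcodes <VT> and <GS>, so it does nothing about point 1 above.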
>>
>>
>> On 17/10/2015 02:34, Mike Kerner wrote:
>>
>>> It's safe as long as you remember to remove it at the end of the function
>>>
>>> On Fri, Oct 16, 2015 at 7:12 PM, Alex Tweedly <a...@tweedly.net> wrote:
>>>
>>>> Duh - replying to myself again :-)
>>>>
>>>> It looks as though that's exactly what you do mean - it certainly
>>>> generates the problems you described earlier. And my one-line additional
>>>> test would (does in my testing) solve it properly - without it, we don't
>>>> get a chance to flush "theInsideStringSoFar" to tNuData, with the extra
>>>> line we do. And adding it is always safe (AFAICI).
>>>>
>>>> -- Alex.
>>>>
>>>>
>>>> On 17/10/2015 00:03, Alex Tweedly wrote:
>>>>
>>>>> Sorry, Mike, but can you describe what you mean by a "naked" line ?
>>>>> Is it simply one with no line delimiter after it ?
>>>>> i.e. could only happen on the very last line of a file of input ?
>>>>>
>>>>> Could that be solved by a simple test (after the various 'replace'
>>>>> statements)
>>>>>    if the last char of pData <> CR then put CR after pData
>>>>> before the parsing happens ?
>>>>>
>>>>> -- Alex.
>>>>>
>>>>>
>>>>> On 16/10/2015 17:19, Mike Kerner wrote:
>>>>>
>>>>>> No, the problem isn't that LC uses LF and CR for ascii(10) and ignores
>>>>>> ascii(13). That's just a personal problem.
>>>>>>
>>>>>> The problem, here, is that the csv parser handles a naked line and a
>>>>>> terminated line differently. If the line is terminated, it parses it one
>>>>>> way, and if it is not, it parses it (incorrectly) a different way, which
>>>>>> makes me wonder if this is the latest version.
>>>>>>
>>>>>> On Fri, Oct 16, 2015 at 11:28 AM, Bob Sneidar <bobsnei...@iotecdigital.com>
>>>>>> wrote:
>>>>>>
>>>>>>> But what if the cr or lf or crlf is inside quoted text, meaning it is not
>>>>>>> a delimiter? Oh, I'm afraid the deflector shield will be quite operational
>>>>>>> when your friends arrive.
>>>>>>>
>>>>>>> Bob S
>>>>>>>
>>>>>>>
>>>>>>> On Oct 16, 2015, at 08:04 , Alex Tweedly <a...@tweedly.net> wrote:
>>>>>>>
>>>>>>>> Hi Mike,
>>>>>>>>
>>>>>>>> thanks for that additional info.
>>>>>>>>
>>>>>>>> I *think* (it's been 3 years) I left them as <GS> (i.e. numtochar(29))
>>>>>>>> because I had some data including normal TAB characters within the cells
>>>>>>>> (!!) and thought <GS> was a safer bet - though of course nothing is
>>>>>>>> completely safe. It's then up to the caller to decide whether to do
>>>>>>>> "replace numtochar(29) with TAB in ...", or do TAB escaping, or whatever
>>>>>>>> they want.
>>>>>>>>
>>>>>>>> As for the other bigger problem .... Oh dear = CR vs LF vs CRLF ....
>>>>>>>>
>>>>>>>> Are you on Mac or Windows or Linux ?
>>>>>>>> How is the LF delimited data getting into your app ?
>>>>>>>> Maybe we should just add a "replace numtochar(13) with CR in pData" ?
>>>>>>>>
>>>>>>>> (I confess to being confused by this - I know that LC does
>>>>>>>> auto-translation of line delimiters at various places, but I'm not sure
>>>>>>>> when it is, or isn't, completely safe. Maybe the easiest thing is to just
>>>>>>>> do all the translations ....)
>>>>>>>>
>>>>>>>>    replace CRLF with CR in pData
>>>>>>>>    replace numtochar(10) with CR in pData
>>>>>>>>    replace numtochar(13) with CR in pData
>>>>>>>>
>>>>>>>> -- Alex.
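A minimal sketch of the translations Alex lists here, combined with the trailing-CR test suggested later in the thread for "naked" last lines. The handler name is hypothetical. Since CR in LiveCode is already numToChar(10), the middle replace in Alex's list does nothing and is left out.

```livecode
-- Hypothetical helper; the name is made up for illustration.
-- In LiveCode, CR, LF and return all mean numToChar(10);
-- CRLF is numToChar(13) & numToChar(10).
function sketchNormaliseLineEndings pData
   replace CRLF with CR in pData          -- Windows-style endings
   replace numToChar(13) with CR in pData -- any remaining bare ascii(13)
   -- Terminate a "naked" last line so the parser gets to flush it.
   -- The caller should strip this extra CR from the output afterwards.
   if pData is not empty and the last char of pData is not CR then
      put CR after pData
   end if
   return pData
end sketchNormaliseLineEndings
```

Line endings inside quoted cells get normalised too, which seems harmless as long as the parser turns quoted CRs into <VT> afterwards.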
--
On the first day, God created the heavens and the Earth
On the second day, God created the oceans.
On the third day, God put the animals on hold for a few hours,
   and did a little diving.
And God said, "This is good."
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode