My mistake, failed to include the offsets() handler:

-- Peter

Peter M. Brigham
pmb...@gmail.com
http://home.comcast.net/~pmbrig

-----------

function offsets str, pContainer
   -- returns a comma-delimited list of all the offsets of str in pContainer
   -- returns 0 if not found
   -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
   --    ie, overlapping offsets are not counted
   -- note: to get the last occurrence of a string in a container (often useful)
   --    use "item -1 of offsets(...)"
   -- by Peter M. Brigham, pmb...@gmail.com — freeware

   if str is not in pContainer then return 0
   put 0 into startPoint
   repeat
      put offset(str,pContainer,startPoint) into thisOffset
      if thisOffset = 0 then exit repeat
      add thisOffset to startPoint
      put startPoint & comma after offsetList
      add length(str)-1 to startPoint
   end repeat
   return item 1 to -1 of offsetList -- delete trailing comma
end offsets

On Oct 17, 2015, at 8:30 PM, Alex Tweedly wrote:

> Hi Peter,
>
> it also requires offsets() - I can guess what it does, but it would be safer
> to get the actual code you use :-)
>
> Thanks
> -- Alex.
>
> On 18/10/2015 00:41, Peter M. Brigham wrote:
>> So here's my attempt. It converts a CSV text to an array. Let's see if
>> there's csv data that can break it.
>>
>> -- Peter
>>
>> Peter M. Brigham
>> pmb...@gmail.com
>> http://home.comcast.net/~pmbrig
>>
>> -------
>>
>> function CSVtoArray pData
>>    -- by Peter M. Brigham, pmb...@gmail.com
>>    -- requires getDelimiters(), howmany()
>>    put getDelimiters(pData,5) into tDelims
>>    put line 1 of tDelims into crChar
>>    put line 2 of tDelims into tabChar
>>    put line 3 of tDelims into commaChar
>>    put line 4 of tDelims into openQuoteChar
>>    put line 5 of tDelims into closeQuoteChar
>>    replace crlf with cr in pData -- Win to UNIX
>>    replace numtochar(13) with cr in pData -- Mac to UNIX
>>    if howmany(quote,pData) mod 2 = 1 then
>>       return "This CSV data is not parsable (unclosed quotes in data)."
>>    end if
>>    put offsets(quote,pData) into qOffsets
>>    if qOffsets > 0 then
>>       put 1 into counter
>>       repeat for each item q in qOffsets
>>          if counter mod 2 = 1 then put openQuoteChar into char q of pData
>>          else put closeQuoteChar into char q of pData
>>          add 1 to counter
>>       end repeat
>>    end if
>>    put offsets(cr,pData) into crOffsets
>>    repeat for each item r in crOffsets
>>       put char 1 to r of pData into upToHere
>>       if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
>>          -- the cr is within a quoted string
>>          put crChar into char r of pData
>>       end if
>>    end repeat
>>    put offsets(tab,pData) into tabOffsets
>>    repeat for each item t in tabOffsets
>>       put char 1 to t of pData into upToHere
>>       if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
>>          -- the tab is within a quoted string
>>          put tabChar into char t of pData
>>       end if
>>    end repeat
>>    put offsets(comma,pData) into commaOffsets
>>    repeat for each item c in commaOffsets
>>       put char 1 to c of pData into upToHere
>>       if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
>>          -- the comma is within a quoted string
>>          put commaChar into char c of pData
>>       end if
>>    end repeat
>>    put 0 into lineCounter
>>    repeat for each line L in pData
>>       add 1 to lineCounter
>>       put 0 into itemCounter
>>       repeat for each item i in L
>>          add 1 to itemCounter
>>          put i into thisItem
>>          if howmany(quote,thisItem) mod 2 = 1 then
>>             return "This CSV data is not parsable (unclosed quotes in item)."
>>          end if
>>          replace crChar with cr in thisItem
>>          replace tabChar with tab in thisItem
>>          replace commaChar with comma in thisItem
>>          replace openQuoteChar with quote in thisItem
>>          replace closeQuoteChar with quote in thisItem
>>          put thisItem into A[lineCounter][itemCounter]
>>       end repeat
>>    end repeat
>>    return A
>> end CSVtoArray
>>
>> function getDelimiters pText, nbr
>>    -- returns a cr-delimited list of <nbr> characters
>>    --    not found in the variable pText
>>    -- use for delimiters for, eg, parsing text files, manipulating arrays, etc.
>>    -- usage: put getDelimiters(pText,2) into tDelims
>>    --        if tDelims begins with "Error" then exit to top -- or whatever
>>    --        put line 1 of tDelims into lineDivider
>>    --        put line 2 of tDelims into itemDivider
>>    --        etc.
>>    -- by Peter M. Brigham, pmb...@gmail.com — freeware
>>
>>    if pText = empty then return "Error: no text specified."
>>    if nbr = empty then put 1 into nbr -- default 1 delimiter
>>    put "2,3,4,5,6,7,8,16,17,18,19,20,21,22,23,24,25,26" into baseList
>>    -- low ASCII values, excluding CR, LF, tab, etc.
>>    put the number of items of baseList into maxNbr
>>    if nbr > maxNbr then return "Error: max" && maxNbr && "delimiters."
>>    repeat with tCount = 1 to nbr
>>       put true into failed
>>       repeat with i = 1 to the number of items of baseList
>>          put item i of baseList into testNbr
>>          put numtochar(testNbr) into testChar
>>          if testChar is not in pText then
>>             -- found one, store and get next delim
>>             put false into failed
>>             put testChar into line tCount of delimList
>>             exit repeat
>>          end if
>>       end repeat
>>       if failed then
>>          -- only tCount - 1 delimiters were found before this failure
>>          if tCount = 1 then
>>             return "Error: cannot get any delimiters."
>>          else if tCount = 2 then
>>             return "Error: can only get one delimiter."
>>          else
>>             return "Error: can only get" && (tCount - 1) && "delimiters."
>>          end if
>>       end if
>>       delete item i of baseList
>>    end repeat
>>    return delimList
>> end getDelimiters
>>
>> function howmany pStr, pContainer, pCaseSens
>>    -- how many times pStr occurs in pContainer
>>    -- note that howmany("00","000000") returns 3, not 5
>>    --    ie, overlapping matches are not counted
>>    -- by Peter M. Brigham, pmb...@gmail.com — freeware
>>
>>    if pCaseSens = empty then put false into pCaseSens
>>    set the casesensitive to pCaseSens
>>    if pStr is not in pContainer then return 0
>>    put len(pContainer) into origLength
>>    replace pStr with char 2 to -1 of pStr in pContainer
>>    return origLength - len(pContainer)
>> end howmany
>>
>>
>> On Oct 17, 2015, at 5:03 AM, Alex Tweedly wrote:
>>
>>> Naturally it must be removed.
>>>
>>> But I have a more philosophical issue / question.
>>>
>>>
>>> TSV (in and of itself) doesn't have any quotes, and so doesn't handle
>>> quoted CRs or TABs.
>>>
>>> Currently, the 'old' version - as in Richard's published article, doesn't
>>> handle TAB characters enclosed within a quoted cell. The 'new' version does
>>> - but only by returning the data delimited by <GS> instead of TAB, and
>>> leaving enclosed TABs alone - a mistake, IMHO.
>>>
>>> I believe that what the converter should do is :
>>> - return TSV - i.e. delimited by TABs
>>> - replace quoted CR by <VT> within quoted cells (as it does now)
>>> - replace quoted TABs by <GS> within quoted cells
>>>
>>> Any comments or suggestions ?
>>>
>>> Thanks
>>> Alex.
>>>
>>> On 17/10/2015 02:34, Mike Kerner wrote:
>>>> It's safe as long as you remember to remove it at the end of the function
>>>>
>>>> On Fri, Oct 16, 2015 at 7:12 PM, Alex Tweedly <a...@tweedly.net> wrote:
>>>>
>>>>> Duh - replying to myself again :-)
>>>>>
>>>>> It looks as though that's exactly what you do mean - it certainly
>>>>> generates the problems you described earlier.
>>>>> And my one-line additional test would (does in my testing) solve it
>>>>> properly - without it, we don't get a chance to flush
>>>>> "theInsideStringSoFar" to tNuData, with the extra line we do.
>>>>> And adding it is always safe (AFAICI).
>>>>>
>>>>> -- Alex.
>>>>>
>>>>>
>>>>> On 17/10/2015 00:03, Alex Tweedly wrote:
>>>>>
>>>>>> Sorry, Mike, but can you describe what you mean by a "naked" line ?
>>>>>> Is it simply one with no line delimiter after it ?
>>>>>> i.e. could only happen on the very last line of a file of input ?
>>>>>>
>>>>>> Could that be solved by a simple test (after the various 'replace'
>>>>>> statements)
>>>>>>     if the last char of pData <> CR then put CR after pData
>>>>>> before the parsing happens ?
>>>>>>
>>>>>> -- Alex.
>>>>>>
>>>>>>
>>>>>> On 16/10/2015 17:19, Mike Kerner wrote:
>>>>>>
>>>>>>> No, the problem isn't that LC uses LF and CR for ascii(10) and ignores
>>>>>>> ascii(13). That's just a personal problem.
>>>>>>>
>>>>>>> The problem, here, is that the csv parser handles a naked line and a
>>>>>>> terminated line differently. If the line is terminated, it parses it
>>>>>>> one way, and if it is not, it parses it (incorrectly) a different way,
>>>>>>> which makes me wonder if this is the latest version.
>>>>>>>
>>>>>>> On Fri, Oct 16, 2015 at 11:28 AM, Bob Sneidar
>>>>>>> <bobsnei...@iotecdigital.com> wrote:
>>>>>>>
>>>>>>>> But what if the cr or lf or crlf is inside quoted text, meaning it is
>>>>>>>> not a delimiter? Oh, I'm afraid the deflector shield will be quite
>>>>>>>> operational when your friends arrive.
>>>>>>>>
>>>>>>>> Bob S
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 16, 2015, at 08:04 , Alex Tweedly <a...@tweedly.net> wrote:
>>>>>>>>> Hi Mike,
>>>>>>>>>
>>>>>>>>> thanks for that additional info.
>>>>>>>>>
>>>>>>>>> I *think* (it's been 3 years) I left them as <GS> (i.e.
>>>>>>>>> numtochar(29)) because I had some data including normal TAB
>>>>>>>>> characters within the cells (!!) and thought <GS> was a safer bet -
>>>>>>>>> though of course nothing is completely safe. It's then up to the
>>>>>>>>> caller to decide whether to do "replace numtochar(29) with TAB in
>>>>>>>>> ...", or do TAB escaping, or whatever they want.
>>>>>>>>>
>>>>>>>>> As for the other bigger problem .... Oh dear = CR vs LF vs CRLF ....
>>>>>>>>>
>>>>>>>>> Are you on Mac or Windows or Linux ?
>>>>>>>>> How is the LF delimited data getting into your app ?
>>>>>>>>> Maybe we should just add a "replace numtochar(13) with CR in pData" ?
>>>>>>>>>
>>>>>>>>> (I confess to being confused by this - I know that LC does
>>>>>>>>> auto-translation of line delimiters at various places, but I'm not
>>>>>>>>> sure when it is, or isn't, completely safe. Maybe the easiest thing
>>>>>>>>> is to just do all the translations ....
>>>>>>>>>
>>>>>>>>>     replace CRLF with CR in pData
>>>>>>>>>     replace numtochar(10) with CR in pData
>>>>>>>>>     replace numtochar(13) with CR in pData
>>>>>>>>>
>>>>>>>>> -- Alex.
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> use-livecode mailing list
>>>>>>>> use-livecode@lists.runrev.com
>>>>>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>>>>>> subscription preferences:
>>>>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode