Re: CSV again.

2015-10-29 Thread Alex Tweedly
I did. And I get test" as expected. I'm obviously missing something here - but let's go off-list until we figure it out. Here's my test script: on mouseUp local tmp, t1 put quote & "test" & CR & quote & quote & quote &CR into tmp put csvToTab3(tmp) into t1 put t1 & CR after

Re: CSV again.

2015-10-29 Thread Mike Kerner
Try using exactly the string I sent: "test""" I get test", when I think what you intend is test" On Thu, Oct 29, 2015 at 7:25 PM, Alex Tweedly wrote: > > On 29/10/2015 14:41, Mike Kerner wrote: > >> Belay that. Let's do this on the list. >> >> Sure ... > >> On Thu, Oct 29, 2015 at 10:22 AM, Mi

Re: CSV again.

2015-10-29 Thread Alex Tweedly
On 29/10/2015 14:41, Mike Kerner wrote: Belay that. Let's do this on the list. Sure ... On Thu, Oct 29, 2015 at 10:22 AM, Mike Kerner wrote: 1) In v3, why did you remove the substitution? That just bit me. Short answer: A bug. Long answer: 2 bugs, bu

Re: CSV again.

2015-10-29 Thread Mike Kerner
So beyond the embedded , I found another issue. Let's say the string is "test""" The is not handled. Should you perhaps do your substitutions on the "inside", instead of on the "passedQuote"? -- On the first day, God created the heavens and the Earth On the second day, God created the oceans

Re: CSV again.

2015-10-29 Thread Mike Kerner
Alex, So which version are you proposing as being current? Is there some reason why you removed handling embedded in 3? On Tue, Oct 20, 2015 at 12:36 AM, Kay C Lan wrote: > This topic reminds me of time. If you think CSV is a standard that has no > standard, making it difficult to program arou

Re: CSV again.

2015-10-19 Thread Kay C Lan
This topic reminds me of time. If you think CSV is a standard that has no standard, making it difficult to program around, then don't even bother attempting to work with time. Here's a good summary - make sure you watch to the very end where he discusses the Google approach to one of the very many

Re: CSV again.

2015-10-19 Thread Alex Tweedly
On 19/10/2015 02:52, Mike Kerner wrote: Well, there goes that idea. There are tutorials right on Git, but it might be easier if you (and anyone else so not-inclined to Git) post here and those of us who are at least inclined to try will make do with doing that work for you. OK, OK, I know I n

Re: CSV again.

2015-10-18 Thread Mike Kerner
Well, there goes that idea. There are tutorials right on Git, but it might be easier if you (and anyone else so not-inclined to Git) post here and those of us who are at least inclined to try will make do with doing that work for you. Anyway, here's what I have as the latest version, with a coupl

Re: CSV again.

2015-10-18 Thread Alex Tweedly
On 18/10/2015 13:57, Mike Kerner wrote: https://github.com/macMikey/LiveCode-Libraries/tree/master/csv I've found some corner cases and made some others. OK, I confess: I've never used git or github, and I have no idea how to get access to these. :-) I know I need to learn, but honestl

Re: CSV again.

2015-10-18 Thread Alex Tweedly
On 18/10/2015 03:17, Peter M. Brigham wrote: At this point, finding a function that does the task at all -- reliably and taking into account most of the csv malformations we can anticipate -- would be a start. So far nothing has been unbreakable. Once we find an algorithm that does the job,

Re: CSV again.

2015-10-18 Thread Mike Kerner
Consider them added. They're called "Richard-1.csv" and "Richard-2.csv" On Sun, Oct 18, 2015 at 6:46 PM, Richard Gaskin wrote: > Mike Kerner wrote: > >> I don't have a corner case file, yet, but I'm going to start adding one to >> Git in a minute... >> >> On Sun, Oct 18, 2015 at 2:26 AM, Kay C

Re: CSV again.

2015-10-18 Thread Richard Gaskin
Mike Kerner wrote: I don't have a corner case file, yet, but I'm going to start adding one to Git in a minute... On Sun, Oct 18, 2015 at 2:26 AM, Kay C Lan wrote: On Sun, Oct 18, 2015 at 10:17 AM, Peter M. Brigham wrote: > At this point, finding a function that does the task at all -- relia

Re: CSV again.

2015-10-18 Thread Mike Kerner
https://github.com/macMikey/LiveCode-Libraries/tree/master/csv I've found some corner cases and made some others. On Sun, Oct 18, 2015 at 8:01 AM, Mike Kerner wrote: > I don't have a corner case file, yet, but I'm going to start adding one to > Git in a minute... > > On Sun, Oct 18, 2015 at 2:2

Re: CSV again.

2015-10-18 Thread Mike Kerner
I don't have a corner case file, yet, but I'm going to start adding one to Git in a minute... On Sun, Oct 18, 2015 at 2:26 AM, Kay C Lan wrote: > On Sun, Oct 18, 2015 at 10:17 AM, Peter M. Brigham > wrote: > > > At this point, finding a function that does the task at all -- reliably > > and tak

Re: CSV again.

2015-10-17 Thread Kay C Lan
On Sun, Oct 18, 2015 at 10:17 AM, Peter M. Brigham wrote: > At this point, finding a function that does the task at all -- reliably > and taking into account most of the csv malformations we can anticipate -- > would be a start. Actually, having a standard mutant csv file to work on would be a

Re: CSV again.

2015-10-17 Thread Mike Kerner
Peter, You're absolutely right, of course. While we're at it, it would be interesting to see what we come up with if we write it for LCB's modules... On Sat, Oct 17, 2015 at 10:17 PM, Peter M. Brigham wrote: > At this point, finding a function that does the task at all -- reliably > and taking

Re: CSV again.

2015-10-17 Thread Peter M. Brigham
At this point, finding a function that does the task at all -- reliably and taking into account most of the csv malformations we can anticipate -- would be a start. So far nothing has been unbreakable. Once we find an algorithm that does the job, we can focus on speeding it up. That said, I don

Re: CSV again.

2015-10-17 Thread Mike Kerner
The other thing that we are going to be interested in is finding the fastest function that performs the task. On Sat, Oct 17, 2015 at 10:04 PM, Mike Kerner wrote: > I think that item is odd. Quotes are, if memory serves, only supposed to > appear if they are double-quoted. Between "f" and "g"

Re: CSV again.

2015-10-17 Thread Mike Kerner
I think that item is odd. Quotes are, if memory serves, only supposed to appear if they are double-quoted. Between "f" and "g" you have three quotes, and between "g" and "h" you only have one. I believe that is not a correct csv format. On Sat, Oct 17, 2015 at 9:24 PM, Peter M. Brigham wrote:
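A parity count alone would not flag the item under discussion, since "def"""g"h" contains six quotes in total. Below is a minimal LiveCode sketch of a stricter check, assuming the rule Mike describes (inside a quoted field, quotes may only appear doubled). The handler name isWellFormedQuotedField is hypothetical and is not part of any parser posted in this thread.

```livecode
-- Hypothetical helper (an assumption, not code from the thread).
-- pField is a single field including its enclosing quotes.
function isWellFormedQuotedField pField
   local tInner
   -- a quoted field needs at least an opening and a closing quote
   if the number of chars of pField < 2 then return false
   if (char 1 of pField is not quote) or (the last char of pField is not quote) then
      return false
   end if
   -- drop the enclosing quotes
   put char 2 to -2 of pField into tInner
   -- every legitimate embedded quote is doubled, so removing the
   -- doubled pairs must leave no quote behind
   replace (quote & quote) with empty in tInner
   return not (tInner contains quote)
end isWellFormedQuotedField
```

Under this rule the string "test""" passes (its content is test followed by one quote), while "def"""g"h" fails, because a lone quote is left over once the doubled pairs are removed.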

Re: CSV again.

2015-10-17 Thread Peter M. Brigham
On Oct 17, 2015, at 8:47 PM, Alex Tweedly wrote: > Also, I think (i.e. I haven't yet run the code, since I don't have offsets() > available) there is another mis-formed case you don't properly detect : > a,b,c,"def"""g"h",i,j,k if I put this as one of the lines of my CSV data, it gets sorted int

Re: CSV again.

2015-10-17 Thread Peter M. Brigham
Thanks for catching that. Change the if-then structure to: if howmany(openQuoteChar,thisItem) <> howmany(closeQuoteChar,thisItem) then return "This CSV data is not parsable (unclosed quotes in item)." end if Revised function: function CSVtoArray pData -- by Peter M. Brigham, pmb...@gmail.

Re: CSV again.

2015-10-17 Thread Alex Tweedly
Ummm surely at this point repeat for each item i in L add 1 to itemCounter put i into thisItem if howmany(quote,thisItem) mod 2 = 1 then return "This CSV data is not parsable (unclosed quotes in item)." end if ... howmany(quote,th

Re: CSV again.

2015-10-17 Thread Peter M. Brigham
My mistake, failed to include the offsets() handler: -- Peter Peter M. Brigham pmb...@gmail.com http://home.comcast.net/~pmbrig --- function offsets str, pContainer -- returns a comma-delimited list of all the offsets of str in pContainer -- returns 0 if not found -- note: offs

Re: CSV again.

2015-10-17 Thread Alex Tweedly
Hi Peter, it also requires offsets() - I can guess what it does, but it would be safer to get the actual code you use :-) Thanks -- Alex. On 18/10/2015 00:41, Peter M. Brigham wrote: So here's my attempt. It converts a CSV text to an array. Let's see if there's csv data that can break it.

Re: CSV again.

2015-10-17 Thread Peter M. Brigham
So here's my attempt. It converts a CSV text to an array. Let's see if there's csv data that can break it. -- Peter Peter M. Brigham pmb...@gmail.com http://home.comcast.net/~pmbrig --- function CSVtoArray pData -- by Peter M. Brigham, pmb...@gmail.com -- requires getDelimiters(), ho

Re: CSV again.

2015-10-17 Thread Mike Kerner
I added it to my repository on GitHub if anyone wants to try to do this in Git. On Sat, Oct 17, 2015 at 10:53 AM, Mike Kerner wrote: > I am going to put 4 on Git and have at it. > > 1) There are other assumptions being made, like assuming that the and > don't appear in the incoming text. Inst

Re: CSV again.

2015-10-17 Thread Mike Kerner
I am going to put 4 on Git and have at it. 1) There are other assumptions being made, like assuming that the and don't appear in the incoming text. Instead of hardcoding the interim substitutions, determine what the interim substitutions are going to be (can also allow the user to specify them)

Re: CSV again.

2015-10-17 Thread Alex Tweedly
Naturally it must be removed. But I have a more philosophical issue / question. TSV (in and of itself) doesn't have any quotes, and so doesn't handle quoted CRs or TABs. Currently, the 'old' version - as in Richard's published article, doesn't handle TAB characters enclosed within a quoted

Re: CSV again.

2015-10-16 Thread Mike Kerner
It's safe as long as you remember to remove it at the end of the function On Fri, Oct 16, 2015 at 7:12 PM, Alex Tweedly wrote: > Duh - replying to myself again :-) > > It looks as though that's exactly what you do mean - it certainly > generates the problems you described earlier. And my one-lin

Re: CSV again.

2015-10-16 Thread Alex Tweedly
Duh - replying to myself again :-) It looks as though that's exactly what you do mean - it certainly generates the problems you described earlier. And my one-line additional test would (does in my testing) solve it properly - without it, we don't get a chance to flush "theInsideStringSoFar" to

Re: CSV again.

2015-10-16 Thread Alex Tweedly
Sorry, Mike, but can you describe what you mean by a "naked" line ? Is it simply one with no line delimiter after it ? i.e. could only happen on the very last line of a file of input ? Could that be solved by a simple test (after the various 'replace' statements) if the last char of pData <
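The message is cut off before the test itself, so the following is only a minimal sketch of the kind of check being suggested, assuming the intent is to append a line terminator when the data does not already end with one. The handler name ensureTerminated is hypothetical.

```livecode
-- Hypothetical helper (an assumption, not code from the thread).
-- Makes sure the data ends with CR so the final line is handled
-- the same way as every other line.
function ensureTerminated pData
   if the last char of pData is not CR then
      put CR after pData
   end if
   return pData
end ensureTerminated
```

A parser using this would then have to strip the added CR (or the empty trailing line it produces) before returning, which is the caveat raised in the replies.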

Re: CSV again.

2015-10-16 Thread Alex Tweedly
It's likely (but of course not 100% guaranteed) that those characters have themselves been manipulated in a consistent way by either LC or any other subsystem - i.e. auto-translated or not. Anyone who chooses to use those as genuinely different characters within quoted cells *deserves* to have

Re: CSV again.

2015-10-16 Thread Bob Sneidar
The force is strong with this one. Bob S On Oct 16, 2015, at 09:19, Mike Kerner wrote: No, the problem isn't that LC uses LF and CR for ascii(10) and ignores ascii(13). That's just a personal problem. The problem, here, is that the csv parser handles a nake

Re: CSV again.

2015-10-16 Thread Mike Kerner
No, the problem isn't that LC uses LF and CR for ascii(10) and ignores ascii(13). That's just a personal problem. The problem, here, is that the csv parser handles a naked line and a terminated line differently. If the line is terminated, it parses it one way, and if it is not, it parses it (inco

Re: CSV again.

2015-10-16 Thread Bob Sneidar
But what if the cr or lf or crlf is inside quoted text, meaning it is not a delimiter? Oh, I'm afraid the deflector shield will be quite operational when your friends arrive. Bob S > On Oct 16, 2015, at 08:04 , Alex Tweedly wrote: > > Hi Mike, > > thanks for that additional info. > > I *th

Re: CSV again.

2015-10-16 Thread Alex Tweedly
Hi Mike, thanks for that additional info. I *think* (it's been 3 years) I left them as (i.e. numtochar(29)) because I had some data including normal TAB characters within the cells (!!) and thought was a safer bet - though of course nothing is completely safe. It's then up to the caller to

Re: CSV again.

2015-10-16 Thread Bob Sneidar
Someone wrote a piece years ago about why no one who wanted to maintain his sanity should attempt to write an XML to CSV parser. In the process of writing the piece, his mind degenerated until he was blathering on about non-sensical things. The devil had finished his work on the poor soul. I do

Re: CSV again.

2015-10-16 Thread Mike Kerner
Richard, Yes, I understand it was a Pascal Pun, and then in 2012, when this thread originally happened, it became something more, sort of a version pun on a pascal pun, if you will. Rather than posting fixes to the one on your blog, let's go through the "state of the art" and work on that, instead

Re: CSV again.

2015-10-15 Thread Alex Tweedly
H ... my quick test of what was csv4Tab, but is now called csvToTab1 - see below - gives me (showing results with a colon ':' for the cell delimiter, i.e. replacing numtochar(29) from the code in the previous use-list code a,b,c ---> a:b:c "a","","c" ---> a::c Now to me, that's what it

Re: CSV again.

2015-10-15 Thread Richard Gaskin
Mike Kerner wrote: For everyone trying to get back up to speed on CSV, here's the closest thing to a "Standard", RFC 4180: https://tools.ietf.org/html/rfc4180 Unfortunately the "format" was around for so long before that RFC, and so many big companies have ignored the RFC since, that it doesn'

Re: CSV again.

2015-10-15 Thread Richard Gaskin
Mike Kerner wrote: csv4 does not handle it, and it comes up with a different result from csv2 (which is also wrong). I sent Richard proposed changes to csv2 which addresses that issue, but I'll wait while we collectively try to remember what the latest and greatest csv parser algorithm is befor

Re: CSV again.

2015-10-15 Thread Mike Kerner
For everyone trying to get back up to speed on CSV, here's the closest thing to a "Standard", RFC 4180: https://tools.ietf.org/html/rfc4180 On Thu, Oct 15, 2015 at 8:34 PM, Peter Haworth wrote: > Right I remember that although not what the exact problem was. In any > case, csv4Tab has been work

Re: CSV again.

2015-10-15 Thread Mike Kerner
csv4 does not handle it, and it comes up with a different result from csv2 (which is also wrong). I sent Richard proposed changes to csv2 which addresses that issue, but I'll wait while we collectively try to remember what the latest and greatest csv parser algorithm is before I try to come up wit

Re: CSV again.

2015-10-15 Thread Peter Haworth
Right I remember that although not what the exact problem was. In any case, csv4Tab has been working fine in my SQLiteAdmin program for at least a couple of years now, but I have no idea what flavor of csv files have been imported. Pete lcSQL Software Home of lcStackBrowser

Re: CSV again.

2015-10-15 Thread Alex Tweedly
Richard et al., sometime after that article, there was a further thread on the use-list. Pete Haworth found a case not properly covered by the version on the article, and I came up with a revised version (cutely called csv4Tab !! - csv3Tab was an interim, deeply buggy attempt) (It's in http

Re: CSV again.

2015-10-15 Thread Tim Selander
So, tell us what you really think about .CSV, Richard! :-) Tim Selander Tokyo, Japan On 15/10/16 8:34, Richard Gaskin wrote: stupidly complex really stupid stupid format really dumb idea

Re: CSV again.

2015-10-15 Thread Richard Gaskin
Mike Kerner wrote: > Alex, Richard, etc. > > What do we consider the latest version of the csv parser? I think I > found a bug in Richard's CSV2Text code, and proposed changes, but he > wanted the discussion to go down over here, first. Then I noticed > that csv4Text is out over here, which make

Re: CSV again.

2015-10-15 Thread Mike Kerner
Alex, Richard, etc. What do we consider the latest version of the csv parser? I think I found a bug in Richard's CSV2Text code, and proposed changes, but he wanted the discussion to go down over here, first. Then I noticed that csv4Text is out over here, which makes 2, I guess, a bit long in the

Re: CSV again.

2012-05-16 Thread Alex Tweedly
On 16/05/2012 00:35, Peter Haworth wrote: Thanks Alex. I ran the same data through your new handler and it seems to have worked fine. There was a recent discussion on some of these corner case issues on the sqlite list so I'll go grab their test cases and see what happens. As far as performance

Re: CSV again.

2012-05-15 Thread Peter Haworth
Thanks Alex. I ran the same data through your new handler and it seems to have worked fine. There was a recent discussion on some of these corner case issues on the sqlite list so I'll go grab their test cases and see what happens. As far as performance, the new handler took approx 2 1/2 times lo

Re: CSV again.

2012-05-15 Thread Bob Sneidar
hmmm... How are the hotels? Bob On May 15, 2012, at 3:54 PM, Alex Tweedly wrote: > On 15/05/2012 18:26, Bob Sneidar wrote: >> Another good developer lost to the csv parsing chasm of hell. We >> won't be hearing from Alex again. ;-) >> > Don't worry Bob, I'm just a tourist here in the chasm,

Re: CSV again.

2012-05-15 Thread Alex Tweedly
On 15/05/2012 18:26, Bob Sneidar wrote: Another good developer lost to the csv parsing chasm of hell. We won't be hearing from Alex again. ;-) Don't worry Bob, I'm just a tourist here in the chasm, I'm not moving in :-) Pete - please try this out on your data. AFAICT it should handle all th

Re: CSV again.

2012-05-15 Thread Peter Haworth
Thanks for everyone's kind thoughts in this time of turmoil. I wish I had a choice but I don't so I'll just keep on bearing the csv cross of shame. Pete lcSQL Software On Tue, May 15, 2012 at 12:41 PM, Mark Wieder wrote: > Bob- > > Tuesday, May 15, 2012, 10:26:41 AM, yo

Re: CSV again.

2012-05-15 Thread Mark Wieder
Bob- Tuesday, May 15, 2012, 10:26:41 AM, you wrote: > Another good developer lost to the csv parsing chasm of > hell. We won't be hearing from Alex again. ;-) Alas, I fear Pete is following down that lonesome road. It's too bad, they were such nice members of the community - I'll quite miss the

Re: CSV again.

2012-05-15 Thread Bob Sneidar
That is a perfect case example of why CSV parsers can never be perfect. Unescaped delimiters in field contents should never have been allowed when they came up with the "standard" for CSV files. Bob On May 15, 2012, at 10:51 AM, Peter Haworth wrote: > I'll probably have to implement some sor

Re: CSV again.

2012-05-15 Thread Peter Haworth
Thanks Alex, all good points. I'm still trying to figure out why the program that created the csv file used this problematic string since it only happened for one cell - all other empty cells simply had two consecutive commas. Nevertheless, the other cases you cited are definitely valid so I guess

Re: CSV again.

2012-05-15 Thread Bob Sneidar
Another good developer lost to the csv parsing chasm of hell. We won't be hearing from Alex again. ;-) Bob On May 15, 2012, at 10:02 AM, Alex Tweedly wrote: > Unfortunately, that's not enough to fix it, Peter. > > The problem case you have identified is where the CSV exporter has decided to

Re: CSV again.

2012-05-15 Thread Alex Tweedly
Unfortunately, that's not enough to fix it, Peter. The problem case you have identified is where the CSV exporter has decided to quote even empty cells. This wasn't covered in the original samples, or in any cases I've had to deal with. Your workaround uses the sequence to attempt to identi

Re: CSV again.

2012-05-14 Thread Peter Haworth
Hi Alex, Just to clarify, this was two double quotes with a comma right before and right after them, not an escaped double quote in the middle of a string. I've made a fix to this which works, subject to your approval. I changed the line: replace quote&quote with tEscapedQuotePlaceholder in pData

Re: CSV again.

2012-05-14 Thread Alex Tweedly
Yeah, the "trailing empty item" problem has been much discussed, and there are good reasons for keeping it as it is (even apart from the need to not break existing code). In similar circumstances, I've done replace (comma & CR) with (comma & space & CR) in tVariable but in your case, even

Re: CSV again.

2012-05-14 Thread Bob Sneidar
This has been discussed before. The last delimiter is not considered when parsing lines, items and words. The contents of the "object" are the only thing that is considered, so if nothing comes after the last delimiter, LC says, "Nothing to see here. Moving along..." That being said, "item1,,it
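The behaviour described here can be seen with a small LiveCode sketch. The handler name countCells is hypothetical; it only wraps the built-in item count with a comma delimiter.

```livecode
-- Hypothetical helper (an assumption, for illustration only).
function countCells pRow
   set the itemDelimiter to comma
   return the number of items of pRow
end countCells

-- countCells("a,,c") gives 3: the empty middle item is counted
-- countCells("a,b,") gives 2: nothing follows the last comma,
--                             so no third item is reported
```

This is why a CSV row whose last cell is empty loses that cell when it is split with item chunks unless the parser compensates for it.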

Re: CSV again.

2012-05-14 Thread Peter Haworth
I've just been checking out Alex's new csv parser and it is indeed much faster than the original, closer to 50% than 40% in my test case. However, I've also run into a Livecode issue while doing all this. This has come up before in the context of what LC thinks is a line, there's a similar issue/

Re: CSV again.

2012-05-07 Thread Peter Haworth
Thanks for this Alex! For list members, I am indebted to Alex for his original csv parsing code which I used, with his permission, in my SQLiteAdmin application. I will check out this code and see how it compares to the code currently embedded in SQLiteAdmin. Pete lcSQL Software

CSV again.

2012-05-07 Thread Alex Tweedly
Some years ago, this list discussed the difficulties of parsing comma-separated-value file format; Richard Gaskin has a great article about it at http://www.fourthworld.com/embassy/articles/csv-must-die.html Following that discussion, I came up with some code to parse CSV in Livecode which was