I have always felt that GNU APL is somewhat slow compared to Dyalog, but I have never really compared the two on a large dataset. I mostly use J for large datasets now. If Elias has optimized code for GNU APL and a reproducible way to measure the timing, I'd like to compare it with Dyalog and J.
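The quoted messages make two related points: APL is slow when it loops and does one thing at a time (Blake), and FOO¨ avoids the problem by filling a result of known size instead of growing it with Z⍪ (Juergen). The Python sketch below is a language-neutral illustration of one reproducible way to time that effect. It is not APL and says nothing about GNU APL, Dyalog or J speeds. It assumes that each Z←Z⍪… step copies the whole accumulated result, which it models with tuple concatenation. The names `decode`, `append_by_copy`, `map_preallocated`, `synthetic_lines` and `best_time` are hypothetical.

```python
import time
from typing import Callable, List, Sequence, Tuple


def decode(line: str) -> Tuple[str, ...]:
    """Stand-in for the per-line decoder FOO: split a line on commas."""
    return tuple(line.split(","))


def append_by_copy(lines: Sequence[str]) -> List[Tuple[str, ...]]:
    """Model of a Z←Z⍪(FOO line) loop (assumption: every append copies).

    Each step builds a new, larger tuple containing all earlier results,
    so the total copying work grows quadratically with the number of lines.
    """
    z: Tuple[Tuple[str, ...], ...] = ()
    for line in lines:
        z = z + (decode(line),)
    return list(z)


def map_preallocated(lines: Sequence[str]) -> List[Tuple[str, ...]]:
    """Model of Z←FOO¨lines: one result slot per line, allocated once."""
    z: List[Tuple[str, ...]] = [()] * len(lines)
    for i, line in enumerate(lines):
        z[i] = decode(line)
    return z


def synthetic_lines(n: int) -> List[str]:
    """Deterministic input, so every run sees exactly the same data."""
    return [f"{i},item{i % 97},{i * 3}" for i in range(n)]


def best_time(fn: Callable[[Sequence[str]], object],
              lines: Sequence[str], repeats: int = 3) -> float:
    """Best wall-clock time over several runs, to damp noise from other processes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(lines)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    for n in (2_000, 4_000, 8_000):
        lines = synthetic_lines(n)
        # Both strategies must produce the same result before timing means anything.
        assert append_by_copy(lines) == map_preallocated(lines)
        print(f"n={n:>5}  copy-append {best_time(append_by_copy, lines):.4f}s"
              f"  preallocated {best_time(map_preallocated, lines):.4f}s")
```

The same recipe (identical input, best-of-several timing, and a check that both variants agree) carries over to timing optimized GNU APL code against Dyalog and J.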
> On Jan 17, 2017, at 6:48 PM, Blake McBride <blake1...@gmail.com> wrote:
>
> Rather than jump to adding new quad functions, I'm wondering what the
> timing of reading that CSV file is when you optimize the APL code along
> the lines of the few suggestions made by Juergen.
>
> Specifically, we all know APL is a dog when it comes to looping and doing
> one thing at a time. Reading the whole thing in as a matrix and processing
> it as a unit is more APL-ish and would probably have beaten the bad version
> of the Lisp code. (Of course, reading the whole thing in and processing it
> as a unit could end up taking 1GB of RAM with the intermediary stuff.)
>
> On the other hand, reading CSV and fixed-length record files is pretty
> common and useful.
>
> Thanks.
>
> Blake
>
>
> On Tue, Jan 17, 2017 at 5:01 PM, Juergen Sauermann
> <juergen.sauerm...@t-online.de> wrote:
> Hi Elias,
>
> I believe that in principle what we want is something like this:
>
> Z←FOO¨Z←⎕FIO[N] 'filename'
>
> where ⎕FIO[N] reads 'filename' line by line, putting each line j into the
> nested item Z[j], and FOO is a decoding function that translates a line
> into whatever Z[j] shall become in the end.
>
> The current performance problem is then solved by the ¨ operator, which
> allocates a big enough Z beforehand and fills it with the result of FOO for
> each line.
>
> I can try to make ⎕FIO an operator so that you can use
>
> Z←FOO ⎕FIO[N] 'filename'
>
> for the above, and I hope that will be syntactically possible. But it looks
> almost like +/[N]B, with FOO instead of + and ⎕FIO instead of /, which I
> believe should work somehow. It can become a little tricky, though, because
> ⎕FIO would then have the same ambiguities as / (function versus operator).
>
> /// Jürgen
>
>
> On 01/17/2017 09:37 PM, Elias Mårtenson wrote:
>> On 18 January 2017 at 04:10, Juergen Sauermann
>> <juergen.sauerm...@t-online.de> wrote:
>>
>> What I do not like about ⎕CSV (actually I am only guessing here because I
>> don't know what it really does, but I assume it is specifically for
>> comma-separated lists) is that it supposedly only works for
>> comma-separated lists. If we have something more general which solves the
>> performance problem of Z⍪ without being limited to specific formats like
>> CSV, then I would prefer that.
>>
>> You make a good point. My envisioned function (whether an external
>> function or a built-in one, called ⎕CSV or otherwise) would accept a left
>> argument: a format definition telling the function how to parse the CSV
>> data.
>>
>> You are absolutely correct that there are many ways to express CSV data,
>> and looking at the flags available in R gives some insight into this. My
>> intention is to build something that can handle at least the most
>> important of these variations. I have not yet decided what the left-hand
>> format definition will look like, except for one thing: I want to be able
>> to specify a function that will be called to parse each line. This way it
>> will be possible to handle any format that is not natively supported.
>>
>> Regards,
>> Elias
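A minimal sketch of how Elias's idea (a format definition plus an optional user-supplied line parser) could combine with Juergen's per-line decoding into a result of known size. It is written in Python purely as a language-neutral illustration; `read_delimited` and its parameters `separator`, `quote`, `skip_header` and `line_parser` are hypothetical and are not part of GNU APL, ⎕FIO, or any proposed ⎕CSV.

```python
import csv
from typing import Any, Callable, Iterable, List, Optional


def read_delimited(lines: Iterable[str],
                   separator: str = ",",
                   quote: str = '"',
                   skip_header: bool = False,
                   line_parser: Optional[Callable[[str], Any]] = None) -> List[Any]:
    """Hypothetical reader: decode each line and return one result per line.

    separator, quote and skip_header play the role of the "format definition";
    line_parser, when given, replaces the built-in parsing entirely, so formats
    the reader does not support natively (e.g. fixed-length records) still work.
    """
    rows = list(lines)  # read all lines first, as ⎕FIO[N] would in the proposal
    if skip_header:
        rows = rows[1:]

    def default_parser(line: str) -> List[str]:
        # Parse a single line with the standard csv module. Because parsing is
        # per line, quoted fields that contain line breaks are not supported.
        return next(csv.reader([line], delimiter=separator, quotechar=quote))

    parse = line_parser if line_parser is not None else default_parser

    # One result slot per line, filled in order: the analogue of FOO¨ filling Z.
    result: List[Any] = [None] * len(rows)
    for i, line in enumerate(rows):
        result[i] = parse(line.rstrip("\r\n"))
    return result


if __name__ == "__main__":
    print(read_delimited(['name,qty', '"Smith, J",3', 'Doe,5'], skip_header=True))
    # [['Smith, J', '3'], ['Doe', '5']]
    print(read_delimited(['a;b', 'c;d'], separator=';'))
    # [['a', 'b'], ['c', 'd']]
    fixed = ['AB  12', 'CD  34']  # fixed-length records: 4-char name, then a number
    print(read_delimited(fixed, line_parser=lambda s: (s[:4].strip(), int(s[4:]))))
    # [('AB', 12), ('CD', 34)]
```

Since the function only iterates over its input, an open file (for example `open(path, newline="")`) can be passed directly. The escape-hatch parser is what keeps the design from being CSV-only, which was Juergen's objection.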