Re: [Bug-apl] Performance problems when constructing large(ish) arrays

Xiao-Yong Jin Tue, 17 Jan 2017 17:40:30 -0800

I have always felt that GNU APL is somewhat slow compared to Dyalog, but I have 
never really compared the two on a large dataset.
I mostly use J now for large datasets.
If Elias has the optimized code for GNU APL and a reproducible way to measure 
timing, I'd like to compare it with Dyalog and J.


> On Jan 17, 2017, at 6:48 PM, Blake McBride <blake1...@gmail.com> wrote:
> 
> Rather than jump to adding new quad functions, I'm wondering what the timing 
> of reading that CSV file is when you optimize the APL code like the few 
> suggestions made by Juergen.
> 
> Specifically, we all know APL is a dog when it comes to looping and doing one 
> thing at a time.  Reading the whole thing in as a matrix and processing it as 
> a unit is more APL-ish and would probably have beaten the bad version of the 
> Lisp code.  (Of course reading the whole thing in and processing it as a unit 
> could end up taking 1GB of RAM with the intermediary stuff.)
> 
> On the other hand, reading CSV and fixed length record files is pretty common 
> and useful.
> 
> Thanks.
> 
> Blake
> 
> 
> On Tue, Jan 17, 2017 at 5:01 PM, Juergen Sauermann 
> <juergen.sauerm...@t-online.de> wrote:
> Hi Elias,
> 
> I believe in principle what we want is something like this:
> 
> Z←FOO¨Z←⎕FIO[N] 'filename'
> 
> where ⎕FIO[N] reads 'filename' line by line putting each line j into the 
> nested item Z[j]
> and FOO is a decoding function that translates a line into whatever Z[j] 
> shall become in the end.
> 
> The current performance problem is then solved by the ¨ operator which 
> allocates a big enough Z beforehand
> and fills it with the result of FOO for each line.
> 
> I can try to make ⎕FIO an operator so that you can use
> 
> Z←FOO ⎕FIO[N] 'filename'
> 
> for the above and I hope that will be syntactically possible. But it looks 
> almost like +/[N]B with FOO
> instead of + and ⎕FIO instead of / which I believe should work somehow. It can 
> become a little tricky though,
> because ⎕FIO then has the same ambiguities as / (function 
> versus operator).
> 
> /// Jürgen
> 
> 
> 
> On 01/17/2017 09:37 PM, Elias Mårtenson wrote:
>> On 18 January 2017 at 04:10, Juergen Sauermann 
>> <juergen.sauerm...@t-online.de> wrote:
>>  
>> What I do not like about ⎕CSV (actually I am only guessing here because I 
>> don't know what it really does,
>> but I assume it is specifically for comma separated lists) is that it 
>> supposedly only works for comma
>> separated lists. If we have something more general which solves the 
>> performance problem of
>> Z⍪ without only working for specific formats like CSV then I would prefer 
>> that.
>> 
>> You make a good point. My envisioned function (whether an external 
>> function or a built-in one, called ⎕CSV or otherwise) would accept a 
>> left-hand argument: a format definition telling the function how to 
>> parse the CSV data.
>> 
>> You are absolutely correct in that there are many ways to express CSV data, 
>> and looking at the flags available in R gives some insight into this. My 
>> intention is to build something that can at least handle the most important 
>> of these variations. What the left-hand format definition will look like, I 
>> have not yet decided, except for one thing: I want to be able to specify a 
>> function that will be called that can be responsible for parsing a line. 
>> This way it'll be possible to handle any format that is not natively 
>> supported.
>> 
>> Regards,
>> Elias
> 
>
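
For readers who do not use APL, the approach Jürgen describes above (read all lines first, then let the each operator fill a result that was allocated at full size, instead of growing it with Z⍪ one row at a time) can be sketched in Python. This is only an illustration under assumptions: read_records and split_commas are hypothetical names, not part of GNU APL or ⎕FIO, and the comma split ignores quoting and escaping.

```python
import io


def read_records(stream, decode):
    """Read every line of `stream`, then decode each line into a preallocated list."""
    lines = stream.read().splitlines()   # whole input at once, one item per line
    result = [None] * len(lines)         # allocate the full result before filling it
    for j, line in enumerate(lines):
        result[j] = decode(line)         # item j becomes whatever decode returns
    return result


def split_commas(line):
    # Hypothetical decoder (the role FOO plays in the thread): a naive comma
    # split with no support for quoted fields or escaped separators.
    return line.split(",")


sample = io.StringIO("a,b,c\n1,2,3\n")
records = read_records(sample, split_commas)
print(records)
```

Passing the decoder in as an argument is also roughly what Elias asks for: any format the reader does not handle natively can be covered by supplying a different line-parsing function.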
