Re: [Bug-apl] Performance problems when constructing large(ish) arrays

Elias Mårtenson Wed, 18 Jan 2017 09:12:28 -0800

Thanks, I changed my function to use the new ⎕FIO[49], and the result is
much more compact:


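∇Z ← pattern read_csv_n filename
  ⍝ ⎕FIO[49] returns one nested item per line; each line is split on
  ⍝ runs of spaces and every field is converted according to pattern
  Z← {pattern convert_entry¨ (⍵≠' ') ⊂ ⍵}¨ ⎕FIO[49] filename
∇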

It's also a bit faster: this version runs in 11 seconds.

However, the result is not quite right: this version creates a
one-dimensional array in which each element is a vector holding the
values of one row.

Is there some way I can use EACH to map over the elements and generate a
two-dimensional array?

Regards,
Elias
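Elias's question is about the step from a vector of row vectors to a rank-2 array. In APL2-family languages that is the job of a disclose/mix primitive; this Python sketch (hypothetical helpers `to_matrix` and `at`, not part of GNU APL) only illustrates the idea: stack the rows into one row-major block with shape (rows, columns). Padding shorter rows with a fill value is an assumption of the sketch.

```python
from typing import Any, List, Sequence, Tuple

Shape = Tuple[int, int]

def to_matrix(rows: Sequence[Sequence[Any]], fill: Any = 0) -> Tuple[Shape, List[Any]]:
    """Stack per-row vectors into one rectangular, row-major array.

    Returns (shape, data); data is flat in row-major order.
    Rows shorter than the longest one are padded with `fill`.
    """
    n = len(rows)
    m = max((len(r) for r in rows), default=0)
    data: List[Any] = []
    for r in rows:
        data.extend(r)
        data.extend([fill] * (m - len(r)))  # pad short rows to width m
    return (n, m), data

def at(shape: Shape, data: List[Any], i: int, j: int) -> Any:
    """Return element (i, j) of the row-major array."""
    n, m = shape
    if not (0 <= i < n and 0 <= j < m):
        raise IndexError((i, j))
    return data[i * m + j]

# Two parsed rows, as produced by the EACH-based one-liner
nested = [[1.5, 2, "abc"], [3.25, 4, "xyz"]]
shape, flat = to_matrix(nested)
print(shape)                  # (2, 3)
print(at(shape, flat, 1, 2))  # xyz
```

Keeping one flat block plus a shape is roughly how an interpreter stores a simple matrix, which is why a single stacking step after the per-line EACH is usually cheaper than assigning row by row.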

On 18 January 2017 at 21:46, Juergen Sauermann <
juergen.sauerm...@t-online.de> wrote:

> Hi,
>
> as a start I have added *⎕FIO[49]* in *SVN 851*. It reads an entire
> UTF-8 encoded file and puts every line of the file into one nested
> item of the result. Trailing CR and LF are removed in the process.
>
> The next step is to turn *⎕FIO[49]* into an operator so that you can give
> it an APL function that converts every line into the desired result.
> Until then you can use it like this:
>
>     Z←CONVERT¨Z←⎕FIO[49] 'filename'
>
> /// Jürgen
>
>
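Going by Jürgen's description (read a whole UTF-8 file, one item per line, trailing CR/LF removed, then apply a converter to each line), a rough Python equivalent might look like this. The function names are hypothetical, and the exact edge-case behaviour of ⎕FIO[49] (for example on a file without a final newline) is assumed here, not checked.

```python
from pathlib import Path
from typing import Callable, List, TypeVar

T = TypeVar("T")

def read_lines_utf8(filename: str) -> List[str]:
    """Read a whole UTF-8 file and return one string per line.

    A trailing CR is stripped from each line and the LF separators are
    dropped (assumed to match the described ⎕FIO[49] behaviour).
    """
    text = Path(filename).read_bytes().decode("utf-8")
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # file ended with a newline: no extra empty line
    return [line.rstrip("\r") for line in lines]

def read_and_convert(filename: str, convert: Callable[[str], T]) -> List[T]:
    """Apply `convert` to every line, in the spirit of CONVERT¨⎕FIO[49] 'filename'."""
    return [convert(line) for line in read_lines_utf8(filename)]
```

Reading the file in one call and converting afterwards is the same split of work that Jürgen proposes: the per-line I/O overhead disappears and only the conversion remains per line.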
> On 01/18/2017 11:17 AM, Elias Mårtenson wrote:
>
> You've all made good points, and I changed the code slightly to provide
> the initial array size in order to avoid recreating the array on
> each iteration. This brought the loading time down to a much more bearable
> *14 seconds*. I rewrote the Lisp code to be compatible with the APL code and
> the time was *1.46 seconds*. This suggests that GNU APL is consistently
> about 10 times slower than non-optimised Lisp code. To me, this is not
> unexpected given that GNU APL isn't designed to be
> high-performance.
>
> However, while 14 seconds for 30k rows is manageable, I have needed to
> work with arrays of over a million rows. Extrapolating linearly suggests that
> it would take almost 8 minutes to load such a file. Thus, unless GNU APL
> can magically improve overall performance by at least 10 times, I still
> think we need a native CSV loading function.
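The two estimates in this paragraph are simple ratios. A small Python check, assuming load time grows linearly with the number of rows (not verified in the thread); `extrapolate_seconds` is a hypothetical helper:

```python
def extrapolate_seconds(measured_s: float, measured_rows: int, target_rows: int) -> float:
    """Scale a measured load time linearly to a larger row count (assumes O(n) cost)."""
    return measured_s * target_rows / measured_rows

apl_s, lisp_s = 14.0, 1.46
print(f"APL/Lisp ratio: {apl_s / lisp_s:.1f}x")   # APL/Lisp ratio: 9.6x

t = extrapolate_seconds(apl_s, 30_000, 1_000_000)
print(f"1M rows: {t:.0f} s = {t / 60:.1f} min")  # 1M rows: 467 s = 7.8 min
```

If "30k" refers to the 34,030-row file used in the Lisp test (an assumption), the same formula gives about 411 s, or roughly 6.9 minutes.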
>
> Regards,
> Elias
>
> For reference, here is the APL code:
>
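> ∇Z ← type convert_entry value
>   →('n'≡type)/numeric          ⍝ 'n': numeric field
>   →('s'≡type)/string           ⍝ 's': string field
>   ⎕ES 'Illegal conversion type'
> numeric:
>   Z←⍎value                     ⍝ Execute the text to get a number
>   →end
> string:
>   Z←value                      ⍝ Keep the text unchanged
> end:
> ∇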
>
> ∇Z ← pattern read_csv_n[n] filename ;fd;line;separator;i
>   separator ← ' '
>   Z ← n (↑⍴pattern) ⍴ 0
>   fd ← 'r' FIO∆fopen filename
>   i ← ⎕IO
>
> next:
>   line ← FIO∆fgets fd           ⍝ Read one line from the file
>   →(⍬≡line)/end
>   →(10≠line[⍴line])/skip_nl     ⍝ If the line ends in a newline
>   line ← line[⍳¯1+⍴line]        ⍝ Remove the newline
> skip_nl:
>   line ← ⎕UCS line
>   Z[i;] ← pattern convert_entry¨ (line≠separator) ⊂ line
>   i ← i+1
>   →next
> end:
>
>   FIO∆fclose fd
> ∇
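For readers less familiar with APL, here is a rough Python transliteration of the two functions above. It is a sketch, not a drop-in replacement: it takes an iterable of lines instead of a file name (so any open text file works), and it parses numbers with int()/float() instead of APL's ⍎, so APL-specific notation such as the high minus ¯ is not handled.

```python
from typing import Iterable, List, Union

Field = Union[int, float, str]

def convert_entry(kind: str, value: str) -> Field:
    """'n' -> number, 's' -> string; anything else is an error (like ⎕ES)."""
    if kind == "n":
        try:
            return int(value)
        except ValueError:
            return float(value)  # raises ValueError for non-numeric text
    if kind == "s":
        return value
    raise ValueError("Illegal conversion type")

def read_csv_n(pattern: str, n: int, lines: Iterable[str]) -> List[List[Field]]:
    """Fill a preallocated n-by-len(pattern) table, one input line per row."""
    # Preallocate, like  Z ← n (↑⍴pattern) ⍴ 0
    table: List[List[Field]] = [[0] * len(pattern) for _ in range(n)]
    for i, line in enumerate(lines):
        line = line.rstrip("\n")                     # drop the newline
        fields = [f for f in line.split(" ") if f]   # runs of spaces separate fields
        if len(fields) != len(pattern):
            raise ValueError(f"line {i}: expected {len(pattern)} fields, got {len(fields)}")
        # Assigning past row n - 1 raises IndexError, much like INDEX ERROR on Z[i;]
        table[i] = [convert_entry(k, v) for k, v in zip(pattern, fields)]
    return table

rows = ["1 2.5 abc\n", "3  4e2 xyz\n"]
print(read_csv_n("nns", 2, rows))  # [[1, 2.5, 'abc'], [3, 400.0, 'xyz']]
```

Rows beyond the number of input lines keep their preallocated zeros, as in the APL version.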
>
> And here is the Lisp code (the test case ran on SBCL). It requires
> the Quicklisp packages SPLIT-SEQUENCE and PARSE-NUMBER:
>
> (defparameter *result*
>            (time
>             (with-open-file (s "apjs492452t1_mrt.txt")
>               (let ((res (make-array '(34030 11))))   ; preallocated rows x columns
>                 (dotimes (i (array-dimension res 0))
>                   (let* ((line (read-line s))
>                          ;; split on spaces, dropping empty fields
>                          (parts (split-sequence:split-sequence #\Space line :remove-empty-subseqs t)))
>                     ;; columns 0-9 are numeric
>                     (loop
>                       for ii from 0 below 10
>                       for p in parts
>                       do (setf (aref res i ii) (parse-number:parse-number p)))
>                     ;; column 10 is kept as a string
>                     (setf (aref res i 10) (nth 10 parts))))
>                 res))))
>
> On 18 January 2017 at 09:57, Blake McBride <blake1...@gmail.com> wrote:
>
>> On Tue, Jan 17, 2017 at 7:39 PM, Xiao-Yong Jin <jinxiaoy...@gmail.com>
>> wrote:
>>
>>> I have always found GNU APL somewhat slow compared to Dyalog, but I have
>>> never really compared the two on a large dataset.
>>> I'm mostly using J for large datasets now.
>>> If Elias has optimized code for GNU APL and a reproducible way to
>>> measure timing, I'd like to compare it with Dyalog and J.
>>
>>
>> I think that's actually a good idea. It would be a good comparison and
>> would make it clear whether there is a glaring problem. But first the
>> APL code should be optimized a bit (but nothing crazy like reading it all
>> into memory right now).
>>
>> --blake
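Xiao-Yong's request for a reproducible timing setup could be met with a synthetic input file that every implementation (GNU APL, Dyalog, J, Lisp) reads. A minimal Python sketch, assuming the same layout as the Lisp test (10 numeric columns followed by one string column, space-separated); the helper names are hypothetical and the real data file is not reproduced:

```python
import random
import time
from typing import Callable, List

def make_synthetic_lines(n_rows: int, n_numeric: int = 10, seed: int = 0) -> List[str]:
    """Space-separated rows: n_numeric numbers, then one string field."""
    rng = random.Random(seed)  # fixed seed: identical input on every run
    lines = []
    for i in range(n_rows):
        # Non-negative values avoid differences in how languages write minus signs
        nums = " ".join(f"{rng.uniform(0.0, 1000.0):.4f}" for _ in range(n_numeric))
        lines.append(f"{nums} name{i}\n")
    return lines

def write_synthetic_file(path: str, n_rows: int) -> None:
    """Write the synthetic rows so other interpreters can load the same file."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(make_synthetic_lines(n_rows))

def best_of(fn: Callable[[], object], repeats: int = 3) -> float:
    """Fastest wall-clock time in seconds over several runs (reduces noise)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    lines = make_synthetic_lines(34_030)

    def split_only() -> List[List[str]]:  # stand-in for a real loader
        return [line.split() for line in lines]

    print(f"{best_of(split_only):.3f} s")
```

Taking the best of several runs is a common way to reduce the influence of caches and background load; each interpreter would be timed on the same file written by `write_synthetic_file`.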
