Thanks. That worked. It was fast enough not to affect the timings I
gave above.

Regards,
Elias

On 19 January 2017 at 01:05, Juergen Sauermann <
juergen.sauerm...@t-online.de> wrote:

> Hi Elias,
>
> I believe you need to disclose (⊃) the outer vector:
>
>       Z←(1 2 3) (4 5 6) (7 8 9)
>       Z
>  1 2 3  4 5 6  7 8 9
>       ⊃Z
> 1 2 3
> 4 5 6
> 7 8 9
>
> /// Jürgen
>
>
> On 01/18/2017 05:52 PM, Elias Mårtenson wrote:
>
> Thanks, I changed my function to use the new FIO 49, and the result is
> much more compact:
>
> ∇Z ← pattern read_csv_n filename
>   Z← {pattern convert_entry¨ (⍵≠' ') ⊂ ⍵}¨ ⎕FIO[49] filename
> ∇
>
> It's a bit faster too, as this version runs in 11 seconds.
>
> However, the result is not entirely correct, as this version creates a
> 1-dimensional array where each element is an array consisting of the values
> for one row.
>
> Is there some way I can use EACH to map over the elements and generate a
> two-dimensional array?
>
> Regards,
> Elias
>
> On 18 January 2017 at 21:46, Juergen Sauermann <
> juergen.sauerm...@t-online.de> wrote:
>
>> Hi,
>>
>> as a start I have added ⎕FIO[49] in SVN 851. It reads an entire
>> UTF8-encoded file and puts every line of the
>> file into one nested item of the result. Trailing CR and LF are
>> removed in the process.
>>
>> The next step is to turn ⎕FIO[49] into an operator so that you can give it
>> an APL function that converts every line into the
>> desired result. Until then you can use it like:
>>
>>       Z←CONVERT¨Z←⎕FIO[49] 'filename'
>>
>> /// Jürgen
>>
>>
>> On 01/18/2017 11:17 AM, Elias Mårtenson wrote:
>>
>> You've all made good points, and I changed the code slightly to provide
>> the initial array size in order to avoid the recreation of the array on
>> each iteration. This brought the loading time down to a much more bearable
>> 14 seconds. I rewrote the Lisp code to be compatible with the APL code and
>> the time was 1.46 seconds. This suggests that GNU APL is consistently
>> about 10 times slower than non-optimised Lisp code. To me, this is not
>> unexpected, given that GNU APL isn't designed to be
>> high-performance.
>>
>> However, while 14 seconds for 30k rows is manageable, I have had the need to
>> work with arrays of over a million rows. Extrapolating this suggests that
>> it would take almost 8 minutes to load such a file. Thus, unless GNU APL
>> can magically improve overall performance by at least 10 times, I still
>> think we need a native CSV loading function.
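>> The extrapolation above is simple proportional scaling. As a sketch (the
>> 30k row count and 14-second timing are the figures quoted in this thread):

```python
# Proportional extrapolation of the measured load time
# (figures taken from the message above; the row count is approximate).
rows_measured = 30_000        # ~30k rows loaded in the test
seconds_measured = 14         # measured load time
rows_target = 1_000_000       # size of the larger files mentioned

estimated_seconds = seconds_measured * rows_target / rows_measured
print(f"{estimated_seconds / 60:.1f} minutes")   # about 7.8 minutes
```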
>>
>> Regards,
>> Elias
>>
>> For reference, here is the APL code:
>>
>> ∇Z ← type convert_entry value
>>   →('n'≡type)/numeric
>>   →('s'≡type)/string
>>   ⎕ES 'Illegal conversion type'
>> numeric:
>>   Z←⍎value
>>   →end
>> string:
>>   Z←value
>> end:
>> ∇
>>
>> ∇Z ← pattern read_csv_n[n] filename ;fd;line;separator;i
>>   separator ← ' '
>>   Z ← n (↑⍴pattern) ⍴ 0
>>   fd ← 'r' FIO∆fopen filename
>>   i ← ⎕IO
>>
>> next:
>>   line ← FIO∆fgets fd           ⍝ Read one line from the file
>>   →(⍬≡line)/end
>>   →(10≠line[⍴line])/skip_nl     ⍝ Skip unless the line ends in a newline
>>   line ← line[⍳¯1+⍴line]        ⍝ Remove the newline
>> skip_nl:
>>   line ← ⎕UCS line
>>   Z[i;] ← pattern convert_entry¨ (line≠separator) ⊂ line
>>   i ← i+1
>>   →next
>> end:
>>
>>   FIO∆fclose fd
>> ∇
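>> For readers who don't write APL, here is a rough Python sketch of what
>> convert_entry and read_csv_n above do (the names and the space separator
>> mirror the APL; this is an illustration, not code from the thread):

```python
def convert_entry(kind, value):
    # 'n' converts the field to a number, 's' keeps it as a string,
    # mirroring the APL convert_entry function above.
    if kind == 'n':
        return float(value)
    if kind == 's':
        return value
    raise ValueError('Illegal conversion type')

def read_csv_n(pattern, filename):
    # Row-wise reader: split each line on whitespace and convert each
    # field according to the corresponding letter in pattern.
    rows = []
    with open(filename) as f:
        for line in f:
            fields = line.split()          # like (line≠' ') ⊂ line
            rows.append([convert_entry(k, v)
                         for k, v in zip(pattern, fields)])
    return rows
```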
>>
>> And here is the Lisp code (the test case was running on SBCL), requires
>> the QL packages SPLIT-SEQUENCE and PARSE-NUMBER:
>>
>> (defparameter *result*
>>            (time
>>             (with-open-file (s "apjs492452t1_mrt.txt")
>>               (let ((res (make-array '(34030 11))))
>>                 (dotimes (i (array-dimension res 0))
>>                   (let* ((line (read-line s))
>>                          (parts (split-sequence:split-sequence #\Space
>> line :remove-empty-subseqs t)))
>>                     (loop
>>                       for ii from 0 below 10
>>                       for p in parts
>>                       do (setf (aref res i ii) (parse-number:parse-number
>> p)))
>>                     (setf (aref res i 10) (nth 10 parts))))
>>                 res))))
>>
>> On 18 January 2017 at 09:57, Blake McBride <blake1...@gmail.com> wrote:
>>
>>> On Tue, Jan 17, 2017 at 7:39 PM, Xiao-Yong Jin <jinxiaoy...@gmail.com>
>>> wrote:
>>>
>>>> I always feel GNU APL kind of slow compared to Dyalog, but I never
>>>> really compared two in large dataset.
>>>> I'm mostly using J now for large dataset.
>>>> If Elias has the optimized code for GNU APL and a reproducible way to
>>>> measure timing, I'd like to compare it with Dyalog and J.
>>>
>>>
>>> I think that's actually a good idea.  It would be a good comparison.  It
>>> would really make it clear if there is a glaring problem.  But first the
>>> APL code should be optimized a bit (but nothing crazy like reading it all
>>> into memory right now).
>>>
>>> --blake
>>>
>>
>>
>>
>
>
