You've all made good points. I changed the code slightly to take the
initial array size as an argument, which avoids recreating the array on
each iteration. This brought the loading time down to a much more bearable
*14 seconds*. I then rewrote the Lisp code to do the same work as the APL
code, and it ran in *1.46 seconds*. This suggests that GNU APL is
consistently about 10 times slower than non-optimised Lisp code. To me,
this is not unexpected, given that GNU APL isn't designed for high
performance.

However, while 14 seconds for 30k rows is manageable, I have needed to
work with arrays of over a million rows. Extrapolating linearly suggests
that loading such a file would take almost 8 minutes. Thus, unless GNU APL
can magically improve overall performance by at least 10 times, I still
think we need a native CSV loading function.
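
As a rough check of that extrapolation, the arithmetic can be sketched in
APL. It assumes that load time scales linearly with the row count and uses
the round figures above (30,000 rows in 14 seconds); the variable names are
only illustrative.

```apl
rows_measured ← 30000      ⍝ approximate size of the test file ("30k" rows)
secs_measured ← 14         ⍝ measured GNU APL load time, in seconds
rows_target ← 1000000      ⍝ one million rows
⍝ APL evaluates right to left: 14 × (1000000 ÷ 30000)
secs_target ← secs_measured × rows_target ÷ rows_measured
secs_target ÷ 60           ⍝ estimated minutes, a little under 8
```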

Regards,
Elias

For reference, here is the APL code:

∇Z ← type convert_entry value
  →('n'≡type)/numeric
  →('s'≡type)/string
  ⎕ES 'Illegal conversion type'
numeric:
  Z←⍎value
  →end
string:
  Z←value
end:
∇

∇Z ← pattern read_csv_n[n] filename ;fd;line;separator;i
  separator ← ' '
  Z ← n (↑⍴pattern) ⍴ 0
  fd ← 'r' FIO∆fopen filename
  i ← ⎕IO

next:
  line ← FIO∆fgets fd           ⍝ Read one line from the file
  →(⍬≡line)/end
  →(10≠line[⍴line])/skip_nl     ⍝ If the line ends in a newline
  line ← line[⍳¯1+⍴line]        ⍝ Remove the newline
skip_nl:
  line ← ⎕UCS line
  Z[i;] ← pattern convert_entry¨ (line≠separator) ⊂ line
  i ← i+1
  →next
end:

  FIO∆fclose fd
∇
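
For illustration, a call might look like the sketch below. The pattern, row
count and file name are assumptions taken from the Lisp test case further
down (ten numeric columns followed by one string column, 34030 rows). It
also assumes the FIO∆ functions have been loaded, which in a standard GNU
APL installation is typically done with )COPY 5 FILE_IO.

```apl
⍝ Assumption: the FIO∆ functions come from the FILE_IO workspace
)COPY 5 FILE_IO

⍝ One pattern character per column: 'n' = numeric, 's' = string
data ← 'nnnnnnnnnns' read_csv_n[34030] 'apjs492452t1_mrt.txt'

⍝ The result is preallocated as n rows by (↑⍴pattern) columns
⍴data
```

Note that the row count must be at least the number of lines in the file,
since the function as written does not grow the array.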

And here is the Lisp code (the test case was run on SBCL). It requires the
Quicklisp packages SPLIT-SEQUENCE and PARSE-NUMBER:

(defparameter *result*
  (time
   (with-open-file (s "apjs492452t1_mrt.txt")
     (let ((res (make-array '(34030 11))))
       (dotimes (i (array-dimension res 0))
         (let* ((line (read-line s))
                (parts (split-sequence:split-sequence
                        #\Space line :remove-empty-subseqs t)))
           (loop
             for ii from 0 below 10
             for p in parts
             do (setf (aref res i ii)
                      (parse-number:parse-number p)))
           (setf (aref res i 10) (nth 10 parts))))
       res))))

On 18 January 2017 at 09:57, Blake McBride <blake1...@gmail.com> wrote:

> On Tue, Jan 17, 2017 at 7:39 PM, Xiao-Yong Jin <jinxiaoy...@gmail.com>
> wrote:
>
>> I always feel GNU APL kind of slow compared to Dyalog, but I never really
>> compared two in large dataset.
>> I'm mostly using J now for large dataset.
>> If Elias has the optimized code for GNU APL and a reproducible way to
>> measure timing, I'd like to compare it with Dyalog and J.
>
>
> I think that's actually a good idea.  It would be a good comparison.  It
> would really make it clear if there is a blaring problem.  But first the
> APL code should be optimized a bit (but nothing crazy like reading it all
> into memory right now.)
>
> --blake
