Thanks, I changed my function to use the new ⎕FIO[49], and the result is much more compact:

∇Z ← pattern read_csv_n filename
  Z ← {pattern convert_entry¨ (⍵≠' ') ⊂ ⍵}¨ ⎕FIO[49] filename
∇

It's a bit faster too, as this version runs in 11 seconds. However, the
result is not entirely correct: this version creates a one-dimensional
array where each element is itself an array holding the values for one row.

Is there some way I can use EACH to map over the elements and generate a
two-dimensional array?

Regards,
Elias

On 18 January 2017 at 21:46, Juergen Sauermann <juergen.sauerm...@t-online.de> wrote:

> Hi,
>
> as a start I have added ⎕FIO[49] in SVN 851. It reads an entire
> UTF8 encoded file and puts every line of the
> file into one nested Item of the result. Trailing CR and LF are being
> removed in the process.
>
> Next step is to turn ⎕FIO[49] into an operator so that you can give it
> an APL function that converts every line into the
> desired result. Until then you can use it like:
>
>     Z←CONVERT¨Z←⎕FIO[49] 'filename'
>
> /// Jürgen
>
>
> On 01/18/2017 11:17 AM, Elias Mårtenson wrote:
>
> You've all made good points, and I changed the code slightly to provide
> the initial array size in order to avoid the recreation of the array on
> each iteration. This brought down the loading time to a much more bearable
> 14 seconds. I rewrote the Lisp code to be compatible with the APL code and
> the time was 1.46 seconds. This suggests that GNU APL is consistently
> about 10 times slower than non-optimised Lisp code. To me, this is not
> unexpected given the fact that GNU APL isn't designed to be
> high-performance.
>
> However, while 14 seconds for 30k is manageable, I have had the need to
> work with arrays of over a million rows. Extrapolating this suggests that
> it would take almost 8 minutes to load such a file. Thus, unless GNU APL
> can magically improve overall performance by at least 10 times, I still
> think we need a native CSV loading function.
>
> Regards,
> Elias
>
> For reference, here is the APL code:
>
> ∇Z ← type convert_entry value
>   →('n'≡type)/numeric
>   →('s'≡type)/string
>   ⎕ES 'Illegal conversion type'
> numeric:
>   Z←⍎value
>   →end
> string:
>   Z←value
> end:
> ∇
>
> ∇Z ← pattern read_csv_n[n] filename ;fd;line;separator;i
>   separator ← ' '
>   Z ← n (↑⍴pattern) ⍴ 0
>   fd ← 'r' FIO∆fopen filename
>   i ← ⎕IO
>
> next:
>   line ← FIO∆fgets fd                ⍝ Read one line from the file
>   →(⍬≡line)/end
>   →(10≠line[⍴line])/skip_nl          ⍝ If the line ends in a newline
>   line ← line[⍳¯1+⍴line]             ⍝ Remove the newline
> skip_nl:
>   line ← ⎕UCS line
>   Z[i;] ← pattern convert_entry¨ (line≠separator) ⊂ line
>   i ← i+1
>   →next
> end:
>
>   FIO∆fclose fd
> ∇
>
> And here is the Lisp code (the test case was running on SBCL), requires
> the QL packages SPLIT-SEQUENCE and PARSE-NUMBER:
>
> (defparameter *result*
>   (time
>    (with-open-file (s "apjs492452t1_mrt.txt")
>      (let ((res (make-array '(34030 11))))
>        (dotimes (i (array-dimension res 0))
>          (let* ((line (read-line s))
>                 (parts (split-sequence:split-sequence #\Space line :remove-empty-subseqs t)))
>            (loop
>               for ii from 0 below 10
>               for p in parts
>               do (setf (aref res i ii) (parse-number:parse-number p)))
>            (setf (aref res i 10) (nth 10 parts))))
>        res))))
>
> On 18 January 2017 at 09:57, Blake McBride <blake1...@gmail.com> wrote:
>
>> On Tue, Jan 17, 2017 at 7:39 PM, Xiao-Yong Jin <jinxiaoy...@gmail.com>
>> wrote:
>>
>>> I always feel GNU APL kind of slow compared to Dyalog, but I never
>>> really compared two in large dataset.
>>> I'm mostly using J now for large dataset.
>>> If Elias has the optimized code for GNU APL and a reproducible way to
>>> measure timing, I'd like to compare it with Dyalog and J.
>>
>>
>> I think that's actually a good idea. It would be a good comparison. It
>> would really make it clear if there is a blaring problem. But first the
>> APL code should be optimized a bit (but nothing crazy like reading it all
>> into memory right now.)
>>
>> --blake