Thanks, that worked. It was fast enough not to affect the timings I gave above.
Regards,
Elias

On 19 January 2017 at 01:05, Juergen Sauermann <juergen.sauerm...@t-online.de> wrote:

> Hi Elias,
>
> I believe you need to disclose (⊃) the outer vector:
>
>       Z←(1 2 3) (4 5 6) (7 8 9)
>       Z
> 1 2 3  4 5 6  7 8 9
>       ⊃Z
> 1 2 3
> 4 5 6
> 7 8 9
>
> /// Jürgen
>
> On 01/18/2017 05:52 PM, Elias Mårtenson wrote:
>
> Thanks, I changed my function to use the new FIO 49, and the result is
> much more compact:
>
> ∇Z ← pattern read_csv_n filename
>  Z← {pattern convert_entry¨ (⍵≠' ') ⊂ ⍵}¨ ⎕FIO[49] filename
> ∇
>
> It's a bit faster too: this version runs in 11 seconds.
>
> However, the result is not entirely correct, as this version creates a
> one-dimensional array in which each element is an array of the values
> for one row.
>
> Is there some way I can use EACH to map over the elements and generate
> a two-dimensional array?
>
> Regards,
> Elias
>
> On 18 January 2017 at 21:46, Juergen Sauermann <juergen.sauerm...@t-online.de> wrote:
>
>> Hi,
>>
>> as a start I have added ⎕FIO[49] in SVN 851. It reads an entire
>> UTF8-encoded file and puts every line of the file into one nested item
>> of the result. Trailing CR and LF are removed in the process.
>>
>> The next step is to turn ⎕FIO[49] into an operator, so that you can
>> give it an APL function that converts every line into the desired
>> result. Until then you can use it like:
>>
>> Z←CONVERT¨Z←⎕FIO[49] 'filename'
>>
>> /// Jürgen
>>
>> On 01/18/2017 11:17 AM, Elias Mårtenson wrote:
>>
>> You've all made good points, and I changed the code slightly to
>> provide the initial array size in order to avoid recreating the array
>> on each iteration. This brought the loading time down to a much more
>> bearable 14 seconds. I rewrote the Lisp code to be compatible with the
>> APL code, and its time was 1.46 seconds. This suggests that GNU APL is
>> consistently about 10 times slower than non-optimised Lisp code.
>> To me, this is not unexpected, given that GNU APL isn't designed to be
>> high-performance.
>>
>> However, while 14 seconds for 30k rows is manageable, I have had to
>> work with arrays of over a million rows. Extrapolating from this
>> suggests that it would take almost 8 minutes to load such a file.
>> Thus, unless GNU APL can magically improve overall performance by at
>> least 10 times, I still think we need a native CSV loading function.
>>
>> Regards,
>> Elias
>>
>> For reference, here is the APL code:
>>
>> ∇Z ← type convert_entry value
>>  →('n'≡type)/numeric
>>  →('s'≡type)/string
>>  ⎕ES 'Illegal conversion type'
>> numeric:
>>  Z←⍎value
>>  →end
>> string:
>>  Z←value
>> end:
>> ∇
>>
>> ∇Z ← pattern read_csv_n[n] filename ;fd;line;separator;i
>>  separator ← ' '
>>  Z ← n (↑⍴pattern) ⍴ 0
>>  fd ← 'r' FIO∆fopen filename
>>  i ← ⎕IO
>> next:
>>  line ← FIO∆fgets fd            ⍝ Read one line from the file
>>  →(⍬≡line)/end
>>  →(10≠line[⍴line])/skip_nl     ⍝ Skip unless the line ends in LF
>>  line ← line[⍳¯1+⍴line]        ⍝ Remove the trailing newline
>> skip_nl:
>>  line ← ⎕UCS line
>>  Z[i;] ← pattern convert_entry¨ (line≠separator) ⊂ line
>>  i ← i+1
>>  →next
>> end:
>>  FIO∆fclose fd
>> ∇
>>
>> And here is the Lisp code (the test was run on SBCL; it requires the
>> Quicklisp packages SPLIT-SEQUENCE and PARSE-NUMBER):
>>
>> (defparameter *result*
>>   (time
>>    (with-open-file (s "apjs492452t1_mrt.txt")
>>      (let ((res (make-array '(34030 11))))
>>        (dotimes (i (array-dimension res 0))
>>          (let* ((line (read-line s))
>>                 (parts (split-sequence:split-sequence
>>                         #\Space line :remove-empty-subseqs t)))
>>            (loop
>>               for ii from 0 below 10
>>               for p in parts
>>               do (setf (aref res i ii) (parse-number:parse-number p)))
>>            (setf (aref res i 10) (nth 10 parts))))
>>        res))))
>>
>> On 18 January 2017 at 09:57, Blake McBride <blake1...@gmail.com> wrote:
>>
>>> On Tue, Jan 17, 2017 at 7:39 PM, Xiao-Yong Jin <jinxiaoy...@gmail.com> wrote:
>>>
>>>> I always feel GNU APL is kind of slow compared to Dyalog, but I
>>>> never really compared the two on a large dataset. I'm mostly using J
>>>> now for large datasets. If Elias has the optimized code for GNU APL
>>>> and a reproducible way to measure timing, I'd like to compare it
>>>> with Dyalog and J.
>>>
>>> I think that's actually a good idea. It would be a good comparison,
>>> and it would really make it clear if there is a glaring problem. But
>>> first the APL code should be optimized a bit (nothing crazy like
>>> reading it all into memory right now).
>>>
>>> --blake
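
For readers who want to reproduce the cross-language timing comparison proposed above, here is a rough Python sketch of the same task discussed in the thread: read the whole file at once (as ⎕FIO[49] does), split each line on spaces, convert each field according to a type pattern ('n' = numeric, 's' = string, mirroring convert_entry), and collect the rows into a two-dimensional structure. The function names and the sample pattern are illustrative choices, not code from the thread.

```python
def convert_entry(kind, value):
    # Mirrors the APL convert_entry: 'n' parses a number, 's' keeps the text.
    if kind == 'n':
        return float(value)
    if kind == 's':
        return value
    raise ValueError('Illegal conversion type')

def read_csv_n(pattern, filename):
    # Mirrors the FIO[49]-based approach: read all lines in one call,
    # then convert every field of every line per the type pattern.
    with open(filename, encoding='utf-8') as f:
        lines = f.read().splitlines()   # like FIO[49], strips trailing CR/LF
    return [[convert_entry(k, v) for k, v in zip(pattern, line.split())]
            for line in lines if line]

if __name__ == '__main__':
    import tempfile, os
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write("1 2 abc\n3 4 def\n")
    rows = read_csv_n('nns', tmp.name)
    print(rows)   # [[1.0, 2.0, 'abc'], [3.0, 4.0, 'def']]
    os.remove(tmp.name)
```

Timing this against the APL and Lisp versions on the same data file would give a third reference point, in the spirit of Xiao-Yong's suggestion.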