Hi Elias,

This may shave some time off. It is not pretty, and I am inexperienced
with APL idioms; it probably reflects habits that have grown on me from
CL.

conv ← { (⍎⊃⍵[1])(⍎⊃⍵[2])(⍎⊃⍵[3])(⍎⊃⍵[4])(⍎⊃⍵[5])(⍎⊃⍵[6])(⍎⊃⍵[7])(⍎⊃⍵[8])(⍎⊃⍵[9])(⍎⊃⍵[10])(⊃⍵[11]) }
{conv (⍵≠' ')⊂⍵}¨⎕FIO[49] '/tmp/sample.txt'

Also, for CL, here is another angle on it using built-in functions,
without optimization (and it does not mimic the APL). Note that it may not
fit the bill, as it generates a list of vectors rather than a 2D array:

(time
 (with-open-file (s "/tmp/apjs492452t1_mrt.txt")
   (loop for line = (read-line s nil nil nil)
         while line
         collect (with-input-from-string (*standard-input* line)
                   (vector (read) (read) (read) (read) (read)
                           (read) (read) (read) (read) (read)
                           (read-line))))))
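
If a 2D array is needed after all, the list of vectors could be copied
into one afterwards. This is only a sketch: rows->array is a name made up
here for illustration, and it assumes every row has the same length.

```lisp
;; Hypothetical helper (not part of the code above): copy a list of
;; equal-length vectors into a freshly allocated 2D array.
(defun rows->array (rows)
  (let* ((nrows (length rows))
         (ncols (if rows (length (first rows)) 0))
         (result (make-array (list nrows ncols))))
    (loop for row in rows
          for i from 0
          do (dotimes (j ncols)
               (setf (aref result i j) (aref row j))))
    result))
```

For example, (rows->array (list (vector 1 2 "a") (vector 3 4 "b")))
returns #2A((1 2 "a") (3 4 "b")). The extra copy costs one more pass
over the data, so it would have to be included in the timing.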

HiH

Ala'a

On Wed, Jan 18, 2017 at 9:19 PM, Elias Mårtenson <loke...@gmail.com> wrote:
> Thanks. That worked. It was fast enough to not affect my timings that I gave
> above.
>
> Regards,
> Elias
>
> On 19 January 2017 at 01:05, Juergen Sauermann
> <juergen.sauerm...@t-online.de> wrote:
>>
>> Hi Elias,
>>
>> I believe you need to disclose (⊃) the outer vector:
>>
>>       Z←(1 2 3) (4 5 6) (7 8 9)
>>       Z
>>  1 2 3  4 5 6  7 8 9
>>       ⊃Z
>> 1 2 3
>> 4 5 6
>> 7 8 9
>>
>> /// Jürgen
>>
>>
>> On 01/18/2017 05:52 PM, Elias Mårtenson wrote:
>>
>> Thanks, I changed my function to use the new FIO 49, and the result is
>> much more compact:
>>
>> ∇Z ← pattern read_csv_n filename
>>   Z← {pattern convert_entry¨ (⍵≠' ') ⊂ ⍵}¨ ⎕FIO[49] filename
>> ∇
>>
>> It's a bit faster too, as this version runs in 11 seconds.
>>
>> However, the result is not entirely correct, as this version creates a
>> 1-dimensional array where each element is an array consisting of the values
>> for one row.
>>
>> Is there some way I can use EACH to map over the elements and generate a
>> two-dimensional array?
>>
>> Regards,
>> Elias
>>
>> On 18 January 2017 at 21:46, Juergen Sauermann
>> <juergen.sauerm...@t-online.de> wrote:
>>>
>>> Hi,
>>>
>>> as a start I have added ⎕FIO[49] in SVN 851. It reads an entire UTF8
>>> encoded file and puts every line of the
>>> file into one nested Item of the result. Trailing CR and LF are being
>>> removed in the process.
>>>
>>> Next step is to turn ⎕FIO[49] into an operator so that you can give it an
>>> APL function that converts every line into the
>>> desired result. Until then you can use it like:
>>>
>>> Z←CONVERT¨Z←⎕FIO[49] 'filename'
>>>
>>> /// Jürgen
>>>
>>>
>>> On 01/18/2017 11:17 AM, Elias Mårtenson wrote:
>>>
>>> You've all made good points, and I changed the code slightly to provide
>>> the initial array size in order to avoid the recreation of the array on each
>>> iteration. This brought down the loading time to a much more bearable 14
>>> seconds. I rewrote the Lisp code to be compatible with the APL code and the
>>> time was 1.46 seconds. This suggests that GNU APL is consistently about 10
>>> times slower than non-optimised Lisp code. To me, this is not unexpected
>>> given the fact that GNU APL isn't designed to be high-performance.
>>>
>>> However, while 14 seconds for 30k is manageable, I have had the need to
>>> work with arrays of over a million rows. Extrapolating this suggests that it
>>> would take almost 8 minutes to load such a file. Thus, unless GNU APL can
>>> magically improve overall performance by at least 10 times, I still think we
>>> need a native CSV loading function.
>>>
>>> Regards,
>>> Elias
>>>
>>> For reference, here is the APL code:
>>>
>>> ∇Z ← type convert_entry value
>>>   →('n'≡type)/numeric
>>>   →('s'≡type)/string
>>>   ⎕ES 'Illegal conversion type'
>>> numeric:
>>>   Z←⍎value
>>>   →end
>>> string:
>>>   Z←value
>>> end:
>>> ∇
>>>
>>> ∇Z ← pattern read_csv_n[n] filename ;fd;line;separator;i
>>>   separator ← ' '
>>>   Z ← n (↑⍴pattern) ⍴ 0
>>>   fd ← 'r' FIO∆fopen filename
>>>   i ← ⎕IO
>>>
>>> next:
>>>   line ← FIO∆fgets fd           ⍝ Read one line from the file
>>>   →(⍬≡line)/end
>>>   →(10≠line[⍴line])/skip_nl     ⍝ If the line ends in a newline
>>>   line ← line[⍳¯1+⍴line]        ⍝ Remove the newline
>>> skip_nl:
>>>   line ← ⎕UCS line
>>>   Z[i;] ← pattern convert_entry¨ (line≠separator) ⊂ line
>>>   i ← i+1
>>>   →next
>>> end:
>>>
>>>   FIO∆fclose fd
>>> ∇
>>>
>>> And here is the Lisp code (the test case was run on SBCL); it requires
>>> the QL packages SPLIT-SEQUENCE and PARSE-NUMBER:
>>>
>>> (defparameter *result*
>>>            (time
>>>             (with-open-file (s "apjs492452t1_mrt.txt")
>>>               (let ((res (make-array '(34030 11))))
>>>                 (dotimes (i (array-dimension res 0))
>>>                   (let* ((line (read-line s))
>>>                          (parts (split-sequence:split-sequence
>>>                                  #\Space line :remove-empty-subseqs t)))
>>>                     (loop
>>>                       for ii from 0 below 10
>>>                       for p in parts
>>>                       do (setf (aref res i ii)
>>>                                (parse-number:parse-number p)))
>>>                     (setf (aref res i 10) (nth 10 parts))))
>>>                 res))))
>>>
>>> On 18 January 2017 at 09:57, Blake McBride <blake1...@gmail.com> wrote:
>>>>
>>>> On Tue, Jan 17, 2017 at 7:39 PM, Xiao-Yong Jin <jinxiaoy...@gmail.com>
>>>> wrote:
>>>>>
>>>>> I always feel GNU APL is kind of slow compared to Dyalog, but I never
>>>>> really compared the two on a large dataset.
>>>>> I'm mostly using J now for large datasets.
>>>>> If Elias has the optimized code for GNU APL and a reproducible way to
>>>>> measure timing, I'd like to compare it with Dyalog and J.
>>>>
>>>>
>>>> I think that's actually a good idea.  It would be a good comparison.  It
>>>> would really make it clear if there is a glaring problem.  But first the
>>>> APL code should be optimized a bit (but nothing crazy like reading it all
>>>> into memory right now.)
>>>>
>>>> --blake
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
