I wanted to use GNU APL to work on a dataset of star data. The file consists of 34030 lines of the following form:
    892376 3813 4.47 0.4699 1.532 0.007 7306.69 0.823 0.4503 0 ---
    1026146 4261 4.57 0.6472 14.891 0.12 11742.56 1.405 0.7229 0 ---
    1026474 4122 4.56 0.5914 1.569 0.006 30471.8 1.204 0.6061 0 ---
    1162635 3760 4.77 0.4497 15.678 0.019 10207.47 0.978 0.5445 1 ---

I wrote a generic CSV loader to handle this (source code at the end of this email), and loaded the data like so:

    z ← 'nnnnnnnnnns' read_csv 'apjs492452t1_mrt.txt'

This took many minutes to load, which in my opinion shouldn't happen. Now, I have a few questions:

1. Is there a way to speed up this code?
2. Is there something that could be done on the GNU APL implementation side to make this faster?
3. Shouldn't we have a generic ⎕CSV function or something like that which would be able to load CSV files in milliseconds regardless of size? This should be trivial to do in C++.

Here's the code in question:

    ∇Z ← type convert_entry value
      →('n'≡type)/numeric
      →('s'≡type)/string
      ⎕ES 'Illegal conversion type'
    numeric:
      Z←⍎value
      →end
    string:
      Z←value
    end:
    ∇

    ∇Z ← pattern read_csv filename ;fd;line;separator
      separator ← ' '
      Z ← 0 (↑⍴pattern) ⍴ ⍬
      fd ← 'r' FIO∆fopen filename
    next:
      line ← FIO∆fgets fd              ⍝ Read one line from the file
      →(⍬≡line)/end
      →(10≠line[⍴line])/skip_nl        ⍝ If the line ends in a newline
      line ← line[⍳¯1+⍴line]           ⍝ Remove the newline
    skip_nl:
      line ← ⎕UCS line
      Z ← Z⍪ pattern convert_entry¨ (line≠separator) ⊂ line
      →next
    end:
      FIO∆fclose fd
    ∇
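To make the "trivial in C++" remark in question 3 a bit more concrete, here is a minimal sketch of the per-line work such a native loader would have to do. It is only an illustration under stated assumptions: `Cell`, `Row`, `parse_line` and `read_table` are hypothetical names of my own, nothing here is part of GNU APL or of any proposed ⎕CSV, fields are assumed to be whitespace-separated as in the sample lines, and the code needs C++17 for `std::variant`.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// One cell is either a number ('n' in the pattern) or raw text ('s').
// Cell and Row are hypothetical types for this sketch only.
using Cell = std::variant<double, std::string>;
using Row = std::vector<Cell>;

// Split one line on whitespace and convert each field according to
// `pattern`, using the same 'n'/'s' type codes as the APL read_csv.
// Fields beyond the length of `pattern` are silently ignored here.
Row parse_line(const std::string& pattern, const std::string& line) {
    std::istringstream in(line);
    Row row;
    row.reserve(pattern.size());
    std::string field;
    for (char type : pattern) {
        if (!(in >> field)) {
            throw std::runtime_error("too few fields in line");
        }
        if (type == 'n') {
            row.emplace_back(std::stod(field));   // numeric column
        } else if (type == 's') {
            row.emplace_back(field);              // string column
        } else {
            throw std::invalid_argument("Illegal conversion type");
        }
    }
    return row;
}

// Read all non-empty lines from `in`, appending one parsed row per line.
std::vector<Row> read_table(const std::string& pattern, std::istream& in) {
    std::vector<Row> rows;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;               // skip blank lines
        rows.push_back(parse_line(pattern, line));
    }
    return rows;
}
```

Calling `read_table("nnnnnnnnnns", stream)` on a stream opened over the data file would yield one `Row` of 11 cells per line. Converting those rows into an APL array is the part that would actually depend on GNU APL internals, so it is left out of the sketch.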