Hi Germán,

If I understand your script correctly, you want to grab all lines with
GDP, sort the values by year and country, and output them. Is that
right?

As a first warning: the csv module in Python mainly calls into a
C-based implementation (_csv, see csv.__file__), so it will be hard to
beat this in pure Scheme.

But now, let’s begin with the optimization. These are my times:

    $ time guile-2.0 extract_gdp.scm
    real    0m0.509s
    $ time python3 extract_gdp.py
    real    0m0.089s

The first step is using Guile 2.1.6 instead of 2.0. That reduces the
runtime by about 40%, to 0.3s.

Source: ftp://alpha.gnu.org/gnu/guile/guile-2.1.6.tar.xz

    $ time guile extract_gdp.scm
    real    0m0.296s
    $ time python3 extract_gdp.py
    real    0m0.089s

So there’s a factor of 3.3 between Python and Guile on my machine.

Aside from using a more recent Guile, I do not see obvious
optimizations, however (more exactly: all my attempts to speed up the
code only made it slower). There might still be optimizations I do not
see, though, because 80% of the remaining time is spent in string
parsing.

One place where I don’t see how to make it cheaper in pure Scheme is
string->number. It calls directly into libguile/numbers.c, which does
much more than what Python’s int() does (internally it calls
mem2complex). But using a pure-Scheme function which does less only
makes it slower:

    (define (string->integer s)
      (define (b10fold x kept)
        (+ (* 10 kept) (- (char->integer x) 48)))
      (string-fold b10fold 0 s))

As I said: the above makes the code run slower, not faster. A native C
function for string->integer (which only handles integers) could
provide a speedup for that, but I don’t know whether you want to go
that far.

See http://git.savannah.gnu.org/gitweb/?p=guile.git;a=blob;f=libguile/numbers.c;hb=475772ea57c97d0fa0f9ed9303db137d9798ddd3#l6439

However, every time I thought I had a program optimized as far as
possible, talking with Andy Wingo made it much faster, so there might
be lots I’m missing.
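
If you want to check the string->number comparison on your own machine,
a rough timing sketch could look like the following. The helper time-it,
the sample string and the iteration count are made up for illustration;
string->integer is the same pure-Scheme function as above and assumes
the string holds only ASCII digits.

```scheme
;; Pure-Scheme integer parser; assumes s contains only ASCII digits.
(define (string->integer s)
  (define (b10fold x kept)
    (+ (* 10 kept) (- (char->integer x) 48)))
  (string-fold b10fold 0 s))

;; Hypothetical helper: call thunk n times, return elapsed wall-clock seconds.
(define (time-it n thunk)
  (let ((start (get-internal-real-time)))
    (do ((i 0 (+ i 1)))
        ((= i n))
      (thunk))
    (/ (- (get-internal-real-time) start)
       (exact->inexact internal-time-units-per-second))))

(define sample "1234567")   ; made-up input, not taken from the csv file
(define runs 200000)        ; arbitrary iteration count

(format #t "string->number:  ~a s\n"
        (time-it runs (lambda () (string->number sample))))
(format #t "string->integer: ~a s\n"
        (time-it runs (lambda () (string->integer sample))))
```

Save it to a file and run it with guile. The absolute numbers will
differ between machines and Guile versions, so only the ratio between
the two lines is interesting.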

Given that just converting a bytevector read from the file to integers
takes 0.8s, I do not think just using bytevectors will help:

    (bytevector->u8-list bv) ; takes 0.8s for your file

Maybe there are more efficient ways to do this, though.

Best wishes,
Arne

Germán Diago writes:

> Hello everyone,
>
> I did a script that parses some file with the GDP since 1970 for many
> countries. I filter the file and discard uninteresting fields, later I
> write in a format suitable for gnuplot.
>
> I did this in python and guile.
>
> In python it takes around 1.1 seconds in my raspberry pi.
>
> In Guile it is taking around 11 seconds.
>
> I do not claim they are doing exactly the same: in python I use arrays and
> dictionaries, in guile I am using mainly lists, I would like to know if you
> could give me advice on how to optimize it. I am just training for now.
>
> The scripts in both python and guile are attached and the profile data for
> scheme is below. Just place in the same directory the .csv file and it
> should generate an output file with the data ready for gnuplot :)
>
> %     cumulative   self
> time   seconds     seconds  name
>  26.24      3.45      3.43  %read-line
>  20.51      2.68      2.68  string->number
>  15.54      2.05      2.03  string-delete
>   7.39      7.75      0.97  map
>   5.13      3.96      0.67  transform-data
>   4.07      1.75      0.53  format:format-work
>   3.17      0.41      0.41  string=?
>   2.87      0.37      0.37  string-ref
>   1.81      2.50      0.24  tilde-dispatch
>   1.81      0.24      0.24  number->string
>   1.51      0.34      0.20  is-a-digit
>   1.06      0.28      0.14  anychar-dispatch
>   1.06      0.14      0.14  display
>   1.06      0.14      0.14  string-length
>   1.06      0.14      0.14  char>=?
>   1.06      0.14      0.14  char<=?
>   1.06      0.14      0.14  string-split
>   0.60      0.08      0.08  length
>   0.45      0.49      0.06  format:out-num-padded
>   0.45      0.06      0.06  remove-dots
>   0.30      0.04      0.04  %after-gc-thunk
>   0.30      0.04      0.04  list-tail
>   0.30      0.04      0.04  write-char
>   0.15      3.53      0.02  loop
>   0.15      3.47      0.02  read-line
>   0.15      0.02      0.02  substring
>   0.15      0.02      0.02  list-ref
>   0.15      0.02      0.02  reverse!
>   0.15      0.02      0.02  #<procedure 2360350 at extract_gdp.scm:58:10 (e)>
>   0.15      0.02      0.02  integer?
>   0.15      0.02      0.02  char=?
>   0.00     13.07      0.00  load-compiled/vm
>   0.00     13.07      0.00  #<procedure 18c6180 at ice-9/top-repl.scm:31:6 (thunk)>
>   0.00     13.07      0.00  #<procedure 1a92e00 at ice-9/boot-9.scm:4045:3 ()>
>   0.00     13.07      0.00  call-with-prompt
>   0.00     13.07      0.00  #<procedure 18c6100 at ice-9/top-repl.scm:66:5 ()>
>   0.00     13.07      0.00  apply-smob/1
>   0.00     13.07      0.00  catch
>   0.00     13.07      0.00  #<procedure 1a919c0 at statprof.scm:655:4 ()>
>   0.00     13.07      0.00  run-repl*
>   0.00     13.07      0.00  save-module-excursion
>   0.00     13.07      0.00  statprof
>   0.00     13.07      0.00  start-repl*
>   0.00     11.22      0.00  #<procedure 1a8a170 ()>
>   0.00      3.53      0.00  call-with-input-file
>   0.00      1.85      0.00  call-with-output-file
>   0.00      1.79      0.00  for-each
>   0.00      1.75      0.00  format
>   0.00      0.14      0.00  get-fields
>   0.00      0.10      0.00  #<procedure 2d398a0 at extract_gdp.scm:48:18 (year)>
>   0.00      0.06      0.00  #<procedure 2d021c8 at extract_gdp.scm:46:6 (p)>
>   0.00      0.02      0.00  format:out-obj-padded
>   0.00      0.02      0.00  remove
>   0.00      0.02      0.00  call-with-output-string