Thanks for clearing that up.
I have an old implementation of large parts of srfi-13 if that would be of interest. I don't know how much you want to change. The licence situation of the reference implementation is weird iirc. A beginning could be to replace all higher order functions since that would minimize the kind of performance problems discussed here. -- Linus Björnstam On Fri, 12 Jun 2020, at 22:13, Ludovic Courtès wrote: > Hi, > > Linus Björnstam <linus.inter...@fastmail.se> skribis: > > > You can cut another 15-ish % from that loop by making an inline loop, btw > > > > (let loop ((pos 0)) > > (when (< pos (string-length str)) > > ... > > (loop (1+ pos))) > > > > I have been looking at the disassembly, even for simpler cases, but I > > haven't been able to understand enough of it. > > > > BTW: string-for-each is in the default environment, and is probably the > > same as the srfi-13 C implementation. > > ‘string-for-each’ in C (the default) is slower than its Scheme counterpart: > > --8<---------------cut here---------------start------------->8--- > scheme@(guile-user)> (define (sfe proc str) > (define len (string-length str)) > (let loop ((i 0)) > (unless (= i len) > (proc (string-ref str i)) > (loop (+ 1 i))))) > scheme@(guile-user)> (define str (make-string 15000000)) > scheme@(guile-user)> ,t (sfe identity str) > ;; 0.263725s real time, 0.263722s run time. 0.000000s spent in GC. > scheme@(guile-user)> ,t (sfe identity str) > ;; 0.259538s real time, 0.259529s run time. 0.000000s spent in GC. > scheme@(guile-user)> ,t (string-for-each identity str) > ;; 0.841632s real time, 0.841624s run time. 0.000000s spent in GC. > scheme@(guile-user)> (version) > $2 = "3.0.2" > --8<---------------cut here---------------end--------------->8--- > > In general we seem to pay a high price for leaving (calling a subr) and > re-entering (via ‘scm_call_n’) the VM. This is especially acute here > because there’s almost nothing happening in C, so we keep bouncing > between Scheme and C. > > That’s another reason to start rewriting such primitives in Scheme and > have the C functions just call out to Scheme. > > If we do: > > perf record guile -c '(string-for-each identity (make-string 15000000))' > > we get this profile: > > --8<---------------cut here---------------start------------->8--- > Overhead Command Shared Object Symbol > 31.10% guile libguile-3.0.so.1.1.1 [.] vm_regular_engine > 27.48% guile libguile-3.0.so.1.1.1 [.] scm_call_n > 14.34% guile libguile-3.0.so.1.1.1 [.] scm_jit_enter_mcode > 3.55% guile libguile-3.0.so.1.1.1 [.] scm_i_string_ref > 3.37% guile libguile-3.0.so.1.1.1 [.] get_callee_vcode > 2.34% guile libguile-3.0.so.1.1.1 [.] scm_call_1 > 2.31% guile libguile-3.0.so.1.1.1 [.] scm_string_for_each > --8<---------------cut here---------------end--------------->8--- > > Indeed, we get better performance when turning off JIT: > > --8<---------------cut here---------------start------------->8--- > $ GUILE_JIT_THRESHOLD=-1 time guile -c '(string-for-each identity > (make-string 15000000))' > 0.47user 0.00system 0:00.47elapsed 100%CPU (0avgtext+0avgdata > 26396maxresident)k > 0inputs+0outputs (0major+1583minor)pagefaults 0swaps > $ GUILE_JIT_THRESHOLD=100 time guile -c '(string-for-each identity > (make-string 15000000))' > 0.83user 0.00system 0:00.83elapsed 100%CPU (0avgtext+0avgdata > 26948maxresident)k > 0inputs+0outputs (0major+1748minor)pagefaults 0swaps > $ GUILE_JIT_THRESHOLD=0 time guile -c '(string-for-each identity > (make-string 15000000))' > 0.84user 0.00system 0:00.85elapsed 100%CPU (0avgtext+0avgdata > 27324maxresident)k > 0inputs+0outputs (0major+2548minor)pagefaults 0swaps > --8<---------------cut here---------------end--------------->8--- > > So it seems that we just keep firing the JIT machinery on every > ‘scm_call_n’ for no benefit. > > That’s probably also the reason why ‘%after-gc-hunk’, ‘reap-pipes’, & > co. always show high in statprof: > > https://lists.gnu.org/archive/html/guile-devel/2020-05/msg00019.html > > Thanks, > Ludo’. > > >