Hi Ken, On Tue 04 Aug 2009 11:28, Ken Raeburn <raeb...@raeburn.org> writes:
> I implemented Ackermann's function A(m,n), a recursive function with > scary performance characteristics, based on the definition given at > wikipedia involving lots of +1 and -1 operations... Guile's VM does not yet have opcodes for +1 and and -1. It probably should, though. Still, this is a fairly artificial test. > I wrote both elisp and scheme versions; I changed the recursion to an > explicitly-managed stack of values, to avoid some of the dynamic-vs- > lexical argument binding issues, turn the native stack usage into large > numbers of cell allocations and discards, and avoid exceeding the > maximum Lisp evaluation depth Emacs permits. OK now we know we are entering the realm of the artificial benchmark :-) > Not surprisingly, it's > not too fast, and Emacs spends a lot of time in garbage collection > passes even with the byte-compiled version. My time-test function here > counts seconds of real time (on an 8-core Mac without a lot of other > stuff going on, so CPU usage is probably very close): > > ELISP> (time-test '(ackermann 3 1)) > 1.6e-05 > ELISP> (time-test '(ackermann 3 6)) > 0.036018 > ELISP> (time-test '(ackermann 3 7)) > 0.16574899999999998 > ELISP> (time-test '(ackermann 3 8)) > 0.6457170000000001 > ELISP> (time-test '(ackermann 3 9)) > 2.482042 > ELISP> (time-test '(ackermann 4 0)) > 1.4e-05 > ELISP> (time-test '(ackermann 4 1)) > 642.746514 OK > With the installed guile-1.8.7 binary and the interpreted Scheme version > of the code, performance was worse: > > % time guile -l ack.scm -c '(ackermann 3 1)' > 0.009u 0.003s 0:00.01 0.0% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 3 6)' > 0.409u 0.078s 0:00.52 90.3% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 3 9)' > 25.300u 4.540s 0:29.85 99.9% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 4 1)' > 6016.150u 1122.109s 1:58:59.32 99.9% 0+0k 8+4io 149pf+0w So, you're also timing Guile startup here, which takes about 20 ms. > Yes, that's just shy of two hours for the last one, where Emacs took > less than 11 minutes. Of course, that's comparing compiled code to interpreted code, but that is a pretty bad result, yes. > I built and installed 1.9.1, so I could test with more recent code: > > % time guile -l ack.scm -c '(ackermann 3 1)' > 0.026u 0.006s 0:00.03 66.6% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 3 6)' > 0.761u 0.141s 0:00.90 100.0% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 3 9)' > 47.242u 8.644s 0:55.89 99.9% 0+0k 0+0io 0pf+0w > % > > Since this behaved worse than 1.8.6, I skipped the 4,1 test. Yes, Guile 1.9's interpreter will be slower than 1.8's. > I tested > the compiled version: > > % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 1))' > 0.012u 0.004s 0:00.01 100.0% 0+0k 0+0io 0pf+0w > % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 6))' > 0.772u 0.333s 0:01.10 100.0% 0+0k 0+0io 0pf+0w > % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 9))' > 48.728u 22.016s 1:10.75 99.9% 0+0k 0+0io 0pf+0w > % > > Not much better than loading the .scm file, and only better for small > values of n... in fact, with m=3 and n>=6 the performance seems worse > in the precompiled case: > > % time guile -l ack.scm -c '(ackermann 3 10)' > 181.997u 34.928s 3:36.94 99.9% 0+0k 0+0io 0pf+0w > % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 10))' > 196.678u 85.618s 4:42.34 99.9% 0+0k 0+0io 0pf+0w > % > > Poking at it a couple of times with Apple's process sampling tool, it > does seem to be spending a bit of time in garbage collection -- but > scm_leave_guile (called from scm_async_click via > scm_pthread_mutex_lock) and getrusage (called from > scm_c_get_internal_run_time via mytime) also seem to show up on the > stack a significant fraction of the time, and each of those appears to > be a comparable if not bigger time sink. This could be simply that Mac OS has different performance characteristics than Linux. If locking uncontended mutexen or getting the resource usage of a process are expensive, that doesn't sound very good... > According to the execution timing with the compiled code, GC actually > doesn't seem to be a huge fraction of the run time, maybe 8-9%: > > scheme@(guile-user)> ,t (ackermann 3 8) > 2045 > clock utime stime cutime cstime gctime > 18.43 12.96 5.47 0.00 0.00 1.66 > scheme@(guile-user)> ,t (ackermann 3 9) > 4093 > clock utime stime cutime cstime gctime > 73.47 51.61 21.85 0.00 0.00 5.95 > scheme@(guile-user)> > > The latter is the case that took under three seconds in Emacs, > remember. Perhaps the deal is that 1+ is not being inlined. It seems to be calling out to the 1+ procedure, which is a primitive, actually making it a bit slower. I'll see what I can do. > In retrospect, this is much heavier on the interpretation or execution > of some of the simple operations than I intended, and Emacs benefits a > lot from opcodes like "add1", and I'm more interested specifically in > exercising allocation and GC at the moment, so it's not that useful a > test for me after all. But I thought I'd send this along anyways in > case someone was interested. In particular the fact that the > performance of the precompiled version gets worse as n grows might be > something to investigate, unless the cause is already known.... Thank you for the report, I'll investigate. Andy -- http://wingolog.org/