On Aug 5, 2009, at 05:06, Andy Wingo wrote:
> Guile's VM does not yet have opcodes for +1 and -1. It probably
> should, though. Still, this is a fairly artificial test.
Yeah, and it turned out to be useless for what I had originally wanted
to test, but I was still curious to see how it would work out. Based
on your analysis of the "while" issue, it proved interesting. :-)
> So, you're also timing Guile startup here, which takes about 20 ms.
Yes; except for the smallest cases, startup seems to be in the noise.
Poking at it a couple of times with Apple's process sampling tool, it
does seem to be spending a bit of time in garbage collection -- but
scm_leave_guile (called from scm_async_click via
scm_pthread_mutex_lock) and getrusage (called from
scm_c_get_internal_run_time via mytime) also show up on the stack a
significant fraction of the time, and each of those appears to be a
comparable, if not bigger, time sink.
> This could be simply that Mac OS has different performance
> characteristics than Linux. If locking uncontended mutexen or getting
> the resource usage of a process is expensive, that doesn't sound very
> good...
From my experience with profiling code on another project, mutexes
definitely appear more expensive on Mac OS X than on Linux/glibc when
there's contention. I just ran a quick test on a couple of x86
machines, and it does appear that the Mac OS X version of
pthread_mutex_lock+pthread_mutex_unlock takes half again as many
cycles as the GNU/Linux version, even without contention. (A test with
10000000 iterations of getrusage(RUSAGE_SELF) shows a more drastic
difference between OSes -- in terms of cycle counts, more than 3.5x as
expensive on Mac OS X as on GNU/Linux.)
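
For the record, the quick test was along these lines; this is a
minimal sketch rather than the exact program I ran, and the rdtsc()
helper is just the usual x86 cycle-counter read:

#include <stdio.h>
#include <stdint.h>
#include <pthread.h>
#include <sys/resource.h>

/* Read the x86 time-stamp counter.  */
static inline uint64_t
rdtsc (void)
{
  uint32_t lo, hi;
  __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
  return ((uint64_t) hi << 32) | lo;
}

int
main (void)
{
  enum { N = 10000000 };
  pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  struct rusage ru;
  uint64_t start;
  int i;

  /* Uncontended lock/unlock: single thread, nobody else ever holds
     the mutex.  */
  start = rdtsc ();
  for (i = 0; i < N; i++)
    {
      pthread_mutex_lock (&m);
      pthread_mutex_unlock (&m);
    }
  printf ("lock+unlock: %.1f cycles/iter\n",
          (double) (rdtsc () - start) / N);

  start = rdtsc ();
  for (i = 0; i < N; i++)
    getrusage (RUSAGE_SELF, &ru);
  printf ("getrusage:   %.1f cycles/iter\n",
          (double) (rdtsc () - start) / N);

  return 0;
}

The absolute numbers depend on the CPU, of course; it's the Mac OS X
vs GNU/Linux ratio that's interesting.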
But it's not pthread_mutex_lock itself that shows up most often; it's
sigaltstack and sigprocmask, called by scm_leave_guile. Some thoughts
come to mind, but I don't have much time to work with them right now;
maybe later:
(1) In scm_pthread_mutex_lock, we leave and re-enter guile mode so
that we don't block the thread while in guile mode. But we could use
pthread_mutex_trylock first, and avoid the costs scm_leave_guile seems
to incur on the Mac. If we can't acquire the lock, trylock returns
immediately, and then we can do the expensive, blocking version. A
quick hacked-up version of this changed my run time for A(3,8) from
17.5s to 14.5s, saving about 17%; sigaltstack and sigprocmask are
still in the picture, because they're called from
scm_catch_with_pre_unwind_handler. I'll work up a nicer patch later.
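
To make the idea concrete, here's a rough sketch -- not the actual
patch, and the real scm_pthread_mutex_lock doesn't look quite like
this; I'm using the public scm_without_guile to stand in for the
leave/re-enter dance, and the function names are made up:

#include <pthread.h>
#include <libguile.h>

static void *
do_blocking_lock (void *mutex)
{
  pthread_mutex_lock (mutex);
  return NULL;
}

static void
lock_with_trylock_fast_path (pthread_mutex_t *mutex)
{
  if (pthread_mutex_trylock (mutex) == 0)
    /* Uncontended (the common case): we already hold the lock and
       never paid for scm_leave_guile -- no sigaltstack, no
       sigprocmask.  */
    return;

  /* Contended: fall back to the current behavior and block outside
     guile mode, so GC and friends aren't held up while we wait.  */
  scm_without_guile (do_blocking_lock, mutex);
}

The point is just that the uncontended path never touches
scm_leave_guile at all.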
(2) We're locking a global mutex to check if there's an async item
pending for the current thread. First, why not a per-thread mutex so
we don't have to potentially stop *all* threads just to see if the
current one has async work to do? But second, it looks to me like by
far the common case will be checking the async queue when it's empty
-- not changing it, and not checking it when it's non-empty. If
that's the case, can we use some other mechanism, perhaps atomic queue
manipulation operations (like Apple's OSAtomic{Enqueue,Dequeue}) or
atomic compare-and-swap, to speed up the common case? It introduces
some portability problems, though, and if pthread_mutex_trylock helps
enough, maybe it doesn't matter.
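
Here's the sort of thing I have in mind, as a rough sketch; the names
are made up, and it uses GCC's __sync builtins rather than the
Apple-specific OSAtomic calls, which is exactly where the portability
question comes in:

/* A per-thread, singly linked list of pending asyncs; NULL means
   "nothing pending", which should be the overwhelmingly common
   case.  */
struct pending_async
{
  struct pending_async *next;
  void (*func) (void *);
  void *data;
};

/* Enqueue CELL onto *HEAD without taking any mutex.  */
static void
async_push (struct pending_async *volatile *head,
            struct pending_async *cell)
{
  struct pending_async *old;
  do
    {
      old = *head;
      cell->next = old;
    }
  while (!__sync_bool_compare_and_swap (head, old, cell));
}

/* Detach and return the whole pending list, or NULL if it's empty.
   The empty check is just a load -- no lock, no scm_leave_guile.  */
static struct pending_async *
async_take_all (struct pending_async *volatile *head)
{
  struct pending_async *old;

  if (*head == NULL)
    return NULL;

  do
    old = *head;
  while (!__sync_bool_compare_and_swap (head, old, NULL));
  return old;
}

With a per-thread head like this, checking for pending asyncs in the
common (empty) case is a single load, with no global mutex and no
leaving guile mode.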
(3) My four-year-old comments on scm_enter/leave_guile, recorded in
threads.c around line 300, still stand.... Those functions really
ought to go away. At least they're confined to one file, now. Some
of it looks a little messy, but I can probably get rid of some of the
uses....
(4) Yeah, yeah, I should run gprof already. :-)
Ken