Hi Neil,

Neil Jerram <[EMAIL PROTECTED]> writes:
> Interesting piece of work.
>
> It seems to me, though, that there are 3 things going on here.
>
> 1. Memoization of global variable references that yield one of a
>    particular subset of common procedures.  (I call this part
>    memoization because it seems similar to the memoization that we
>    already do for syntax like let, begin, and, etc.)
>
> 2. Inlining of the code for these procedures within CEVAL.
>
> 3. Changing IM_SYMs to be dynamic instead of fixed constants, plus the
>    macrology and GCC jump table stuff.
>
> Do you know what the relative contributions of these 3 changes are?

Thanks, Neil, for clarifying this.  The measurements you propose are
indeed a good idea, and the results are not exactly what I was expecting
(which confirms that I'm not very good at predicting performance ;-)).

BTW, imsyms are not assigned dynamically: they are assigned statically
by the `extract-imsyms.sh' script.

I made a series of measurements with Guile compiled with `-pg -O0'.
Then I tried different configurations, switching each of these 3
features on and off.  The first table below summarizes the execution
time improvement, looking at the execution time of `every' itself as
well as that of the whole program:

                                         `every'    overall
  ------------------------------------+----------------------
  jump table vs. switch               |    0.8%     -1.4%  (worse!)
  inlining in `CEVAL ()' vs. funcall  |   11.0%      4.7%

The second table shows the improvement compared to the non-memoizing +
jump table version (i.e., with `(eval-disable 'inline)'):

  memoization + jt + inline           |   32.4%     22.1%
  memoization + switch + inline       |   31.9%     23.2%
  memoization + jt + funcall          |   24.0%     18.3%

(Beware: I only ran each test case 3 times or so, so these figures
should not be considered an ultimate benchmark!  I'm attaching the
whole results for the record.)

In short, the effect of using a jump table is negligible in this
context (it's really a micro-optimization compared to the two other
things).
Function call overhead, however, _is_ important, though it is only the
second source of improvement.  Repeatedly using function calls to
execute a handful of instructions is costly, and it probably also
increases cache misses and the like.

Now, if we generalized the memoization thing, as you suggested, so that
any procedure could be memoized (based on user annotations), then things
might be a bit different, because we would be using indirect function
calls (i.e., like `SCM (*func) () = xxx; return (func (arg));') while in
my measurements I was using immediate function calls (as in `scm_car
(op)').  I should compare indirect and immediate function calls, but I
presume that there is a slight performance difference.

Finally, memoization does indeed play an important role.  I suspect that
it's mostly because, for instance, the argument count is only checked at
memoization time, and not when the "inlinable" is actually executed.
Also, the memoization code is pretty local (unlike when `CEVAL ()' has
to go through `evap0', then `evalp1', etc.).

I'm afraid this is kind of a dirty report, but I hope it sheds some
light on the issue.

Also, Rob mentioned on IRC that he was concerned about the global
switch.  I believe this can be fixed using fluids or something like
that, so that inlining can be enabled/disabled on a per-module basis (as
we did with `current-reader').  But that will be the topic of another
thread, maybe.  ;-)

Thanks,
Ludovic.

_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel