Gabriel Genellina wrote:

> At Saturday 9/12/2006 23:04, Andrea Griffini wrote:
>
>> I implemented that crazy idea and it seems to be working... in its
>> current hacked state it can still pass the test suite (excluding
>
> What crazy idea? And what is this supposed to do?

The idea is to avoid looking up constants several times in
dictionaries that didn't change (e.g. builtins).

Reading a bit about other optimization proposals I didn't find a
similar one, so I decided to invest some spare time in it. The idea
is:

1) Add a "timestamp" to dictionaries, so that when a dictionary is
   changed its timestamp gets updated.

2) Store a cached lookup for constants; the cached lookup is stored
   as a timestamp value and a naked pointer to the result.

The algorithm for the lookup of a given constant is:

    if (<<cached_timestamp>> == d->timestamp) {
        x = <<cached_value>>;
    }
    else {
        x = PyDict_GetItem(d, key);
        <<cached_timestamp>> = d->timestamp;
        <<cached_value>> = x;
    }

Using a naked pointer is safe because it will be used only if the
dictionary wasn't touched since the last lookup, hence the value is
surely still alive.

The original place I thought of for storing the cached lookup was the
bytecode itself; however, after reading the Python sources I resorted
instead to a dedicated space inside the code object. The code for
LOAD_GLOBAL uses something like

    if (co->co_cachedtstamps[oparg] == d->timestamp)
        ...

i.e. I used an array indexed by the index of the co_name used for the
lookup.
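To make the scheme concrete, here is a minimal standalone C sketch of
it (this is NOT the actual patch: the "dict" below is a toy stand-in
for PyDictObject, and names like cache_entry and cached_getitem are
made up; in the patch each cache entry lives in the code object,
indexed by oparg):

    #include <stdio.h>
    #include <string.h>

    #define MAX_ENTRIES 16

    /* Toy stand-in for PyDictObject: a tiny string->string map whose
       timestamp is bumped on every mutation. */
    typedef struct {
        const char *keys[MAX_ENTRIES];
        const char *values[MAX_ENTRIES];
        int count;
        unsigned long timestamp;
    } dict;

    /* Per-call-site cache: the timestamp seen at the last real lookup
       plus a naked pointer to the value found then.  The pointer is
       reused only while the timestamps still match. */
    typedef struct {
        unsigned long timestamp;
        const char *value;
    } cache_entry;

    static const char *dict_getitem(const dict *d, const char *key)
    {
        for (int i = 0; i < d->count; i++)
            if (strcmp(d->keys[i], key) == 0)
                return d->values[i];
        return NULL;
    }

    static void dict_setitem(dict *d, const char *key, const char *value)
    {
        for (int i = 0; i < d->count; i++) {
            if (strcmp(d->keys[i], key) == 0) {
                d->values[i] = value;
                d->timestamp++;        /* mutation invalidates caches */
                return;
            }
        }
        if (d->count == MAX_ENTRIES)
            return;                    /* toy map is full; ignore */
        d->keys[d->count] = key;
        d->values[d->count] = value;
        d->count++;
        d->timestamp++;
    }

    /* The cached lookup: on a hit we skip hashing and probing
       entirely; on a miss we do the real lookup and refresh the
       cache entry. */
    static const char *cached_getitem(dict *d, const char *key,
                                      cache_entry *c)
    {
        if (c->timestamp == d->timestamp)
            return c->value;               /* hit */
        c->value = dict_getitem(d, key);   /* miss: full lookup */
        c->timestamp = d->timestamp;
        return c->value;
    }

    int main(void)
    {
        /* Start the dict timestamp at 1 so that a zero-initialized
           cache entry is invalid from the beginning. */
        dict globals = { .count = 0, .timestamp = 1 };
        cache_entry site = { 0, NULL };    /* one per LOAD_GLOBAL site */

        dict_setitem(&globals, "x", "42");
        printf("%s\n", cached_getitem(&globals, "x", &site)); /* miss */
        printf("%s\n", cached_getitem(&globals, "x", &site)); /* hit  */

        dict_setitem(&globals, "x", "43"); /* bump -> cache invalid */
        printf("%s\n", cached_getitem(&globals, "x", &site)); /* miss */
        return 0;
    }

The point is that a hit costs one integer compare instead of a hash
and probe, and the naked pointer never dangles because any mutation
bumps the timestamp and forces a real lookup.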
The patched code is currently working; however, I found that while
the hit/miss ratios are impressive (as I expected), the speedup is
simply absent. Moreover, there is no difference at all between paying
for the timestamp handling and NOT using the cached lookups, and
paying for it AND using the cached lookups (!). Absurdly, Python on
my PC runs faster if, in addition to the cached lookup code, I also
leave in place the hit/miss statistics (a few static int increments
and a static atexit-ed output function). It also made a lot of
difference where the timestamp was placed inside the dictobject
structure...

In addition to the unimpressive results (my patched Python is now
just a bit *slower* than the original one :-D) there is also another
complication. LOAD_GLOBAL actually needs TWO lookups, so I used two
cached results (one for globals, one for builtins). The ideal
solution in this case, however, would IMO be two timestamps and one
cached value instead... (if neither dict was touched since the last
lookup, then the result will be the cached one).

The complication is that a lot of lookups are done by LOAD_ATTR
instead, and thinking about the ideal solution for new-style classes
made my brain explode (mro, descriptors and stuff...). It would be
simple to do something for classic classes, but would that be worth
it (I mean... aren't those going to disappear?). Probably something
can be done using caches for LOAD_ATTR on modules (to speed up a bit
things like math.sin or mod1.mod2.mod3.func).

Any suggestion is welcome...

Andrea
--
http://mail.python.org/mailman/listinfo/python-list