Unfortunately, I didn't see a visible performance gain on x86. :( The difference is within measurement error. That's probably because reading from the L1 data cache is very cheap.
Thanks.
Dmitry.

On Mon, Dec 3, 2012 at 7:33 PM, Nikita Popov <nikita....@gmail.com> wrote:

> On Mon, Dec 3, 2012 at 10:35 AM, Dmitry Stogov <dmi...@zend.com> wrote:
>
>> The new proposed patch: http://pastebin.com/pj5fQTfN
>>
>> Now both execute_data->Ts and execute_data->CVs are removed and
>> corresponding temporary and compiled variables accessed using
>> "execute_data" as the base pointer. Temporary variables allocate directly
>> before the "execute_data" in reverse order and compiled variables right
>> after. So they can be accessed without any additional computations. The
>> patch reduces the number of executed instructions and number of memory
>> reads (about 8% less).
>>
>
> Did you also test how much these 8% less memory reads improve performance?
> Would be interesting to know :)
>
> Nikita
>
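
For readers following the thread, here is a minimal sketch of the layout described in the quoted message: temporaries placed directly before the frame in reverse order, compiled variables right after it, both addressed from a single base pointer. It is not taken from the patch. The names slot_t, frame_t, frame_alloc, frame_tmp and frame_cv are hypothetical stand-ins for illustration and do not match the real Zend structures.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical variable slot; the union keeps every slot pointer-aligned. */
typedef union {
    long   l;
    double d;
    void  *p;
} slot_t;

/* Hypothetical frame header standing in for "execute_data". */
typedef struct {
    const void *opline;  /* placeholder field, unused in this sketch */
    void       *prev;    /* placeholder field, unused in this sketch */
} frame_t;

/*
 * Allocate one block laid out as:
 *   [ T(num_tmp-1) ... T(1) T(0) ][ frame_t ][ CV(0) CV(1) ... ]
 * Returns the frame pointer; *block receives the address to free().
 */
static frame_t *frame_alloc(size_t num_tmp, size_t num_cv, void **block)
{
    size_t tmp_bytes = num_tmp * sizeof(slot_t);
    char *raw = calloc(1, tmp_bytes + sizeof(frame_t) + num_cv * sizeof(slot_t));
    if (raw == NULL) {
        return NULL;
    }
    *block = raw;
    return (frame_t *)(raw + tmp_bytes);
}

/* Temporary n lives (n + 1) slots before the frame: a constant negative offset. */
static slot_t *frame_tmp(frame_t *f, size_t n)
{
    return (slot_t *)((char *)f - (n + 1) * sizeof(slot_t));
}

/* Compiled variable n lives n slots past the end of the frame header. */
static slot_t *frame_cv(frame_t *f, size_t n)
{
    return (slot_t *)((char *)f + sizeof(frame_t) + n * sizeof(slot_t));
}
```

With this arrangement, both kinds of variable are reached by adding a constant offset to the frame pointer, so no separate Ts or CVs pointer has to be loaded first. That would account for fewer memory reads, though the saved reads would mostly have hit L1 anyway.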