I've spent a bunch of time rebasing this series to remove the excess code churn, and I've just pushed the results to the shader-cache branch mentioned below. There are no code changes to the end result, but I've managed to get the patch count down to 80 (was 96, I think) and things should be much easier to review now.
I've also had reports of people testing with additional games such as Dota 2 and seeing good results.

On Tue, 2016-06-21 at 16:08 +1000, Timothy Arceri wrote:
> Rather than send 90+ patches to the list, please see the repo at the
> bottom of this email.
>
> The big update is I've added all stages but compute and tested with a
> few games, and everything seems to be working well so far. Enabling
> shader cache with the Shadow of Mordor benchmark makes things noticeably
> smoother and helps consistently keep the min FPS at 15 on my Skylake,
> whereas without it, it can be anywhere between 4-15.
>
> The elemental demo, which Dave pointed out as also doing a bunch of
> compiles during the demo, is also smoother, especially on the second run,
> but it's really slow on my Skylake regardless. Maybe someone with a
> high-end Skylake would like to give it a try.
>
>
> V3:
> - add support for geometry and tessellation stages
> - cache clip planes
> - reserve parameter storage before restoring list
> - stop losing buffer blocks on cache fallback
> - lots of little fixes I can't remember
>
> V2:
> - rebased on master
> - add support for encoding doubles
> - renamed skip_cache params to is_cache_fallback, and fix related bug
>   when disabling shader cache for xfb.
>
> This series is based on the great work done by Carl, Kristian and
> others.
>
> I've split up Carl's original patches for easier review, and also merged
> a number of fixes and clean-ups into his patches. However, there is a
> little more code churn than is ideal, as the approach taken by the
> original patches needed to be modified quite a lot. I'm hoping it's not
> more than people can live with, as I'd like to keep some of the history
> rather than just squashing everything.
>
> For now I have left in some printf's, as the feature is still disabled
> by default and they are useful for debugging. I intend to fix this soon
> to hide them behind an environment var.
>
> There are no regressions after two runs of piglit with shader cache
> enabled on my Broadwell machine.
>
> This series enables on-disk shader cache for all stages except compute
> programs. For now transform feedback and SSO programs skip using the
> cache; these will be added as follow-ups.
>
> My main goal with this series is to land something that passes piglit.
> There are a number of optimisations that can still be done, such as
> skipping more validation and state recreation when falling back to a
> full recompile, but I would rather leave this until we have something
> fully working.
>
> Here are the shader-db times (from V2):
>
> Cache disabled:
>
> Thread 1 took 1360.47 seconds and compiled 13015 shaders (not including
> SIMD16) with 50 GL context switches
> Thread 3 took 1349.85 seconds and compiled 12848 shaders (not including
> SIMD16) with 40 GL context switches
> Thread 2 took 1362.94 seconds and compiled 12637 shaders (not including
> SIMD16) with 36 GL context switches
> Thread 0 took 1352.41 seconds and compiled 12593 shaders (not including
> SIMD16) with 46 GL context switches
>
> Cache enabled first run:
>
> Thread 1 took 1410.30 seconds and compiled 12678 shaders (not including
> SIMD16) with 34 GL context switches
> Thread 2 took 1421.35 seconds and compiled 12822 shaders (not including
> SIMD16) with 50 GL context switches
> Thread 0 took 1410.49 seconds and compiled 12999 shaders (not including
> SIMD16) with 40 GL context switches
> Thread 3 took 1426.67 seconds and compiled 12594 shaders (not including
> SIMD16) with 48 GL context switches
>
> Cache enabled second run:
>
> Thread 0 took 259.84 seconds and compiled 12817 shaders (not including
> SIMD16) with 40 GL context switches
> Thread 3 took 257.03 seconds and compiled 12533 shaders (not including
> SIMD16) with 50 GL context switches
> Thread 1 took 256.18 seconds and compiled 12828 shaders (not including
> SIMD16) with 40 GL context switches
> Thread 2 took 261.31 seconds and compiled 12915 shaders (not including
> SIMD16) with 39 GL context switches
>
> You can find the series in the shader-cache branch of:
>
> https://github.com/tarceri/Mesa_arrays_of_arrays.git
>
> MESA_GLSL_CACHE_ENABLE=1 enables the cache.
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
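
For a quick read of the shader-db numbers quoted above, here is a small Python sketch that averages the per-thread wall-clock times for each configuration and compares them against the cache-disabled run. The times are copied from the quoted output; the names TIMES and summarize are made up for illustration and are not part of shader-db or Mesa.

```python
from statistics import mean

# Per-thread wall-clock seconds copied from the quoted shader-db output (V2).
# The dictionary keys are illustrative labels, not shader-db terminology.
TIMES = {
    "cache disabled": [1360.47, 1349.85, 1362.94, 1352.41],
    "cache enabled, first run": [1410.30, 1421.35, 1410.49, 1426.67],
    "cache enabled, second run": [259.84, 257.03, 256.18, 261.31],
}


def summarize(times):
    """Return {label: (mean seconds, ratio to the cache-disabled mean)}."""
    baseline = mean(times["cache disabled"])
    return {
        label: (mean(vals), mean(vals) / baseline)
        for label, vals in times.items()
    }


if __name__ == "__main__":
    for label, (avg, ratio) in summarize(TIMES).items():
        print(f"{label:26s} mean {avg:8.2f} s  ({ratio:.2f}x of cache disabled)")
```

On these figures the first run with the cache enabled averages roughly 4-5% slower than running with the cache disabled, while the second run takes about a fifth of the time. The threads compiled slightly different shader counts in each run, so treat this as a rough comparison only.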