This series implements some more aggressive scheduler changes that build on the original series I sent out, which has now been merged to master. In particular, it rewrites the scheduler to be bottom-up instead of top-down, and gives it a fancy new strategy that combines list scheduling with Sethi-Ullman numbering in order to tackle register-pressure-limited scenarios while still providing good parallelism otherwise. It also fixes some serious shortcomings in the low register pressure case: we now actually use the critical path information we were already computing, and we use a better strategy to minimize stalls. Finally, it changes the heuristic for whether we should drop SIMD16 to something that, while probably not perfect, is probably still better than what we had previously.
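For readers who have not met Sethi-Ullman numbering before, here is a minimal sketch of the textbook version on a binary expression tree. It is not the code from this series: the patches operate on the scheduler's dependency graph of i965 instructions, whereas this sketch assumes a plain tree where every interior node has exactly two children, every value occupies one register, and every leaf must be loaded into a register. The names `Node`, `leaf`, `op` and `sethi_ullman` are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical expression-tree node (not a Mesa/i965 type).
// A node with no children is a leaf; otherwise it has exactly two children.
struct Node {
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
};

// Build a leaf (a value that has to be loaded into a register).
std::unique_ptr<Node> leaf()
{
    return std::make_unique<Node>();
}

// Build a binary operation over two subtrees.
std::unique_ptr<Node> op(std::unique_ptr<Node> l, std::unique_ptr<Node> r)
{
    assert(l && r);
    auto n = std::make_unique<Node>();
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Sethi-Ullman label: the number of registers needed to evaluate the
// subtree without spilling, under the simplifying assumptions above.
int sethi_ullman(const Node &n)
{
    if (!n.left && !n.right)
        return 1; // a leaf needs one register to hold its value

    int l = sethi_ullman(*n.left);
    int r = sethi_ullman(*n.right);

    // If one child needs more registers, evaluate it first: its result then
    // sits in one register while the cheaper child is evaluated, so the
    // larger need dominates. If both need the same amount, one extra
    // register is required to hold the first result.
    return l == r ? l + 1 : std::max(l, r);
}
```

With these assumptions, `(a + b) * (c + d)` gets the label 3, while the left-leaning chain `((a + b) + c) + d` gets the label 2. The useful part for a scheduler is the ordering rule the labels imply: evaluating the subtree with the larger label first keeps the peak number of live values down.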

While each patch has shader-db numbers, it's a little hard to see the forest for the trees while looking at each change individually, so here are the shader-db numbers for the entire series on bdw, created using my shader-db patch [1]:

total instructions in shared programs: 7392779 -> 7386851 (-0.08%)
instructions in affected programs:     24443 -> 18515 (-24.25%)
helped:                                15
HURT:                                  0

total cycles in shared programs: 56128804 -> 48572820 (-13.46%)
cycles in affected programs:     54357022 -> 46801038 (-13.90%)
helped:                          60142
HURT:                            801

LOST:   392
GAINED: 59

But note that most of the SIMD16 shaders we lost were bad SIMD16 shaders that probably wouldn't have helped us; the shaders we gained went from spilling and being thrown out to actually being useful. And, of course, the intervening patches ensure that many more SIMD16 programs are considered "useful." Notably, after this series, no SIMD8 programs in my shader-db spill anymore!

Note that this series is a little different from the one that some people have been looking at before. In particular, I dropped an attempt to replace the LIFO heuristic that turned out to not be useful at all by the end (instead, I just nuked it), and I fixed a slight issue with how the amount above the register pressure threshold was being computed in the Sethi-Ullman patch. As this is the first version I'm actually sending out for review, I didn't bother to mark those changes in the commit messages. It's probably worth re-doing the benchmarks, since my shader-db now shows no regressions or lost SIMD16 shaders in any SynMark benchmark, for whatever reason.

This series is also available at

git://people.freedesktop.org/~cwabbott0/mesa i965-sched-v3

[1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097431.html

Connor Abbott (7):
  i965: use real latencies in the pre-RA scheduler
  i965/sched: use a critical path heuristic
  i965/sched: get rid of the LIFO heuristic
  i965/sched: switch to register pressure scheduling dynamically
  i965/sched: switch to bottom-up scheduling
  i965/sched: use Sethi-Ullman numbering
  i965/fs: use a better heuristic for SIMD16

 src/mesa/drivers/dri/i965/brw_fs.cpp               |  59 +--
 src/mesa/drivers/dri/i965/brw_fs.h                 |   8 +-
 .../drivers/dri/i965/brw_schedule_instructions.cpp | 443 +++++++++++++--------
 src/mesa/drivers/dri/i965/brw_shader.h             |   1 -
 4 files changed, 313 insertions(+), 198 deletions(-)

--
2.4.3