This little series implements lowering of indirectly accessed local variables larger than some threshold (8 floats?) to scratch space. This improves the performance of the CSDof synmark test by about 45%: the test uses a large temporary array, which we currently lower to if-ladders that then end up as piles of scratch.
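To make the trade-off concrete, here is a rough C sketch of what a single indirect read `a[i]` from an 8-element local array turns into under each lowering. It is not Mesa code: `load_if_ladder`, `load_scratch` and `ARRAY_LEN` are hypothetical names, and a plain byte buffer stands in for scratch memory. The if-ladder keeps every element in its own register and pays one compare-and-select per element on every access, while the scratch version is a single addressed load.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration, not Mesa code. */
#define ARRAY_LEN 8

/* If-ladder lowering: each element lives in its own "register" and an
 * indirect read becomes one compare+select per element, so the work per
 * access grows linearly with ARRAY_LEN. Only constant indices (k) are
 * ever used to touch regs[]. */
static float load_if_ladder(const float regs[ARRAY_LEN], int i)
{
    assert(i >= 0 && i < ARRAY_LEN);
    float result = 0.0f;
    for (int k = 0; k < ARRAY_LEN; k++) {
        if (i == k)
            result = regs[k];
    }
    return result;
}

/* Scratch lowering: the array lives in addressable memory (modelled here
 * by a byte buffer), so an indirect read is one load from
 * base + i * sizeof(float), independent of the array length. */
static float load_scratch(const unsigned char *scratch, size_t base, int i)
{
    assert(i >= 0 && i < ARRAY_LEN);
    float value;
    memcpy(&value, scratch + base + (size_t)i * sizeof(float), sizeof(value));
    return value;
}
```

Both functions return the same element; what differs is the cost per access, which is why a size threshold decides between the two lowerings.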
The approach I've taken here is to add a new set of NIR intrinsics for reading and writing scratch. Scratch is treated like any other form of IO, with a new nir_lower_vars_to_scratch pass that lowers everything over a given size threshold to scratch space.

Why do this in NIR? The primary reason is that it lets us lower to scratch *before* we run nir_lower_indirect_derefs, so we can still use registers for small indirects where an if-ladder is more efficient than scratch space. Also, after giving it a try, I really liked how those intrinsics turned out.

This series is marked RFC because it's still a bit sketchy at the moment. There are a few things that need to be finished before it's ready to land:

 1) I should probably run it through piglit.
 2) The back-end portion doesn't yet handle doubles.
 3) We should use send-from-GRF for non-spill direct scratch reads/writes. Right now, it still uses MRFs, which isn't great.

If people like where this series is going, I can probably find some time to polish it to the point of being mergeable.
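The selection step of such a pass can be modelled roughly as below. This is a hypothetical sketch, not the real nir_lower_vars_to_scratch: `struct local_var`, `assign_scratch` and all field names are made up for illustration, and the sketch assumes a strictly-greater-than comparison against a threshold in bytes (32 bytes would match the tentative 8-float threshold). Indirectly accessed variables over the threshold get an aligned offset in a per-invocation scratch area; everything else is left for later passes such as nir_lower_indirect_derefs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a local variable; NOT the real nir_variable. */
struct local_var {
    const char *name;
    unsigned size_bytes;      /* total size of the variable */
    unsigned align_bytes;     /* required alignment, power of two */
    bool indirectly_accessed; /* any access with a non-constant index */
    bool in_scratch;          /* output: chosen for scratch */
    unsigned scratch_offset;  /* output: valid only if in_scratch */
};

/* Round x up to a multiple of the power-of-two alignment a. */
static unsigned align_up(unsigned x, unsigned a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Marks every indirectly accessed variable larger than size_threshold for
 * scratch, assigns it an aligned offset, and returns the total scratch
 * size needed. Small indirects and directly accessed variables are left
 * alone so they can stay in registers. */
static unsigned assign_scratch(struct local_var *vars, size_t count,
                               unsigned size_threshold)
{
    unsigned scratch_size = 0;
    for (size_t i = 0; i < count; i++) {
        struct local_var *v = &vars[i];
        v->in_scratch = v->indirectly_accessed &&
                        v->size_bytes > size_threshold;
        if (!v->in_scratch)
            continue;
        scratch_size = align_up(scratch_size, v->align_bytes);
        v->scratch_offset = scratch_size;
        scratch_size += v->size_bytes;
    }
    return scratch_size;
}
```

Running a filter like this before if-ladder lowering is the point of the ordering described above: small indirect arrays are skipped here and still get the cheaper if-ladder treatment afterwards.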
Jason Ekstrand (6):
  nir: Add load/store_scratch intrinsics
  nir: Add a pass for selectively lowering variables to scratch space
  i965/fs: Add a CHANNEL_IDS opcode
  i965/fs: Add DWord scattered read/write opcodes
  i965/fs: Implement the new nir_scratch_load/store opcodes
  i965: Lower large local arrays to scratch

Timothy Arceri (1):
  i965: use nir_lower_indirect_derefs() for GLSL

 src/compiler/Makefile.sources                     |   1 +
 src/compiler/nir/nir.h                            |   8 +-
 src/compiler/nir/nir_clone.c                      |   1 +
 src/compiler/nir/nir_intrinsics.h                 |   6 +-
 src/compiler/nir/nir_lower_scratch.c              | 258 ++++++++++++++++++++++
 src/intel/vulkan/anv_pipeline.c                   |  10 -
 src/mesa/drivers/dri/i965/brw_defines.h           |  10 +
 src/mesa/drivers/dri/i965/brw_fs.cpp              | 113 ++++++++++
 src/mesa/drivers/dri/i965/brw_fs.h                |   6 +
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp          |   1 +
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp    | 170 ++++++++++++++
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp          |  42 +++-
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp |   4 +-
 src/mesa/drivers/dri/i965/brw_link.cpp            |  13 --
 src/mesa/drivers/dri/i965/brw_nir.c               |  13 ++
 src/mesa/drivers/dri/i965/brw_shader.cpp          |  12 +
 16 files changed, 631 insertions(+), 37 deletions(-)
 create mode 100644 src/compiler/nir/nir_lower_scratch.c

-- 
2.5.0.400.gff86faf

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev