From: Connor Abbott <cwabbo...@gmail.com>

This series implements VK_AMD_shader_ballot for radv. This extension builds on VK_EXT_shader_subgroup_ballot and VK_EXT_shader_subgroup_vote by adding a number of reductions across a subgroup (or wavefront in AMD terminology). Previously, shaders had to use shared memory to compute, say, the average across all threads in a workgroup, or the minimum and maximum values across a workgroup. But that requires a lot of accesses to LDS memory, which is (relatively) slow. This extension allows the shader to do part of the reduction directly in registers, as long as it stays within a single wavefront, which reduces the amount of LDS traffic. It also adds a few AMD-specific instructions, like mbcnt. To get an idea of what exactly is in the extension, and what inclusive scan, exclusive scan, etc. mean, you can look at the GL extension, which exposes mostly the same things [1].
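To make the terminology concrete, here is a small CPU model in C of what these operations compute over one 64-lane wavefront. It is a sketch under stated assumptions, not code from this series: `WAVE_SIZE`, `WG_SIZE`, and every function name are hypothetical, and the real operations run across lanes in registers rather than in a loop. `mbcnt` is modeled as a single 64-bit count, whereas the hardware splits it into low and high 32-bit halves. The last function shows the two-level pattern described above: each wavefront reduces in registers and writes one partial result to LDS, instead of every thread going through LDS.

```c
#include <stdint.h>

/* CPU model of one GCN wavefront: 64 lanes, one value per lane, and an
 * exec mask saying which lanes are active. Illustration only; this is
 * not radv, NIR, or LLVM code, and all names are hypothetical. */
#define WAVE_SIZE 64
#define WG_SIZE 256 /* hypothetical workgroup made of four wavefronts */

/* Reduction: every active lane ends up with the same total. */
static uint32_t wave_reduce_add(const uint32_t v[WAVE_SIZE], uint64_t exec)
{
    uint32_t sum = 0;
    for (int lane = 0; lane < WAVE_SIZE; lane++)
        if (exec & (1ull << lane))
            sum += v[lane];
    return sum;
}

/* Inclusive scan: lane i gets the sum over active lanes 0..i.
 * Exclusive scan: lane i gets the sum over active lanes 0..i-1, so the
 * first active lane gets the identity (0 for addition).
 * Inactive lanes are left untouched. */
static void wave_scan_add(const uint32_t v[WAVE_SIZE], uint64_t exec,
                          uint32_t incl[WAVE_SIZE], uint32_t excl[WAVE_SIZE])
{
    uint32_t running = 0;
    for (int lane = 0; lane < WAVE_SIZE; lane++) {
        if (!(exec & (1ull << lane)))
            continue;
        excl[lane] = running;
        running += v[lane];
        incl[lane] = running;
    }
}

/* mbcnt model: how many bits of mask are set for lanes strictly below
 * this one. With mask = exec it gives each active lane a dense index. */
static uint32_t mbcnt(uint64_t mask, int lane)
{
    uint64_t below = mask & ((1ull << lane) - 1); /* lane must be 0..63 */
    uint32_t count = 0;
    for (; below; below &= below - 1) /* clear the lowest set bit */
        count++;
    return count;
}

/* Workgroup reduction: each wavefront reduces in registers, then writes
 * a single partial sum to shared memory (LDS) instead of one per thread. */
static uint32_t workgroup_reduce_add(const uint32_t v[WG_SIZE])
{
    uint32_t lds[WG_SIZE / WAVE_SIZE]; /* one LDS slot per wavefront */
    for (int w = 0; w < WG_SIZE / WAVE_SIZE; w++)
        lds[w] = wave_reduce_add(&v[w * WAVE_SIZE], ~0ull);
    /* after a barrier, one wavefront combines the partial sums */
    uint32_t total = 0;
    for (int w = 0; w < WG_SIZE / WAVE_SIZE; w++)
        total += lds[w];
    return total;
}
```

For example, with lane i holding i + 1 and all lanes active, the reduction gives 2080 in every lane, and lane 3 gets 10 from the inclusive scan and 6 from the exclusive scan.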
Why should you care? It turns out that with this extension enabled, plus a few other AMD-specific extensions that are mostly trivial, DOOM will take a different path that uses shaders tuned specifically for AMD hardware. I haven't actually tested DOOM yet, since a few more things need to be wired up, but that's a lot less work than this extension and I'm sure Dave or Bas will do it for me when they get around to it :).

It uses a few new features of the AMDGPU LLVM backend that I just landed, as well as one more small change that still needs review (https://reviews.llvm.org/D34718), so it's going to require LLVM 6.0. It also uses the DPP modifier, which was only added in VI, since that was easier than using ds_swizzle (which is available on all GCN cards). It should be possible to support older cards using ds_swizzle, but I haven't gotten to it yet.

A note to those reviewing: to get the complete picture, it might be helpful to look at the LLVM changes that this series uses, in particular:

https://reviews.llvm.org/rL310087
https://reviews.llvm.org/rL310088
https://reviews.llvm.org/D34718

This series depends on my previous series [2] implementing VK_EXT_shader_subgroup_vote and VK_EXT_shader_subgroup_ballot, if only to be able to test the implementation. I think DOOM also uses those two extensions. I've also based this on my series adding cross-thread semantics to NIR [3], which still needs review from Jason, since I was hoping that would land first. With a little effort, though, it should be possible to land this series first (it would require changing PATCH 01 a little).
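For a rough idea of why a cross-lane operand modifier like DPP helps here, the sketch below builds a wavefront-wide inclusive scan from nothing but "read the value from the lane d positions below" plus an add, using the textbook log-step (Hillis-Steele) scan. This is a swapped-in generic technique, not necessarily the instruction sequence the series emits: real DPP row shifts operate within 16-lane rows and would need additional steps to cross row boundaries, which this model ignores. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define WAVE_SIZE 64

/* Model of a cross-lane "shift up by d": lane i reads lane i - d, and
 * lanes with no source read the identity (0 for addition). Real DPP
 * shifts are restricted to 16-lane rows; this model ignores that. */
static void shift_up(const uint32_t in[WAVE_SIZE], uint32_t out[WAVE_SIZE], int d)
{
    for (int lane = 0; lane < WAVE_SIZE; lane++)
        out[lane] = (lane >= d) ? in[lane - d] : 0;
}

/* Log-step (Hillis-Steele) inclusive scan, in place: after the step with
 * distance d, each lane holds the sum of up to 2*d lanes ending at
 * itself, so six shift+add steps cover all 64 lanes. */
static void wave_inclusive_scan_add(uint32_t v[WAVE_SIZE])
{
    uint32_t shifted[WAVE_SIZE];
    for (int d = 1; d < WAVE_SIZE; d *= 2) {
        shift_up(v, shifted, d); /* read neighbors before updating */
        for (int lane = 0; lane < WAVE_SIZE; lane++)
            v[lane] += shifted[lane];
    }
}

/* An exclusive scan is the inclusive scan shifted up by one lane. */
static void wave_exclusive_scan_add(uint32_t v[WAVE_SIZE])
{
    uint32_t incl[WAVE_SIZE];
    memcpy(incl, v, sizeof(incl));
    wave_inclusive_scan_add(incl);
    shift_up(incl, v, 1);
}
```

Six shift-and-add steps (d = 1, 2, 4, 8, 16, 32) cover all 64 lanes, versus 63 dependent adds for a serial scan, and the exclusive scan costs only one more shift.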
The whole thing is available at:

git://people.freedesktop.org/~cwabbott0/mesa radv-amd-shader-ballot

and the LLVM branch that I've been using to test, with the one outstanding patch added, is at:

https://github.com/cwabbott0/llvm.git dpp-intrinsics-v4

I've got some Crucible tests that exercise the various parts of the implementation, although I didn't bother to test every possible combination of reductions, since they didn't require any special code to implement anyway. I'll try to get those cleaned up and sent out soon. Maybe I should just push the tests?

Finally, I'm leaving Valve soon (this week) to go back to school, and I suspect that I won't have much time to work on this afterwards, so someone else will probably have to pick it up. I've been working on this for most of the summer, since it turned out to be a far more complicated beast to implement than I thought. It required changes across the entire stack, from spirv-to-nir all the way down to register allocation in the LLVM backend. Thankfully, most of the tricky LLVM changes have landed (thanks Nicolai for reviewing!), and what's left is a lot more straightforward. I should still be around to answer questions, though.

Whew!
[1] https://www.khronos.org/registry/OpenGL/extensions/AMD/AMD_shader_ballot.txt
[2] https://lists.freedesktop.org/archives/mesa-dev/2017-August/164903.html
[3] https://lists.freedesktop.org/archives/mesa-dev/2017-August/164898.html

Connor Abbott (15):
  nir: define intrinsics needed for AMD_shader_ballot
  spirv: import AMD extensions header
  spirv: add plumbing for SPV_AMD_shader_ballot and Groups
  nir: rename and generalize nir_lower_read_invocation_to_scalar
  nir: scalarize AMD_shader_ballot intrinsics
  radv: call nir_lower_cross_thread_to_scalar()
  nir: add a lowering pass for some cross-workgroup intrinsics
  radv: use nir_lower_group_reduce()
  ac: move ac_to_integer() and ac_to_float() to ac_llvm_build.c
  ac: remove bitcast_to_float()
  ac: fix ac_get_type_size() for doubles
  ac: add support for SPV_AMD_shader_ballot
  ac/nir: add support for SPV_AMD_shader_ballot
  radv: enable VK_AMD_shader_ballot
  ac/nir: fix saturate emission

 src/amd/common/ac_llvm_build.c                     | 783 ++++++++++++++++++++-
 src/amd/common/ac_llvm_build.h                     | 120 ++++
 src/amd/common/ac_nir_to_llvm.c                    | 300 ++++----
 src/amd/vulkan/radv_device.c                       |  15 +
 src/amd/vulkan/radv_pipeline.c                     |   6 +
 src/compiler/Makefile.sources                      |   4 +-
 src/compiler/nir/nir.h                             |  11 +-
 src/compiler/nir/nir_intrinsics.h                  | 124 +++-
 ...scalar.c => nir_lower_cross_thread_to_scalar.c} |  63 +-
 src/compiler/nir/nir_lower_group_reduce.c          | 179 +++++
 src/compiler/nir/nir_print.c                       |   1 +
 src/compiler/spirv/GLSL.ext.AMD.h                  |  93 +++
 src/compiler/spirv/nir_spirv.h                     |   2 +
 src/compiler/spirv/spirv_to_nir.c                  |  32 +-
 src/compiler/spirv/vtn_amd.c                       | 281 ++++++++
 src/compiler/spirv/vtn_private.h                   |   9 +
 src/intel/compiler/brw_nir.c                       |   2 +-
 17 files changed, 1846 insertions(+), 179 deletions(-)
 rename src/compiler/nir/{nir_lower_read_invocation_to_scalar.c => nir_lower_cross_thread_to_scalar.c} (56%)
 create mode 100644 src/compiler/nir/nir_lower_group_reduce.c
 create mode 100644 src/compiler/spirv/GLSL.ext.AMD.h
 create mode 100644 src/compiler/spirv/vtn_amd.c

-- 
2.9.4