On Mon, Aug 7, 2017 at 6:32 PM, Connor Abbott <conn...@valvesoftware.com> wrote:
> From: Connor Abbott <cwabbo...@gmail.com>
>
> This series implements VK_AMD_shader_ballot for radv. This extension
> builds on VK_EXT_shader_subgroup_ballot and VK_EXT_shader_subgroup_vote
> by adding a number of reductions across a subgroup (or wavefront in AMD
> terminology). Previously, shaders had to use shared memory to compute,
> say, the average across all threads in a workgroup, or the minimum and
> maximum values across a workgroup. But that requires a lot of accesses
> to LDS memory, which is (relatively) slow. This extension allows the
> shader to do part of the reduction directly in registers, as long as it
> stays within a single wavefront, reducing the amount of traffic to the
> LDS that has to happen. It also adds a few AMD-specific instructions,
> like mbcnt. To get an idea of what exactly is in the extension, and what
> inclusive scan, exclusive scan, etc. mean, you can look at the GL
> extension which exposes mostly the same things [1].
>
> Why should you care? It turns out that with this extension enabled, plus
> a few other AMD-specific extensions that are mostly trivial, DOOM will
> take a different path that uses shaders that were tuned specifically for
> AMD hardware. I haven't actually tested DOOM yet, since a few more
> things need to be wired up, but it's a lot less work than this extension
> and I'm sure Dave or Bas will do it for me when they get around to it
> :).
>
> It uses a few new features of the AMDGPU LLVM backend that I just
> landed, as well as one more small change that still needs review:
> https://reviews.llvm.org/D34718, so it's going to require LLVM 6.0. It
> also uses the DPP modifier that was only added on VI since that was
> easier than using ds_swizzle (which is available on all GCN cards). It
> should be possible to implement support for older cards using
> ds_swizzle, but I haven't gotten to it yet. A note to those reviewing:
> it might be helpful to look at the LLVM changes that this series uses,
> in particular:
>
> https://reviews.llvm.org/rL310087
> https://reviews.llvm.org/rL310088
> https://reviews.llvm.org/D34718
>
> in order to get the complete picture.
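
For anyone unfamiliar with the terms in the first quoted paragraph
(reduction, inclusive scan, exclusive scan), here is a rough CPU-side
model of what each lane of a wavefront ends up with. It is only a
sketch: the function names are made up for illustration and are not
the extension's or Mesa's API, it only covers integer addition, and it
assumes every lane is active.

```python
from itertools import accumulate
import operator

# Hypothetical helper names; each list element stands for the value
# held by one lane (invocation) of a wavefront.

def reduce_add(lanes):
    """Reduction: every lane receives the sum over all lanes."""
    total = sum(lanes)
    return [total] * len(lanes)

def inclusive_scan_add(lanes):
    """Inclusive scan: lane i receives lanes[0] + ... + lanes[i]."""
    return list(accumulate(lanes, operator.add))

def exclusive_scan_add(lanes):
    """Exclusive scan: lane i receives lanes[0] + ... + lanes[i-1].

    Lane 0 receives the identity of the operation (0 for addition).
    """
    if not lanes:
        return []
    return [0] + inclusive_scan_add(lanes)[:-1]

if __name__ == "__main__":
    lanes = [1, 2, 3, 4]  # a tiny 4-lane "wavefront" for readability
    print(reduce_add(lanes))          # [10, 10, 10, 10]
    print(inclusive_scan_add(lanes))  # [1, 3, 6, 10]
    print(exclusive_scan_add(lanes))  # [0, 1, 3, 6]
```

On the hardware these results are produced by exchanging register
values between lanes rather than by looping, which is what avoids the
LDS traffic mentioned above.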

I've just pushed the last LLVM change required as
https://reviews.llvm.org/rL310399, so this series should now work with
upstream LLVM master.

>
> This series depends on my previous series [2] to implement
> VK_EXT_shader_subgroup_vote and VK_EXT_shader_subgroup_ballot, if
> nothing else in order to be able to test the implementation. I think
> DOOM also uses the latter two extensions. I've also based this on my
> series adding cross-thread semantics to NIR [3], which Jason needs to
> review, since I was hoping that would land first, although with a little
> effort it should be possible to land this first (it would require
> changing PATCH 01 a little). The whole thing is available at:
>
> git://people.freedesktop.org/~cwabbott0/mesa radv-amd-shader-ballot
>
> and the LLVM branch that I've been using to test, with the one patch
> added, is at:
>
> https://github.com/cwabbott0/llvm.git dpp-intrinsics-v4

I've also force-pushed all three Mesa branches (nir-divergence-v4,
radv-shader-ballot-v4, and radv-amd-shader-ballot) with trivial rebasing
after pushing the last patch in this series. I've also pushed my
Crucible tests to

git://people.freedesktop.org/~cwabbott0/crucible amd-shader-ballot

although I haven't yet cleaned things up. At least it'll be useful for
making sure this code still works.

>
> I've got some Crucible tests for exercising the various different parts
> of the implementation, although I didn't bother to test all the possible
> combinations of reductions, since they didn't really require any special
> code to implement anyways. I'll try and get that cleaned up and sent out
> soon. Maybe I should just push the tests?
>
> Finally, I'm leaving Valve soon (this week) to go back to school, and I
> suspect that I won't have too much time to work on this afterwards, so
> someone else will probably have to pick it up. I've been working on this
> for most of the summer, since it turned out to be a way more complicated
> beast to implement than I thought.
> It's required changes across the
> entire stack, from spirv-to-nir all the way down to register allocation
> in the LLVM backend. Thankfully, though, most of the tricky LLVM
> changes have landed (thanks Nicolai for reviewing!) and what's left is a
> lot more straightforward. I should still be around to answer questions,
> though. Whew!
>
> [1]
> https://www.khronos.org/registry/OpenGL/extensions/AMD/AMD_shader_ballot.txt
> [2] https://lists.freedesktop.org/archives/mesa-dev/2017-August/164903.html
> [3] https://lists.freedesktop.org/archives/mesa-dev/2017-August/164898.html
>
> Connor Abbott (15):
>   nir: define intrinsics needed for AMD_shader_ballot
>   spirv: import AMD extensions header
>   spirv: add plumbing for SPV_AMD_shader_ballot and Groups
>   nir: rename and generalize nir_lower_read_invocation_to_scalar
>   nir: scalarize AMD_shader_ballot intrinsics
>   radv: call nir_lower_cross_thread_to_scalar()
>   nir: add a lowering pass for some cross-workgroup intrinsics
>   radv: use nir_lower_group_reduce()
>   ac: move ac_to_integer() and ac_to_float() to ac_llvm_build.c
>   ac: remove bitcast_to_float()
>   ac: fix ac_get_type_size() for doubles
>   ac: add support for SPV_AMD_shader_ballot
>   ac/nir: add support for SPV_AMD_shader_ballot
>   radv: enable VK_AMD_shader_ballot
>   ac/nir: fix saturate emission
>
>  src/amd/common/ac_llvm_build.c                     | 783 ++++++++++++++++++++-
>  src/amd/common/ac_llvm_build.h                     | 120 ++++
>  src/amd/common/ac_nir_to_llvm.c                    | 300 ++++----
>  src/amd/vulkan/radv_device.c                       |  15 +
>  src/amd/vulkan/radv_pipeline.c                     |   6 +
>  src/compiler/Makefile.sources                      |   4 +-
>  src/compiler/nir/nir.h                             |  11 +-
>  src/compiler/nir/nir_intrinsics.h                  | 124 +++-
>  ...scalar.c => nir_lower_cross_thread_to_scalar.c} |  63 +-
>  src/compiler/nir/nir_lower_group_reduce.c          | 179 +++++
>  src/compiler/nir/nir_print.c                       |   1 +
>  src/compiler/spirv/GLSL.ext.AMD.h                  |  93 +++
>  src/compiler/spirv/nir_spirv.h                     |   2 +
>  src/compiler/spirv/spirv_to_nir.c                  |  32 +-
>  src/compiler/spirv/vtn_amd.c                       | 281 ++++++++
>  src/compiler/spirv/vtn_private.h                   |   9 +
>  src/intel/compiler/brw_nir.c                       |   2 +-
>  17 files changed, 1846 insertions(+), 179 deletions(-)
>  rename src/compiler/nir/{nir_lower_read_invocation_to_scalar.c => nir_lower_cross_thread_to_scalar.c} (56%)
>  create mode 100644 src/compiler/nir/nir_lower_group_reduce.c
>  create mode 100644 src/compiler/spirv/GLSL.ext.AMD.h
>  create mode 100644 src/compiler/spirv/vtn_amd.c
>
> --
> 2.9.4
>
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev