This series adds some optimizations on variables to try to help
shaders with indirects where we can't just throw the variables away and
use SSA.  The particular motivation of this series is the tessellation
control shaders in Batman: Arkham City as translated by DXVK.  When DXVK
translates a tessellation shader, it's common to see this pattern:

    layout(location=0) in vec3 v0[3];
    layout(location=0) in vec2 v1[3];
    layout(location=0) out vec4 oVertex[3][32];

    vec4 shader_in[3][32];

    void hs_main ()
    {
        oVertex[gl_InvocationID][0].xyz = shader_in[gl_InvocationID][0].xyz;
        oVertex[gl_InvocationID][1].xy = shader_in[gl_InvocationID][1].xy;
        // Do some other stuff
    }

    void main ()
    {
        shader_in[0][0].xyz = v0[0];
        shader_in[1][0].xyz = v0[1];
        shader_in[2][0].xyz = v0[2];
        shader_in[0][1].xy = v1[0];
        shader_in[1][1].xy = v1[1];
        shader_in[2][1].xy = v1[2];
        hs_main();
    }

Having that shader_in temporary array currently stops NIR's
optimization ability dead.  In anv, we end up generating a shader that
first loads all of the inputs into temporary storage and, because the
reads of shader_in are indirect, we generate if-ladders for them.  This
isn't so bad in the above example, but Batman: Arkham City has
tessellation control shaders with 8 inputs of 9 vertices each.  That
many vec4s works out to 4.5 KiB of data, which is 9x the amount of
storage we have per-thread in a SIMD8 shader, so we end up spilling the
whole lot.

This series attempts to solve this problem (and others like it) by
adding four optimizations:

 1. Structure splitting.  This isn't actually needed for this case
    since there are no structures, but it's needed in order for the
    other passes to be more generally applicable.

 2. Array splitting.  This pass looks at something like the shader_in
    array above, determines that the second array index is only used
    directly, and splits it into 32 arrays of vec4[3].  30 of those
    arrays then get deleted because we never use them.

 3. Vector narrowing.  This pass looks at vectors or arrays of vectors
    and tries to determine if some of the channels are unused.  It then
    shrinks the vector and reworks all the load/store operations to
    swizzle things appropriately for the smaller type.  This way it can
    delete components from the middle of a vector.
    In the example above, it takes some of the new vec4[3] arrays
    created by array splitting and shrinks them to vec3[3] or vec2[3].

 4. Array copy detection.  This is a peephole optimization that looks
    for a particular array copy pattern and turns it into a copy_deref
    intrinsic which copies the entire array.  This is useful because
    copy_prop_vars can see through copy_deref intrinsics and turn
    indirect loads from the destination of the copy into indirect
    loads of the source.

The end result of those four optimizations put together is that the
above example now looks something like this (after function inlining and
other optimizations):

    layout(location=0) in vec3 v0[3];
    layout(location=0) in vec2 v1[3];
    layout(location=0) out vec4 oVertex[3][32];

    vec4 shader_in[3][32];

    void main ()
    {
        oVertex[gl_InvocationID][0].xyz = v0[gl_InvocationID].xyz;
        oVertex[gl_InvocationID][1].xy = v1[gl_InvocationID].xy;
        // Do some other stuff
    }

and we can very nicely handle the indirect per-vertex loads in the
back-end without the need for if-ladders.  The end result is that the
tessellation shaders in Batman: Arkham City no longer spill at all and
are actually readable.

Another side-effect of this series is that it potentially allows us to
vastly simplify nir_lower_vars_to_ssa.  Most of the complexity in the
vars_to_ssa pass comes from trying to handle structures, arrays,
potential aliasing, etc.  If we run structure and array splitting prior
to vars_to_ssa, we could make it only consider non-array vector or
scalar variables and get exactly the same effect.  Gone would be the
pile of data structures that we build just to determine if a particular
array dimension is indirected.
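
To make the "is this array dimension indirected" question concrete,
here is a minimal, self-contained sketch in C of the kind of analysis
array splitting relies on.  It is not the code from this series and
does not use the NIR API: struct access, find_splittable_dims() and
count_live_elements() are hypothetical names invented for illustration,
and a real pass would walk deref chains rather than a flat list of
accesses.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DIMS 2      /* shader_in[3][32] has two array dimensions */
#define MAX_LEN  64     /* largest dimension length this sketch handles */
#define INDIRECT (-1)   /* marks an index that is not a constant */

/* One load or store of the array variable (hypothetical record). */
struct access {
   int index[MAX_DIMS]; /* constant index per dimension, or INDIRECT */
};

/* A dimension can be split only if every access indexes it with a
 * constant.  Sets split_ok[d] accordingly for each dimension d.
 */
void
find_splittable_dims(const struct access *acc, int num_acc,
                     bool split_ok[MAX_DIMS])
{
   for (int d = 0; d < MAX_DIMS; d++)
      split_ok[d] = true;

   for (int i = 0; i < num_acc; i++) {
      for (int d = 0; d < MAX_DIMS; d++) {
         if (acc[i].index[d] == INDIRECT)
            split_ok[d] = false;
      }
   }
}

/* For a splittable dimension d of length dim_len, counts how many of
 * the split-out arrays are referenced at all.  The rest can be deleted.
 */
int
count_live_elements(const struct access *acc, int num_acc,
                    int d, int dim_len)
{
   bool used[MAX_LEN] = { false };
   int live = 0;

   assert(dim_len <= MAX_LEN);

   for (int i = 0; i < num_acc; i++) {
      int idx = acc[i].index[d];
      if (idx >= 0 && idx < dim_len && !used[idx]) {
         used[idx] = true;
         live++;
      }
   }
   return live;
}
```

Fed the eight accesses to shader_in from the example (six constant
stores in main() and two loads in hs_main() whose first index is
gl_InvocationID), this reports the first dimension as not splittable
and the second as splittable with 2 live elements out of 32, which
matches the 30 deleted arrays described above.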

This series can be found on my gitlab here:

https://gitlab.freedesktop.org/jekstrand/mesa/commits/wip/nir-var-opts

Cc: Timothy Arceri <tarc...@itsqueeze.com>

Jason Ekstrand (12):
  util/list: Make some helpers take const lists
  nir: Take if uses into account in ssa_def_components_read
  nir/print: Remove a bogus assert
  nir/instr_set: Fix nir_instrs_equal for derefs
  nir/types: Add array_or_matrix helpers
  nir: Add a structure splitting pass
  nir: Add an array splitting pass
  intel/nir: Use the new structure and array splitting passes
  nir: Add a array-of-vector variable narrowing pass
  intel/nir: Use narrow_vec_vars
  nir: Add an array copy optimization
  intel/nir: Enable nir_opt_find_array_copies

 src/compiler/Makefile.sources                |    2 +
 src/compiler/nir/meson.build                 |    2 +
 src/compiler/nir/nir.c                       |    3 +
 src/compiler/nir/nir.h                       |    5 +
 src/compiler/nir/nir_instr_set.c             |    4 +-
 src/compiler/nir/nir_opt_find_array_copies.c |  376 ++++++
 src/compiler/nir/nir_print.c                 |    1 -
 src/compiler/nir/nir_split_vars.c            | 1219 ++++++++++++++++++
 src/compiler/nir_types.cpp                   |   15 +
 src/compiler/nir_types.h                     |    2 +
 src/intel/compiler/brw_nir.c                 |    4 +
 src/util/list.h                              |    8 +-
 12 files changed, 1634 insertions(+), 7 deletions(-)
 create mode 100644 src/compiler/nir/nir_opt_find_array_copies.c
 create mode 100644 src/compiler/nir/nir_split_vars.c

-- 
2.17.1