There are a bunch of instructions emitted in ir3_compiler_nir related
to offset computations for IO opcodes (ssbo, image, etc). This small
series explores the possibility of moving these computations to NIR,
where we have a better chance of optimizing them.
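
To make the kind of arithmetic concrete, here is a minimal sketch (not taken from the series) of one such computation: scaling a texel coordinate by the bytes-per-pixel of an image format. The function names are hypothetical, and the sketch assumes the bytes-per-pixel value is a power of two. Under that assumption, a multiply by a value known at compile time is equivalent to a single shift, while a value only known at run time needs a generic multiply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: log2 of a power-of-two value. */
static unsigned log2_pot(uint32_t v)
{
    unsigned shift = 0;
    while ((v >> shift) > 1)
        shift++;
    return shift;
}

/* Bytes-per-pixel (cpp) known up front: the multiply becomes a shift.
 * Assumption: cpp is a non-zero power of two. */
static uint32_t texel_byte_offset_imm(uint32_t x, uint32_t cpp)
{
    assert(cpp != 0 && (cpp & (cpp - 1)) == 0);
    return x << log2_pot(cpp);
}

/* Bytes-per-pixel only available as a run-time value: generic multiply. */
static uint32_t texel_byte_offset_generic(uint32_t x, uint32_t cpp)
{
    return x * cpp;
}
```

Both functions return the same result for power-of-two cpp values; the difference is only in which operation has to be emitted.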

The series introduces a new, freedreno-specific NIR pass,
'ir3_nir_lower_sampler_io' (final name not set). The pass is executed
early in ir3_optimize_nir(), and the goal is to centralize all these
computations there, hoping that later NIR passes will produce better
code than what is currently emitted.

So far, we have just implemented byte-offset computation for image
store and atomics. This seemed like a good first target given the
number of instructions being emitted for it by the backend.

This is an RFC series because there are a few open questions, but we
wanted to gather feedback early, in case this effort turns out not to
be worthwhile; and also hoping that somebody else will give it a try
against other shaders and on other gens (we have just tried this on
a5xx).

* We have so far been unable to see any improvement in generated code
  (nor any regression). shader-db has not been especially useful. Few
  shaders there exercise image store or image atomic ops, and of those
  that do, most require higher versions of GLSL than what freedreno
  supports, so they get skipped. The few that do actually run don't
  show any meaningful difference. Other shaders picked from test
  suites are simple enough not to produce any difference in code
  either. There is still ongoing work looking for cases where the
  pass helps instruction stats, whether by writing custom shaders or
  by porting complex shaders from shader-db to run on GLES 3.1.
  There is, though, an open question here as to whether moving backend
  code to NIR is a benefit in and of itself.

* The series adds a nir_op_imad opcode that didn't exist before, and
  that is perhaps not generally useful even for freedreno outside this
  pass, because it maps to IR3_MAD_S24, which is probably not suitable
  for generic integer multiply-add.

* The pass currently has 2 alternative code paths to emit the
  multiplication by the bytes-per-pixel of an image format. In one
  case, since this value can be obtained at compile time, it is
  emitted as an immediate by nir_imul_imm. The other alternative is
  emitting a nir_imul with an SSA value that will map to image_dims[0]
  at shader runtime. The former case is uglier but produces better
  code (a single SHL instruction), whereas the latter involves a
  generic imul, for which the backend emits a lot of code to cover for
  overflow. The doubt here is whether we should introduce a (lower
  precision) version of imul that maps directly to IR3_IMUL_S.

A live (WIP) tree of the series can be found at:

<https://gitlab.freedesktop.org/elima/mesa/commits/wip/fd-compiler-io>

We plan to continue moving computations to the pass if we see good
opportunities.

Feedback very welcome, cheers,

Eduardo

Eduardo Lima Mitev (4):
  nir: Add a new intrinsic 'load_image_stride'
  nir: Add a new ALU nir_op_imad
  ir3/nir: Add a new pass 'ir3_nir_lower_sampler_io'
  ir3: Use ir3_nir_lower_sampler_io pass

 src/compiler/nir/nir_intrinsics.py           |   2 +
 src/compiler/nir/nir_opcodes.py              |   1 +
 src/freedreno/Makefile.sources               |   1 +
 src/freedreno/ir3/ir3_compiler_nir.c         |  61 ++--
 src/freedreno/ir3/ir3_nir.c                  |   1 +
 src/freedreno/ir3/ir3_nir.h                  |   1 +
 src/freedreno/ir3/ir3_nir_lower_sampler_io.c | 349 +++++++++++++++++++
 7 files changed, 383 insertions(+), 33 deletions(-)
 create mode 100644 src/freedreno/ir3/ir3_nir_lower_sampler_io.c

--
2.20.1

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev