On Tue, Nov 06, 2018 at 10:45:52AM +0100, Connor Abbott wrote:
> As far as I understand, mediump handling can be split into two parts:
>
> 1. Figuring out which operations (instructions or SSA values in NIR)
> can use relaxed precision.
> 2. Deciding which relaxed-precision operations to actually compute in
> 16-bit precision.
>
> At least for GLSL, #1 is pretty well nailed down by the GLSL spec,
> where it's specified in terms of the source expressions. For example,
> something like:
>
> mediump float a = ...;
> mediump float b = ...;
> float c = a + b;
> float d = c + 2.0;
>
> the last addition must be performed in full precision, whereas for:
>
>
> mediump float a = ...;
> mediump float b = ...;
> float d = (a + b) + 2.0;
>
> it can be lowered to 16-bit. This information gets lost during
> expression grafting in GLSL IR, or vars-to-SSA in NIR, and even the
> AST -> GLSL IR transform will sometimes split up expressions, so it
> seems like both are too low-level for this. The analysis described by
> the spec (the paragraph in section 4.7.3 "Precision Qualifiers" of the
> GLSL ES 3.20 spec) has to happen on the AST after type checking but
> before lowering to GLSL IR in order to be correct and not overly
> conservative. If you want to do it in NIR since #2 is easier with SSA,
> then sure... but we can't mix them up and do both at the same time.
> We'll have to add support for annotating ir_expression's and nir_instr
> (or maybe nir_ssa_def's) with a relaxed precision, and filter that
> information down through the pipeline. Hopefully that also works
> better for SPIR-V, where you can annotate individual instructions as
> being RelaxedPrecision, and afaik (hopefully) #1 is handled by
> glslang.
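
For reference, part #1 as described in the quoted text can be sketched in a
few lines of Python. This is an illustration only, not code from the fp16
branch: the classes Var, Const and Add and the functions own_precision and
resolve are hypothetical names, only addition is modeled, and lowp is
ignored. It assumes the reading of the spec where an operation takes the
highest precision among its qualified operands, and an expression with no
qualified operands falls back to the precision of whatever consumes it.

```python
from dataclasses import dataclass
from typing import Optional, Union

MEDIUMP, HIGHP = "mediump", "highp"
_RANK = {MEDIUMP: 1, HIGHP: 2}  # higher rank wins when operands differ


@dataclass
class Var:
    name: str
    precision: str  # declared (or default) precision of the variable


@dataclass
class Const:
    value: float  # literals carry no precision of their own


@dataclass
class Add:
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Var, Const, Add]


def own_precision(e: Expr) -> Optional[str]:
    """Precision derived from the operands alone, or None if none has one."""
    if isinstance(e, Var):
        return e.precision
    if isinstance(e, Const):
        return None
    found = [p for p in (own_precision(e.lhs), own_precision(e.rhs)) if p]
    return max(found, key=_RANK.get) if found else None


def resolve(e: Expr, consumer: str) -> str:
    """Final precision: operand-derived if available, else the consumer's."""
    return own_precision(e) or consumer


a, b = Var("a", MEDIUMP), Var("b", MEDIUMP)
c = Var("c", HIGHP)  # 'float c = a + b;' stores into a full-precision variable

print(resolve(Add(c, Const(2.0)), HIGHP))          # d = c + 2.0
print(resolve(Add(Add(a, b), Const(2.0)), HIGHP))  # d = (a + b) + 2.0
```

The first expression resolves to highp because it reads the variable c, and
the second to mediump because every qualified operand in the tree is mediump.
Whether a mediump result is then actually computed in 16 bits is part #2 and
is not modeled here.
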
I tried to describe the logic I used and my interpretation of the spec in
the accompanying patch:

https://lists.freedesktop.org/archives/mesa-dev/2018-November/208683.html

Does it make any sense?

>
>
> On Tue, Nov 6, 2018 at 7:30 AM Topi Pohjolainen
> <topi.pohjolai...@gmail.com> wrote:
> >
> > Here is version 2 of adding support for 16-bit float instructions in
> > the shader compiler. Unlike the first version, which did all the
> > analysis at the GLSL level, this one adds the notion of precision to
> > NIR variables and does the analysis and precision lowering at the NIR
> > level.
> >
> > This lives in: gitlab.freedesktop.org:tpohjola/mesa and branch fp16.
> >
> > This is now mature enough to be able to use 16-bit precision for all
> > instructions except a few special cases for gfxbench trex and alu2.
> > (Unfortunately I'm not seeing any performance benefit. This is not
> > that surprising as I got to the same point with the glsl-based
> > solution and was able to measure the performance already back then).
> > Hence I thought it is time to share it.
> >
> > While this is still work-in-progress, I didn't want to flood the list
> > with the full set of patches but instead included only the very last
> > one, where I try to outline the logic and its current shortcomings.
> > There is also a short list of TODO items.
> >
> > In addition to those I need to examine a couple of Intel-specific
> > misrenderings. I haven't gotten that deep yet but it looks like I'm
> > missing something with 16-bit inot and mad/mac lowered interpolation.
> > Unfortunately I get corrupted rendering only with hardware while
> > the simulator is happy.
> >
> > Mostly I'm worried about how to test all of this properly. I haven't
> > written any unit tests but that is high on my list. This is mostly
> > because I've been uncertain about my design choices. So far I've used
> > shader runner tests that I've written for specific cases. These are
> > useful for development purposes but don't bring much value for
> > regression testing.
> >
> > Alejandro Piñeiro (1):
> >   intel/compiler/fs: Use half_precision data_format on 16-bit fb writes
> >
> > Jose Maria Casanova Crespo (2):
> >   intel/compiler/fs: Include support for RT data_format bit
> >   intel/compiler/disasm: Show half-precision data_format on rt_writes
> >
> > Topi Pohjolainen (58):
> >   intel/compiler/fs: Set 16-bit sampler return format
> >   intel/compiler/disasm: Show half-precision for sampler messages
> >   intel/compiler/fs: Skip tex-inst early in conversion lowering
> >   intel/compiler/fs: Support for dumping 16-bit IMM values
> >   intel/compiler: Allow 16-bit math
> >   intel/compiler/fs: Add helpers for 16-bit null regs
> >   intel/compiler/fs: Use two SIMD8 instructions for 16-bit math
> >   intel/compiler/fs: Use 16-bit null dest with 16-bit math
> >   intel/compiler/fs: Use 16-bit null dest with 16-bit compare
> >   intel/compiler/fs: Add 16-bit type support for nir_if
> >   intel/compiler/eu: Prepare 3-src-op for 16-bit sources
> >   intel/compiler/eu: Prepare 3-src-op for 16-bit dst
> >   intel/compiler/eu: Allow 3-src-op with mixed precision (HF/F) sources
> >   intel/compiler/disasm: Print mixed precision 3-src types correctly
> >   intel/compiler/disasm: Print 16-bit IMM values
> >   intel/compiler/fs: Support for combining 16-bit immediates
> >   intel/compiler/fs: Set tex type for generator to flag fp16
> >   intel/compiler/fs: Use component_size() instead of open coded
> >   intel/compiler/fs: Add register padding support
> >   intel/compiler/fs: Pad 16-bit texture return payloads
> >   intel/compiler/fs: Pad 16-bit output (store/fb write) payloads
> >   intel/compiler/fs: Pad 16-bit nir vec* components into full reg
> >   intel/compiler/fs: Pad 16-bit nir intrinsic dest into full reg
> >   intel/compiler/fs: Pad 16-bit const loads into full regs
> >   intel/compiler/fs: Pad 16-bit load payload lowering
> >   nir: Lower also 16-bit lrp() if needed
> >   intel/compiler: Lower 16-bit lrp()
> >   nir: Recognize f232(f216(x)) as x
> >   nir: Recognize f216(f232(x)) as x
> >   nir: Store variable precision when translating from glsl
> >   glsl: Set default precision for builtin variables
> >   i965: Prepare uniform mapping for 16-bit values
> >   i965: Support for uploading 16-bit uniforms from 32-bit store
> >   intel/compiler/fs: WIP: Use 32-bit slots for 16-bit uniforms
> >   intel/compiler: Tell compiler if lower precision is supported
> >   nir: Add lowering pass for variables marked mediump
> >   nir: Add pass for deref precision lowering
> >   nir: Add pass for alu precision lowering
> >   nir: Add precision conversion for load/store_deref
> >   nir: Add precision conversion for sources of texturing ops
> >   nir: Don't set destination size 16 for booleans
> >   nir: Add precision lowering for texture samples
> >   nir: Add support for non-fixed precision
> >   nir: Don't try to alter precision of boolean sources
> >   nir: Add support for variable sized booleans
> >   nir: Add support for lowering phi precision
> >   intel/compiler/fs: Prepare alu dest type for 16-bit booleans
> >   nir: Add lowering pass setting 16-bit boolean destinations
> >   nir: Add lowering pass turning b2f(i2i32(x)) into b2f(x)
> >   nir: Adjust integer precision for alus operating with 16-bit srcs
> >   nir: Replace b2f(x) with b2f(i2i32(x)) for 16-bit x
> >   nir: Adjust precision for discard_if
> >   nir: Allow input varyings to be converted to lower precision
> >   nir: Replace 16-bit src[0] for bcsel i2i32(src[0])
> >   nir: Replace 16-bit nir_if condition with i2i32(condition)
> >   Revert "intel/compiler: fix 16-bit comparisons"
> >   intel/compiler: Hook in precision lowering pass
> >   nir: Document precision lowering pass
> >
> >  src/compiler/Makefile.sources | 2 +
> >  src/compiler/glsl/glsl_symbol_table.cpp | 20 +
> >  src/compiler/glsl/glsl_symbol_table.h | 7 +
> >  src/compiler/glsl/glsl_to_nir.cpp | 1 +
> >  src/compiler/nir/meson.build | 2 +
> >  src/compiler/nir/nir.h | 18 +
> >  src/compiler/nir/nir_lower_bool_size.c | 120 +++
> >  src/compiler/nir/nir_lower_precision.cpp | 820 ++++++++++++++++++
> >  src/compiler/nir/nir_opt_algebraic.py | 5 +
> >  src/intel/blorp/blorp.c | 4 +-
> >  src/intel/compiler/brw_compiler.c | 1 +
> >  src/intel/compiler/brw_disasm.c | 28 +-
> >  src/intel/compiler/brw_eu.h | 3 +-
> >  src/intel/compiler/brw_eu_emit.c | 83 +-
> >  src/intel/compiler/brw_fs.cpp | 68 +-
> >  src/intel/compiler/brw_fs.h | 4 +-
> >  src/intel/compiler/brw_fs_builder.h | 37 +-
> >  .../compiler/brw_fs_combine_constants.cpp | 84 +-
> >  .../compiler/brw_fs_copy_propagation.cpp | 7 +-
> >  src/intel/compiler/brw_fs_generator.cpp | 13 +-
> >  .../compiler/brw_fs_lower_conversions.cpp | 42 +
> >  src/intel/compiler/brw_fs_nir.cpp | 197 +++--
> >  src/intel/compiler/brw_fs_surface_builder.cpp | 3 +-
> >  src/intel/compiler/brw_fs_visitor.cpp | 6 +
> >  src/intel/compiler/brw_inst.h | 5 +
> >  src/intel/compiler/brw_ir_fs.h | 16 +
> >  src/intel/compiler/brw_nir.c | 22 +-
> >  src/intel/compiler/brw_nir.h | 4 +-
> >  src/intel/compiler/brw_reg_type.c | 2 +
> >  src/intel/compiler/brw_shader.h | 7 +
> >  src/intel/vulkan/anv_pipeline.c | 2 +-
> >  .../drivers/dri/i965/brw_nir_uniforms.cpp | 8 +-
> >  src/mesa/drivers/dri/i965/brw_program.c | 10 +-
> >  src/mesa/drivers/dri/i965/brw_program.h | 6 +-
> >  src/mesa/drivers/dri/i965/brw_tcs.c | 2 +-
> >  .../drivers/dri/i965/gen6_constant_state.c | 14 +-
> >  36 files changed, 1548 insertions(+), 125 deletions(-)
> >  create mode 100644 src/compiler/nir/nir_lower_bool_size.c
> >  create mode 100644 src/compiler/nir/nir_lower_precision.cpp
> >
> > --
> > 2.17.1
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev