Re: [Mesa-dev] intel: WIP: Support for using 16-bits for mediump

Connor Abbott Tue, 06 Nov 2018 02:32:47 -0800

On Tue, Nov 6, 2018 at 11:14 AM Pohjolainen, Topi
<topi.pohjolai...@gmail.com> wrote:
>
> On Tue, Nov 06, 2018 at 10:45:52AM +0100, Connor Abbott wrote:
> > As far as I understand, mediump handling can be split into two parts:
> >
> > 1. Figuring out which operations (instructions or SSA values in NIR)
> > can use relaxed precision.
> > 2. Deciding which relaxed-precision operations to actually compute in
> > 16-bit precision.
> >
> > At least for GLSL, #1 is pretty well nailed down by the GLSL spec,
> > where it's specified in terms of the source expressions. For example,
> > in something like:
> >
> > mediump float a = ...;
> > mediump float b = ...;
> > float c = a + b;
> > float d = c + 2.0;
> >
> > the last addition must be performed in full precision, whereas for:
> >
> > mediump float a = ...;
> > mediump float b = ...;
> > float d = (a + b) + 2.0;
> >
> > it can be lowered to 16-bit. This information gets lost during
> > expression grafting in GLSL IR, or vars-to-SSA in NIR, and even the
> > AST -> GLSL IR transform will sometimes split up expressions, so it
> > seems like both are too low-level for this. The analysis described by
> > the spec (the paragraph in section 4.7.3 "Precision Qualifiers" of the
> > GLSL ES 3.20 spec) has to happen on the AST after type checking but
> > before lowering to GLSL IR in order to be correct and not overly
> > conservative. If you want to do it in NIR since #2 is easier with SSA,
> > then sure... but we can't mix them up and do both at the same time.
> > We'll have to add support for annotating ir_expression's and nir_instr
> > (or maybe nir_ssa_def's) with a relaxed precision, and filter that
> > information down through the pipeline. Hopefully that also works
> > better for SPIR-V, where you can annotate individual instructions as
> > being RelaxedPrecision, and afaik (hopefully) #1 is handled by
> > glslang.
>
> I tried to describe the logic I used and my interpretation of the spec in
> the accompanying patch:
>
> https://lists.freedesktop.org/archives/mesa-dev/2018-November/208683.html
>
> Does it make any sense?


It seems incorrect, since it will make the last addition in my first
example (c + 2.0) operate in 16-bit precision when it shouldn't. As I
explained above, it's impossible to do this correctly in NIR.
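
To make the rule concrete, here is a rough sketch of how I read the spec
wording, written against a toy expression tree. None of these names exist
in Mesa; struct expr, expr_precision() and the two example helpers are made
up for illustration. It assumes the default float precision is highp (so c
is highp), and it simplifies the case where no operand has a qualifier.

```c
#include <assert.h>
#include <stddef.h>

/* Precision qualification of a value. PREC_NONE means "no qualifier",
 * as for a literal constant such as 2.0. */
enum precision { PREC_NONE = 0, PREC_MEDIUM = 1, PREC_HIGH = 2 };

enum expr_kind { EXPR_VAR, EXPR_CONST, EXPR_ADD };

/* Hypothetical toy AST node, not a Mesa data structure. */
struct expr {
   enum expr_kind kind;
   enum precision declared;     /* used by EXPR_VAR only */
   const struct expr *src[2];   /* used by EXPR_ADD only */
};

/* An operation is evaluated at the highest precision among its operands;
 * unqualified operands take their precision from the other operand.
 * Simplification: if no operand is qualified we return PREC_NONE instead
 * of looking at the consuming expression as the spec describes. */
enum precision
expr_precision(const struct expr *e)
{
   assert(e != NULL);
   switch (e->kind) {
   case EXPR_VAR:
      return e->declared;
   case EXPR_CONST:
      return PREC_NONE;
   case EXPR_ADD: {
      enum precision a = expr_precision(e->src[0]);
      enum precision b = expr_precision(e->src[1]);
      return a > b ? a : b;
   }
   }
   return PREC_NONE;
}

/* float c = a + b; float d = c + 2.0;
 * Returns the precision of the last addition. The operand is the
 * variable c, so its declared precision (assumed highp) is what counts. */
enum precision
split_example(void)
{
   const struct expr c = { EXPR_VAR, PREC_HIGH, { NULL, NULL } };
   const struct expr two = { EXPR_CONST, PREC_NONE, { NULL, NULL } };
   const struct expr add = { EXPR_ADD, PREC_NONE, { &c, &two } };
   return expr_precision(&add);
}

/* float d = (a + b) + 2.0;
 * Returns the precision of the outer addition. The operand is the
 * mediump intermediate (a + b), not a highp variable. */
enum precision
inline_example(void)
{
   const struct expr a = { EXPR_VAR, PREC_MEDIUM, { NULL, NULL } };
   const struct expr b = { EXPR_VAR, PREC_MEDIUM, { NULL, NULL } };
   const struct expr two = { EXPR_CONST, PREC_NONE, { NULL, NULL } };
   const struct expr inner = { EXPR_ADD, PREC_NONE, { &a, &b } };
   const struct expr outer = { EXPR_ADD, PREC_NONE, { &inner, &two } };
   return expr_precision(&outer);
}
```

The only difference between the two helpers is whether the operand of the
last addition is a declared variable or an intermediate value, which is
exactly the distinction that is gone once the variable has been turned
into an SSA value.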

Also, abusing a 16-bit bitsize in NIR to mean mediump is not OK. There
are other Vulkan/GLSL extensions out there that provide actual fp16
support, where the result is guaranteed to be calculated as a
half-float, and these obviously won't work properly with this pass. We
need to add a flag to the SSA def instead; alternatively, Jason's idea
from a long time ago was to add a fake "24-bit" bitsize. Part of #2 will
involve converting the bitsize to be 16-bit and removing the flag.
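
Very roughly, and only as a sketch of the flag variant, something like the
following. struct toy_ssa_def and toy_lower_relaxed() are invented names,
not the real nir_ssa_def or an existing pass, and hw_prefers_fp16 stands in
for whatever heuristic the backend would use in #2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an SSA def; not the real nir_ssa_def. */
struct toy_ssa_def {
   unsigned bit_size;        /* 16 here always means genuine fp16 */
   bool relaxed_precision;   /* "may be computed at lower precision" */
};

/* Step #2 for a single def: if the def is a relaxed-precision 32-bit
 * value and the backend wants 16 bits, shrink it and drop the flag.
 * Exact 32-bit defs and genuine fp16 defs are left untouched.
 * Returns true if the def was changed. */
bool
toy_lower_relaxed(struct toy_ssa_def *def, bool hw_prefers_fp16)
{
   assert(def != NULL);
   if (!def->relaxed_precision || def->bit_size != 32 || !hw_prefers_fp16)
      return false;

   def->bit_size = 16;
   def->relaxed_precision = false;
   return true;
}
```

The point is that bit_size == 16 keeps a single meaning (a real half-float
that must be computed as such), while the flag only says that lowering is
allowed.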

>
> >
> >
> > On Tue, Nov 6, 2018 at 7:30 AM Topi Pohjolainen
> > <topi.pohjolai...@gmail.com> wrote:
> > >
> > > > Here is version 2 of adding support for 16-bit float instructions in
> > > > the shader compiler. Unlike the first version, which did all the
> > > > analysis at the GLSL level, this one adds the notion of precision to
> > > > NIR variables and does the analysis and precision lowering at the NIR
> > > > level.
> > >
> > > This lives in: gitlab.freedesktop.org:tpohjola/mesa and branch fp16.
> > >
> > > This is now mature enough to be able to use 16-bit precision for all
> > > instructions except a few special cases for gfxbench trex and alu2.
> > > (Unfortunately I'm not seeing any performance benefit. This is not
> > > that surprising as I got to the same point with the glsl-based
> > > solution and was able to measure the performance already back then).
> > > Hence I thought it is time to share it.
> > >
> > > While this is still work-in-progress I didn't want to flood the list
> > > with the full set of patches but instead included the very last where
> > > I try to outline the logic and its current shortcomings. There is also
> > > a short list of TODO items.
> > >
> > > > In addition to those I need to examine a couple of Intel-specific
> > > > misrenderings. I haven't gotten that deep yet but it looks like I'm
> > > > missing something with 16-bit inot and mad/mac lowered interpolation.
> > > > Unfortunately I get corrupted rendering only on hardware, while the
> > > > simulator is happy.
> > >
> > > > Mostly I'm worried about how to test all of this properly. I haven't written
> > > any unit tests but that is high on my list. This is mostly because I've
> > > been uncertain about my design choices. So far I've used shader
> > > runner tests that I've written for specific cases. These are useful for
> > > development purposes but don't bring much value for regression testing.
> > >
> > > Alejandro Piñeiro (1):
> > >   intel/compiler/fs: Use half_precision data_format on 16-bit fb writes
> > >
> > > Jose Maria Casanova Crespo (2):
> > >   intel/compiler/fs: Include support for RT data_format bit
> > >   intel/compiler/disasm: Show half-precision data_format on rt_writes
> > >
> > > Topi Pohjolainen (58):
> > >   intel/compiler/fs: Set 16-bit sampler return format
> > >   intel/compiler/disasm: Show half-precision for sampler messages
> > >   intel/compiler/fs: Skip tex-inst early in conversion lowering
> > >   intel/compiler/fs: Support for dumping 16-bit IMM values
> > >   intel/compiler: Allow 16-bit math
> > >   intel/compiler/fs: Add helpers for 16-bit null regs
> > >   intel/compiler/fs: Use two SIMD8 instructions for 16-bit math
> > >   intel/compiler/fs: Use 16-bit null dest with 16-bit math
> > >   intel/compiler/fs: Use 16-bit null dest with 16-bit compare
> > >   intel/compiler/fs: Add 16-bit type support for nir_if
> > >   intel/compiler/eu: Prepare 3-src-op for 16-bit sources
> > >   intel/compiler/eu: Prepare 3-src-op for 16-bit dst
> > >   intel/compiler/eu: Allow 3-src-op with mixed precision (HF/F) sources
> > >   intel/compiler/disasm: Print mixed precision 3-src types correctly
> > >   intel/compiler/disasm: Print 16-bit IMM values
> > >   intel/compiler/fs: Support for combining 16-bit immediates
> > >   intel/compiler/fs: Set tex type for generator to flag fp16
> > >   intel/compiler/fs: Use component_size() instead of open coded
> > >   intel/compiler/fs: Add register padding support
> > >   intel/compiler/fs: Pad 16-bit texture return payloads
> > >   intel/compiler/fs: Pad 16-bit output (store/fb write) payloads
> > >   intel/compiler/fs: Pad 16-bit nir vec* components into full reg
> > >   intel/compiler/fs: Pad 16-bit nir intrinsic dest into full reg
> > >   intel/compiler/fs: Pad 16-bit const loads into full regs
> > >   intel/compiler/fs: Pad 16-bit load payload lowering
> > >   nir: Lower also 16-bit lrp() if needed
> > >   intel/compiler: Lower 16-bit lrp()
> > >   nir: Recognize f232(f216(x)) as x
> > >   nir: Recognize f216(f232(x)) as x
> > >   nir: Store variable precision when translating from glsl
> > >   glsl: Set default precision for builtin variables
> > >   i965: Prepare uniform mapping for 16-bit values
> > >   i965: Support for uploading 16-bit uniforms from 32-bit store
> > >   intel/compiler/fs: WIP: Use 32-bit slots for 16-bit uniforms
> > >   intel/compiler: Tell compiler if lower precision is supported
> > >   nir: Add lowering pass for variables marked mediump
> > >   nir: Add pass for deref precision lowering
> > >   nir: Add pass for alu precision lowering
> > >   nir: Add precision conversion for load/store_deref
> > >   nir: Add precision conversion for sources of texturing ops
> > >   nir: Don't set destination size 16 for booleans
> > >   nir: Add precision lowering for texture samples
> > >   nir: Add support for non-fixed precision
> > >   nir: Don't try to alter precision of boolean sources
> > >   nir: Add support for variable sized booleans
> > >   nir: Add support for lowering phi precision
> > >   intel/compiler/fs: Prepare alu dest type for 16-bit booleans
> > >   nir: Add lowering pass setting 16-bit boolean destinations
> > >   nir: Add lowering pass turning b2f(i2i32(x)) into b2f(x)
> > >   nir: Adjust integer precision for alus operating with 16-bit srcs
> > >   nir: Replace b2f(x) with b2f(i2i32(x)) for 16-bit x
> > >   nir: Adjust precision for discard_if
> > >   nir: Allow input varyings to be converted to lower precision
> > >   nir: Replace 16-bit src[0] for bcsel i2i32(src[0])
> > >   nir: Replace 16-bit nir_if condition with i2i32(condition)
> > >   Revert "intel/compiler: fix 16-bit comparisons"
> > >   intel/compiler: Hook in precision lowering pass
> > >   nir: Document precision lowering pass
> > >
> > >  src/compiler/Makefile.sources                 |   2 +
> > >  src/compiler/glsl/glsl_symbol_table.cpp       |  20 +
> > >  src/compiler/glsl/glsl_symbol_table.h         |   7 +
> > >  src/compiler/glsl/glsl_to_nir.cpp             |   1 +
> > >  src/compiler/nir/meson.build                  |   2 +
> > >  src/compiler/nir/nir.h                        |  18 +
> > >  src/compiler/nir/nir_lower_bool_size.c        | 120 +++
> > >  src/compiler/nir/nir_lower_precision.cpp      | 820 ++++++++++++++++++
> > >  src/compiler/nir/nir_opt_algebraic.py         |   5 +
> > >  src/intel/blorp/blorp.c                       |   4 +-
> > >  src/intel/compiler/brw_compiler.c             |   1 +
> > >  src/intel/compiler/brw_disasm.c               |  28 +-
> > >  src/intel/compiler/brw_eu.h                   |   3 +-
> > >  src/intel/compiler/brw_eu_emit.c              |  83 +-
> > >  src/intel/compiler/brw_fs.cpp                 |  68 +-
> > >  src/intel/compiler/brw_fs.h                   |   4 +-
> > >  src/intel/compiler/brw_fs_builder.h           |  37 +-
> > >  .../compiler/brw_fs_combine_constants.cpp     |  84 +-
> > >  .../compiler/brw_fs_copy_propagation.cpp      |   7 +-
> > >  src/intel/compiler/brw_fs_generator.cpp       |  13 +-
> > >  .../compiler/brw_fs_lower_conversions.cpp     |  42 +
> > >  src/intel/compiler/brw_fs_nir.cpp             | 197 +++--
> > >  src/intel/compiler/brw_fs_surface_builder.cpp |   3 +-
> > >  src/intel/compiler/brw_fs_visitor.cpp         |   6 +
> > >  src/intel/compiler/brw_inst.h                 |   5 +
> > >  src/intel/compiler/brw_ir_fs.h                |  16 +
> > >  src/intel/compiler/brw_nir.c                  |  22 +-
> > >  src/intel/compiler/brw_nir.h                  |   4 +-
> > >  src/intel/compiler/brw_reg_type.c             |   2 +
> > >  src/intel/compiler/brw_shader.h               |   7 +
> > >  src/intel/vulkan/anv_pipeline.c               |   2 +-
> > >  .../drivers/dri/i965/brw_nir_uniforms.cpp     |   8 +-
> > >  src/mesa/drivers/dri/i965/brw_program.c       |  10 +-
> > >  src/mesa/drivers/dri/i965/brw_program.h       |   6 +-
> > >  src/mesa/drivers/dri/i965/brw_tcs.c           |   2 +-
> > >  .../drivers/dri/i965/gen6_constant_state.c    |  14 +-
> > >  36 files changed, 1548 insertions(+), 125 deletions(-)
> > >  create mode 100644 src/compiler/nir/nir_lower_bool_size.c
> > >  create mode 100644 src/compiler/nir/nir_lower_precision.cpp
> > >
> > > --
> > > 2.17.1
> > >
> > > _______________________________________________
> > > mesa-dev mailing list
> > > mesa-dev@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
