On Tue, Nov 06, 2018 at 02:08:32PM +0100, Connor Abbott wrote: > On Tue, Nov 6, 2018 at 1:45 PM Pohjolainen, Topi > <topi.pohjolai...@gmail.com> wrote: > > > > On Tue, Nov 06, 2018 at 11:31:58AM +0100, Connor Abbott wrote: > > > On Tue, Nov 6, 2018 at 11:14 AM Pohjolainen, Topi > > > <topi.pohjolai...@gmail.com> wrote: > > > > > > > > On Tue, Nov 06, 2018 at 10:45:52AM +0100, Connor Abbott wrote: > > > > > As far as I understand, mediump handling can be split into two parts: > > > > > > > > > > 1. Figuring out which operations (instructions or SSA values in NIR) > > > > > can use relaxed precision. > > > > > 2. Deciding which relaxed-precision operations to actually compute in > > > > > 16-bit precision. > > > > > > > > > > At least for GLSL, #1 is pretty well nailed down by the GLSL spec, > > > > > where it's specified in terms of the source expressions. For example, > > > > > something like: > > > > > > > > > > mediump float a = ...; > > > > > mediump float b = ...; > > > > > float c = a + b; > > > > > float d = c + 2.0; > > > > > > > > > > the last addition must be performed in full precision, whereas for: > > > > > > > > > > > > > > > mediump float a = ...; > > > > > mediump float b = ...; > > > > > float d = (a + b) + 2.0; > > > > > > > > > > it can be lowered to 16-bit. This information gets lost during > > > > > expression grafting in GLSL IR, or vars-to-SSA in NIR, and even the > > > > > AST -> GLSL IR transform will sometimes split up expressions, so it > > > > > seems like both are too low-level for this. The analysis described by > > > > > the spec (the paragraph in section 4.7.3 "Precision Qualifiers" of the > > > > > GLSL ES 3.20 spec) has to happen on the AST after type checking but > > > > > before lowering to GLSL IR in order to be correct and not overly > > > > > conservative. If you want to do it in NIR since #2 is easier with SSA, > > > > > then sure... but we can't mix them up and do both at the same time. > > > > > We'll have to add support for annotating ir_expression's and nir_instr > > > > > (or maybe nir_ssa_def's) with a relaxed precision, and filter that > > > > > information down through the pipeline. Hopefully that also works > > > > > better for SPIR-V, where you can annotate individual instructions as > > > > > being RelaxedPrecision, and afaik (hopefully) #1 is handled by > > > > > glslang. > > > > > > > > I tried to describe the logic I used and my interpretation of the spec > > > > in > > > > the accompanying patch: > > > > > > > > https://lists.freedesktop.org/archives/mesa-dev/2018-November/208683.html > > > > > > > > Does it make any sense? > > > > > > It seems incorrect, since it will make the addition in my example > > > operate in 16 bit precision when it shouldn't. As I explained above, > > > it's impossible to do this correctly in NIR. > > > > > > Also, abusing a 16-bit bitsize in NIR to mean mediump is not ok. There > > > are other vulkan/glsl extensions out there that provide actual fp16 > > > support, where the result is guaranteed to be calculated as a > > > half-float, and these obviously won't work properly with this pass. We > > > need to add a flag to the SSA def, or Jason's idea a long time ago was > > > to add a fake "24-bit" bitsize. Part of #2 will involve converting the > > > bitsize to be 16-bit and removing the flag. > > > > I wrote small test shader-runner test: > > > > [require] > > GL ES >= 2.0 > > GLSL ES >= 1.00 > > > > [vertex shader] > > #version 100 > > > > precision highp float; > > > > attribute vec4 piglit_vertex; > > > > void main() > > { > > gl_Position = piglit_vertex; > > } > > > > [fragment shader] > > #version 100 > > precision highp float; > > > > uniform mediump float a; > > uniform mediump float b; > > > > void main() > > { > > float c = a + b; > > float d = c + 0.4; > > > > gl_FragColor = vec4(a, b, c, d); > > } > > > > [test] > > uniform float a 0.1 > > uniform float b 0.2 > > draw rect -1 -1 2 2 > > probe all rgba 0.1 0.2 0.3 0.7 > > > > > > And that made me realize another short-coming in my implementation - I > > lowered > > variable precision after running nir_lower_var_copies() loosing precision > > for > > local temporaries. Moving it before gave me: > > > > NIR (final form) for fragment shader: > > shader: MESA_SHADER_FRAGMENT > > name: GLSL3 > > inputs: 0 > > outputs: 0 > > uniforms: 8 > > shared: 0 > > decl_var uniform INTERP_MODE_NONE float16_t a (0, 0, 0) > > decl_var uniform INTERP_MODE_NONE float16_t b (1, 4, 0) > > decl_var shader_out INTERP_MODE_NONE vec4 gl_FragColor (FRAG_RESULT_COLOR, > > 4, 0) > > decl_function main (0 params) > > > > impl main { > > block block_0: > > /* preds: */ > > vec1 32 ssa_0 = load_const (0x3ecccccd /* 0.400000 */) > > vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */) > > vec1 16 ssa_2 = intrinsic load_uniform (ssa_1) (0, 4) /* base=0 */ > > /* range=4 */ /* a */ > > vec1 16 ssa_3 = intrinsic load_uniform (ssa_1) (4, 4) /* base=4 */ > > /* range=4 */ /* b */ > > vec1 16 ssa_4 = fadd ssa_2, ssa_3 > > vec1 32 ssa_5 = f2f32 ssa_4 > > vec1 32 ssa_6 = f2f32 ssa_2 > > vec1 32 ssa_7 = f2f32 ssa_3 > > vec1 32 ssa_8 = fadd ssa_5, ssa_0 > > vec4 32 ssa_9 = vec4 ssa_6, ssa_7, ssa_5, ssa_8 > > intrinsic store_output (ssa_9, ssa_1) (4, 15, 0) /* base=4 */ /* > > wrmask=xyzw */ /* component=0 */ /* gl_FragColor */ > > /* succs: block_1 */ > > block block_1: > > } > > > > > > which looks to do the right thing. > > Your piglit test uses c in the output, preventing GLSL IR tree > grafting from kicking in. If you remove that, then it'll be broken > again since GLSL IR will rewrite it to a single expression before you > get a chance to see it.
Right, after replacing gl_FragColor = vec4(a, b, c, d); with gl_FragColor = vec4(a, b, b, d); I still got the right end result: NIR (SSA form) for fragment shader: shader: MESA_SHADER_FRAGMENT name: GLSL3 inputs: 0 outputs: 0 uniforms: 8 shared: 0 decl_var uniform INTERP_MODE_NONE float16_t a (0, 0, 0) decl_var uniform INTERP_MODE_NONE float16_t b (1, 4, 0) decl_var shader_out INTERP_MODE_NONE vec4 gl_FragColor (FRAG_RESULT_COLOR, 4, 0) decl_function main (0 params) impl main { block block_0: /* preds: */ vec1 32 ssa_0 = load_const (0x3ecccccd /* 0.400000 */) vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */) vec1 16 ssa_2 = intrinsic load_uniform (ssa_1) (0, 4) /* base=0 */ /* range=4 */ /* a */ vec1 32 ssa_3 = f2f32 ssa_2 vec1 16 ssa_4 = intrinsic load_uniform (ssa_1) (4, 4) /* base=4 */ /* range=4 */ /* b */ vec1 32 ssa_5 = f2f32 ssa_4 vec1 16 ssa_6 = fadd ssa_2, ssa_4 vec1 32 ssa_7 = f2f32 ssa_6 vec1 32 ssa_8 = fadd ssa_7, ssa_0 vec4 32 ssa_9 = vec4 ssa_3, ssa_5, ssa_5, ssa_8 intrinsic store_output (ssa_9, ssa_1) (4, 15, 0) /* base=4 */ /* wrmask=xyzw */ /* component=0 */ /* gl_FragColor */ /* succs: block_1 */ block block_1: } But I think this is just an example where it happens to work. I need to think about this some more... > > The GLSL spec really wants you to think of operations like plus, > times, etc. as having a precision qualifier attached to them, as well > as input/output variables. This is also how SPIR-V works. The > propagation algorithm is just a small syntactic sugar, so the deeper > you push it down the pipeline, the more whack-a-mole games like this > you're going to have to play, and if you have to start disabling > optimizations like tree grafting, or you have to choose the final > bitsize before you have all the information SSA gives you (like moving > it before nir_lower_var_copies() and nir_vars_to_ssa()), then > performance will be worse. > > > > > > > Now, you may still be very correct in general that my approach is flawed. > > I've > > been hoping to work on item 1) in my TODO list (found in patch number 61 in > > the list). There has just been too many parts that I had to address > > (including > > Intel backend) to get performance numbers for benchmarks. > > > > I'm not very good at saying on paper if something works or not - I rather > > try > > it out to understand it better. My idea is to use deref loads and stores as > > recursion starting points. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev