On Mon, Feb 22, 2016 at 10:14 PM, Ian Romanick <i...@freedesktop.org> wrote: > On 02/22/2016 04:13 PM, Matt Turner wrote: >> total instructions in shared programs: 7127270 -> 7103195 (-0.34%) >> instructions in affected programs: 1376832 -> 1352757 (-1.75%) >> helped: 7394 >> HURT: 622 >> >> GAINED: 4 >> LOST: 2 >> --- >> src/compiler/nir/nir_opt_algebraic.py | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/src/compiler/nir/nir_opt_algebraic.py >> b/src/compiler/nir/nir_opt_algebraic.py >> index cc2c229..1863033 100644 >> --- a/src/compiler/nir/nir_opt_algebraic.py >> +++ b/src/compiler/nir/nir_opt_algebraic.py >> @@ -227,6 +227,10 @@ optimizations = [ >> (('fabs', ('fsub', 0.0, a)), ('fabs', a)), >> (('iabs', ('isub', 0, a)), ('iabs', a)), >> >> + # Propagate negation up multiplication chains >> + (('fmul', ('fneg', a), b), ('fneg', ('fmul', a, b))), >> + (('imul', ('ineg', a), b), ('ineg', ('fmul', a, b))), >> + > > We had tried things like this before, and it wasn't clearly a win. Is > it a win now because of your early change to try to match a larger > expression? My recollection is that doing this thwarted some attempts > at generating MAD.
I think that was before NIR, and I committed some patches like commit 3d581f99963dea7e93a2f8fd819410da02c1cb7f Author: Matt Turner <matts...@gmail.com> Date: Fri Dec 19 21:35:56 2014 -0800 i965/vec4: Emit MADs from (x + -(y * z)). commit c4fab711ed5bbdb6b8421a1b980215032fc795b8 Author: Matt Turner <matts...@gmail.com> Date: Fri Dec 19 21:30:16 2014 -0800 i965/fs: Emit MADs from (x + -(y * z)). to solve it. I believe NIR recognizes this case already (see brw_nir_opt_peephole_ffma.c in i965/) > Should we also add rules like > > (('ffma', ('fneg', a), b, c), ('fadd', ('fneg', ('fmul', a, b)), c), > 'options->lower_ffma'), We actually split all ffma operations, optimize, and only as a last step recombine multiplies and adds into ffma, so I don't think this can change anything. I tested it, and it gave no changes in shader-db. > Finally, have you looked at the hurt cases? I also seem to recall that > cases where one of the operands of the fmul was an immediate were > usually better off moving the negation from the I spot checked some, and they seem to match what I remember from when I tried this before. For instance: -mul.sat(8) g18<1>F g15<8,8,1>F g33<8,8,1>F -mul.sat(8) g22<1>F -g15<8,8,1>F g33<8,8,1>F -mul.sat(8) g20<1>F g17<8,8,1>F g33<8,8,1>F -mul.sat(8) g24<1>F -g17<8,8,1>F g33<8,8,1>F +mul(8) g18<1>F g15<8,8,1>F g35<8,8,1>F +mul(8) g21<1>F g17<8,8,1>F g35<8,8,1>F +mov.sat(8) g19<1>F g18<8,8,1>F +mov.sat(8) g24<1>F -g18<8,8,1>F +mov.sat(8) g22<1>F g21<8,8,1>F +mov.sat(8) g26<1>F -g21<8,8,1>F You'd rather have the 4x mul.sats, but that means finding a way of preventing CSE from combining them which I think requires more information than contained in the multiply instructions themselves. Solving this would be good though -- Connor encountered a similar problem (or maybe the same) with his NIR vectorizer series. I've pushed the first four patches. I'm not sure if we should hold off on this one or not. Solving the previously mentioned problem will be a lot simpler with UD chains... hopefully soon. Preferences? _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev