On Thu, Jan 21, 2016 at 4:37 PM, Kenneth Graunke <kenn...@whitecape.org> wrote:
> I don't know why, but we never hooked up this pass Eric wrote.
> Otherwise, you can end up with stupid scalarized code such as:
>
>    vec4 ssa_7 = load_const (0.0, 0.0, 0.0, 0.0)
>    vec4 ssa_8 = ...
>    vec1 ssa_9 = feq ssa_8, ssa_7
>    vec1 ssa_10 = feq ssa_8.y, ssa_7.y
>    vec1 ssa_11 = feq ssa_8, ssa_7.z
>    vec1 ssa_12 = feq ssa_8.y, ssa_7.w
>
> ssa_8.xyxy == <0, 0, 0, 0> should only take two feq instructions.
>
> shader-db on Skylake:
>
> total instructions in shared programs: 9111788 -> 9111384 (-0.00%)
> instructions in affected programs: 32421 -> 32017 (-1.25%)
> helped: 277
> HURT: 69

All the hurt programs seem to have an extra instruction because of
interactions with multiply-add fusing. What we have with this patch
might even be better.

>
> total cycles in shared programs: 69221226 -> 69219394 (-0.00%)
> cycles in affected programs: 917796 -> 915964 (-0.20%)
> helped: 317
> HURT: 408

One weird thing here... ETQW/fp-259.shader_test goes from 54 -> 53
instructions (another multiply-add interaction) in both the SIMD8 and
SIMD16 programs, but the cycle estimate goes from 422 -> 432 in SIMD8
and 392 -> 570 in SIMD16.

There are four texture operations, and they're scheduled together in
SIMD8. In SIMD16, for some reason the one that reads surface 2 is
scheduled basically at the end of the program...

Also, how in the world could the SIMD16 cycle estimate be *lower* than
the SIMD8 cycle estimate?

>
> This also prevents regressions when disabling channel expressions.
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/brw_nir.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_nir.c
> b/src/mesa/drivers/dri/i965/brw_nir.c
> index 935529a..ce9b9db 100644
> --- a/src/mesa/drivers/dri/i965/brw_nir.c
> +++ b/src/mesa/drivers/dri/i965/brw_nir.c
> @@ -482,6 +482,11 @@ brw_preprocess_nir(nir_shader *nir, bool is_scalar)
>
>     nir = nir_optimize(nir, is_scalar);
>
> +   if (is_scalar) {
> +      OPT_V(nir_lower_load_const_to_scalar);
> +      OPT(nir_opt_cse);

Did you find the call to nir_opt_cse to be necessary? Removing it, I
only see the cycle estimate for trine-2/fp-2 go from 696 -> 704.

I'd probably leave it out.

Reviewed-by: Matt Turner <matts...@gmail.com>
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
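
For anyone reading the thread later, here is a rough sketch of why scalarizing the constant lets the feq example from the commit message collapse from four comparisons to two. It is a toy model in plain C, not NIR code: `struct feq`, `same_vector_operands`, `same_scalar_operands`, `count_unique_feq` and `example_feqs` are all hypothetical names made up for illustration. It also assumes that, once each component of the load_const is its own scalar value, equal constants get merged, so a constant operand can be identified by its value alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one scalar feq; NOT a real NIR data structure. */
struct feq {
   int src_ssa;        /* SSA index of the value being compared (8 for ssa_8) */
   int src_comp;       /* component read from it: 0 = x, 1 = y, ... */
   int const_ssa;      /* SSA index of the vector load_const (7 for ssa_7) */
   int const_comp;     /* component read from the vector constant */
   float const_value;  /* value stored in that component */
};

/* The four comparisons from the commit message: ssa_8.xyxy vs ssa_7.xyzw. */
const struct feq example_feqs[4] = {
   { 8, 0, 7, 0, 0.0f },  /* ssa_9  = feq ssa_8.x, ssa_7.x */
   { 8, 1, 7, 1, 0.0f },  /* ssa_10 = feq ssa_8.y, ssa_7.y */
   { 8, 0, 7, 2, 0.0f },  /* ssa_11 = feq ssa_8.x, ssa_7.z */
   { 8, 1, 7, 3, 0.0f },  /* ssa_12 = feq ssa_8.y, ssa_7.w */
};

/* Vector constant: operands match only if the same component of the same
 * load_const is read, so ssa_7.x and ssa_7.z look different. */
bool same_vector_operands(const struct feq *a, const struct feq *b)
{
   return a->src_ssa == b->src_ssa && a->src_comp == b->src_comp &&
          a->const_ssa == b->const_ssa && a->const_comp == b->const_comp;
}

/* Scalarized constant (assumed already deduplicated): the constant operand
 * is identified by its value. Comparing floats with == is a simplification. */
bool same_scalar_operands(const struct feq *a, const struct feq *b)
{
   return a->src_ssa == b->src_ssa && a->src_comp == b->src_comp &&
          a->const_value == b->const_value;
}

typedef bool (*feq_equal_fn)(const struct feq *, const struct feq *);

/* Count the instructions that survive a naive CSE: an instruction is
 * dropped when an earlier one has equal operands under `equal`. */
size_t count_unique_feq(const struct feq *insns, size_t n, feq_equal_fn equal)
{
   size_t unique = 0;
   for (size_t i = 0; i < n; i++) {
      bool seen = false;
      for (size_t j = 0; j < i && !seen; j++)
         seen = equal(&insns[i], &insns[j]);
      if (!seen)
         unique++;
   }
   return unique;
}

/* Sanity check for the example above; aborts if a count is unexpected. */
void check_example(void)
{
   assert(count_unique_feq(example_feqs, 4, same_vector_operands) == 4);
   assert(count_unique_feq(example_feqs, 4, same_scalar_operands) == 2);
}
```

Calling `check_example()` from a small `main` exercises both cases: with the vector constant all four feqs look distinct, while with scalarized constants only the `.x` and `.y` comparisons remain.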