compiler: Optimize sign(x)*y

Ian Romanick Mon, 10 Sep 2018 16:29:39 -0700

This series implements a code-generation optimization for sign(x)*y.  In
GLSL, sign(x) is defined as:


    Returns 1.0 if x > 0, 0.0 if x = 0, or -1.0 if x < 0.

It is silent on the NaN behavior, so I have taken it as "undefined."  I
don't think the new implementation will produce different results from
the old.
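
For readers who want the semantics spelled out, here is a minimal C
sketch of the idea.  It is not the code from the patches.  It assumes
IEEE 754 binary32 floats, and the helper names (f2u, u2f, sign_bits,
sign_mul_fused) are made up for illustration.  The non-fused form keeps
the sign bit of x and conditionally ORs in the bit pattern of 1.0.  The
fused form conditionally XORs in the bits of y instead, which stands in
for the predicated OR or XOR.  NaN inputs are left as "undefined," as
above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-cast helpers (hypothetical names, not from the series). */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Reference: GLSL sign() followed by a separate multiply. */
static float sign_ref(float x)
{
    return x > 0.0f ? 1.0f : (x < 0.0f ? -1.0f : 0.0f);
}

static float sign_mul_ref(float x, float y)
{
    return sign_ref(x) * y;
}

/* Non-fused sketch: keep the sign bit of x, then OR in 1.0 if x != 0. */
static float sign_bits(float x)
{
    uint32_t r = f2u(x) & 0x80000000u;
    if (x != 0.0f)            /* stands in for the predicate */
        r |= 0x3f800000u;     /* bit pattern of 1.0f */
    return u2f(r);
}

/* Fused sketch: keep the sign bit of x, then XOR in y if x != 0.
 * XOR with the sign bit flips the sign of y exactly when x < 0. */
static float sign_mul_fused(float x, float y)
{
    uint32_t r = f2u(x) & 0x80000000u;
    if (x != 0.0f)
        r ^= f2u(y);
    return u2f(r);
}

/* Spot-check the sketches against the reference on finite inputs. */
void sign_mul_self_check(void)
{
    const float xs[] = { -3.5f, -0.0f, 0.0f, 2.0f };
    const float ys[] = { -4.0f, 0.25f, 7.0f };

    for (unsigned i = 0; i < 4; i++) {
        assert(sign_bits(xs[i]) == sign_ref(xs[i]));
        for (unsigned j = 0; j < 3; j++)
            assert(sign_mul_fused(xs[i], ys[j]) == sign_mul_ref(xs[i], ys[j]));
    }
}
```

The fused form needs no multiply at all.  For infinite y with x = 0 the
reference produces NaN while the sketch produces zero, which is another
case this sketch does not try to match.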

The optimization is only applied to the scalar backend.  On Skylake,
~1,000 shaders in the VS, TCS, and TES stages are helped.  It may be
worth applying this to the vector backend for Haswell.  I have a couple
long flights in my near future, so I might work on it then.  We'll see.
This might also be a good newbie project for someone wanting to get into
the i965 compiler backend.

There are actually two versions of this series.  The series that I am
sending to the list includes "i965/fs: Eliminate dead code first".  The
results of that patch are not good.  The other version of the series
omits that patch, but it adds a bunch of horror to "i965/fs: Add a scale
factor to emit_fsign".  Basically, if both the fused and non-fused
versions of the nir_op_fsign are emitted, copy propagation will propagate
part of the common expressions, but, due to the predicated OR or XOR,
one extra MOV will be left around.  That single instruction ruins the
whole optimization.

Both versions are available in my cgit.  List version:

    https://cgit.freedesktop.org/~idr/mesa/log/?h=fsign-optimization

That branch includes a few things that I tried, but they did not pan
out.

Alternate version:

    https://cgit.freedesktop.org/~idr/mesa/log/?h=fsign-optimization-emit-no-dead-code

I think the version sent to the list is cleaner, but its shader-db
results are not as good.  The difference between the list version and
the other version on Skylake is shown below.  Other platforms had
similarly shaped results.

total instructions in shared programs: 15090997 -> 15091028 (<.01%)
instructions in affected programs: 10251 -> 10282 (0.30%)
helped: 0
HURT: 26
HURT stats (abs)   min: 1 max: 4 x̄: 1.19 x̃: 1
HURT stats (rel)   min: 0.14% max: 1.96% x̄: 0.49% x̃: 0.24%
95% mean confidence interval for instructions value: 0.94 1.45
95% mean confidence interval for instructions %-change: 0.28% 0.71%
Instructions are HURT.

total cycles in shared programs: 565827580 -> 565824007 (<.01%)
cycles in affected programs: 1995745 -> 1992172 (-0.18%)
helped: 271
HURT: 248
helped stats (abs) min: 1 max: 623 x̄: 25.79 x̃: 5
helped stats (rel) min: 0.02% max: 13.19% x̄: 0.94% x̃: 0.28%
HURT stats (abs)   min: 1 max: 204 x̄: 13.78 x̃: 4
HURT stats (rel)   min: 0.01% max: 6.57% x̄: 0.52% x̃: 0.21%
95% mean confidence interval for cycles value: -11.25 -2.52
95% mean confidence interval for cycles %-change: -0.38% -0.11%
Cycles are helped.

The version sent to the list saves a couple instructions in 26 shaders,
but cycles are hurt.  The list version also avoids ~65 lines of ugly
code.

I also sent a couple tests to the piglit list that exercise a bug that I
had during development.

    https://patchwork.freedesktop.org/patch/247911/

In the alternate version, for an expression like sign(a)*sign(b),
sign(b) would never get emitted.  When the fused sign(a)*x was emitted,
it would explode.  The solution was to just bail on the optimization
when sign(a)*sign(b) is encountered.  I suspect that's the source of the
32 shaders with instructions hurt in the alternate version, but I have
not verified that.


[Mesa-dev] [PATCH 00/11] intel/compiler: Optimize sign(x)*y
