On Fri, May 8, 2015 at 3:36 AM, Kenneth Graunke <kenn...@whitecape.org> wrote:
> According to Glenn, shifts on R600 have 5x the throughput as multiplies.
>
> Intel GPUs have strange integer multiplication restrictions - on most
> hardware, MUL actually only does a 32-bit x 16-bit multiply. This
> means the arguments aren't commutative, which can limit our constant
> propagation options. SHL has no such restrictions.
>
> Shifting is probably reasonable on most people's hardware, so let's just
> do that.
>
> i965 shader-db results (using NIR for VS):
> total instructions in shared programs: 7432587 -> 7388982 (-0.59%)
> instructions in affected programs:     1360411 -> 1316806 (-3.21%)
> helped:                                5772
> HURT:                                  0
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> Cc: matts...@gmail.com
> Cc: ja...@jlekstrand.net
> ---
>  src/glsl/nir/nir_opt_algebraic.py | 5 +++++
>  1 file changed, 5 insertions(+)
>
> So...I found a bizarre issue with this patch.
>
>    (('imul', 4, a), ('ishl', a, 2)),
>
> totally optimizes things. However...
>
>    (('imul', a, 4), ('ishl', a, 2)),
>
> doesn't seem to do anything, even though imul is commutative, and nir_search
> should totally handle that...
>
> WAT?!
>
> If you know why, let me know, otherwise I may have to look into it when more
> awake.

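For anyone following along, here is a toy sketch of the behaviour being expected above: for a commutative opcode, a search pattern should match with its operands in either order. This is not nir_search's actual code. The `COMMUTATIVE` set, the `match`, `substitute`, and `rewrite` helpers, and the tuple encoding of expressions are all hypothetical and exist only for illustration.

```python
# Toy pattern matcher, for illustration only. NOT nir_search's implementation.
# Expressions and patterns are tuples: (opcode, operand, operand).
# In a pattern, a string operand is a variable and an int is a constant.

COMMUTATIVE = {"imul", "iadd"}  # assumed set of commutative opcodes


def match(pattern, expr, env):
    """Return True if pattern matches expr, recording variable bindings in env."""
    if isinstance(pattern, str):
        # Variable: bind it, or check it against an earlier binding.
        if pattern in env:
            return env[pattern] == expr
        env[pattern] = expr
        return True
    if not isinstance(pattern, tuple):
        # Constant: must be equal to the expression.
        return pattern == expr
    if (not isinstance(expr, tuple) or len(expr) != len(pattern)
            or expr[0] != pattern[0]):
        return False

    # Try the operands as written, then swapped if the opcode is commutative.
    orders = [expr[1:]]
    if pattern[0] in COMMUTATIVE and len(expr) == 3:
        orders.append((expr[2], expr[1]))
    for operands in orders:
        trial = dict(env)  # work on a copy so a failed attempt leaves env alone
        if all(match(p, e, trial) for p, e in zip(pattern[1:], operands)):
            env.clear()
            env.update(trial)
            return True
    return False


def substitute(template, env):
    """Build the replacement expression from the bindings in env."""
    if isinstance(template, str):
        return env[template]
    if isinstance(template, tuple):
        return (template[0],) + tuple(substitute(t, env) for t in template[1:])
    return template


def rewrite(expr, search, replace):
    """Apply one (search, replace) rule at the root of expr, if it matches."""
    env = {}
    if match(search, expr, env):
        return substitute(replace, env)
    return expr


rule = (("imul", 4, "a"), ("ishl", "a", 2))
print(rewrite(("imul", 4, "x"), *rule))
print(rewrite(("imul", "x", 4), *rule))
```

With the swapped-operand attempt in place, both calls give the same shift expression, which is what the single `('imul', 4, a)` rule in the patch is expected to do.
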
I figured it out and I have a patch. Unfortunately, it regresses a few programs and loses 8 SIMD8 programs, so I'm doing some more investigation. I'll send it out soon.

> diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py
> index 400d60e..350471f 100644
> --- a/src/glsl/nir/nir_opt_algebraic.py
> +++ b/src/glsl/nir/nir_opt_algebraic.py
> @@ -247,6 +247,11 @@ late_optimizations = [
>     (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))),
>     (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
>     (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))),
> +
> +   # Multiplication by 4 comes up fairly often in indirect offset calculations.
> +   # Some GPUs have weird integer multiplication limitations, but shifts should work
> +   # equally well everywhere.
> +   (('imul', 4, a), ('ishl', a, 2)),
>  ]
>
>  print nir_algebraic.AlgebraicPass("nir_opt_algebraic", optimizations).render()
> --
> 2.4.0
>
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev