The brw_nir_opt_peephole_ffma pass only does what the fuse_ffma option already does: it produces the same result, which is not optimal.

This is what I get:
   vec4 32 ssa_7 = fmul ssa_6, ssa_1.yyyy
   vec4 32 ssa_8 = ffma ssa_5, ssa_1.xxxx, ssa_7
   vec4 32 ssa_10 = ffma ssa_9, ssa_1.zzzz, ssa_8
   vec4 32 ssa_12 = fadd ssa_10, ssa_11
But it would be better optimized as (the variant with the fewest rearrangements):
   vec4 32 ssa_7 = ffma ssa_6, ssa_1.yyyy, ssa_11
   vec4 32 ssa_8 = ffma ssa_5, ssa_1.xxxx, ssa_7
   vec4 32 ssa_10 = ffma ssa_9, ssa_1.zzzz, ssa_8

Fusing the fmul and fadd in this case is not obvious. Could this patch be OK if it is behind the fuse_ffma option?
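
To make the effect of the proposed pattern concrete, here is a minimal Python sketch that applies it, together with the existing fmul+fadd fusion, to the expression above. It is not how nir_algebraic is implemented: the tuple-based tree and the helper names rewrite() and opcodes() are made up for illustration, and the sketch only matches the ffma/fmul on the left operand of the fadd.

```python
def rewrite(expr):
    """Apply the two fadd patterns bottom-up (illustrative only)."""
    if not isinstance(expr, tuple):
        return expr  # leaf: an SSA value name
    op, *args = expr
    args = [rewrite(arg) for arg in args]
    if op == "fadd":
        lhs, rhs = args
        if isinstance(lhs, tuple) and lhs[0] == "ffma":
            # fadd(ffma(a, b, c), d) -> ffma(a, b, fadd(c, d))
            _, a, b, c = lhs
            return ("ffma", a, b, rewrite(("fadd", c, rhs)))
        if isinstance(lhs, tuple) and lhs[0] == "fmul":
            # fadd(fmul(a, b), c) -> ffma(a, b, c)
            _, a, b = lhs
            return ("ffma", a, b, rhs)
    return (op, *args)


def opcodes(expr):
    """List opcodes in evaluation order (operands before their user)."""
    if not isinstance(expr, tuple):
        return []
    ops = []
    for arg in expr[1:]:
        ops += opcodes(arg)
    return ops + [expr[0]]


# ssa_12 = fadd(ffma(ssa_9, z, ffma(ssa_5, x, fmul(ssa_6, y))), ssa_11)
before = ("fadd",
          ("ffma", "ssa_9", "ssa_1.zzzz",
           ("ffma", "ssa_5", "ssa_1.xxxx",
            ("fmul", "ssa_6", "ssa_1.yyyy"))),
          "ssa_11")
after = rewrite(before)

print(opcodes(before))  # fmul, ffma, ffma, fadd
print(opcodes(after))   # ffma, ffma, ffma
```

The fadd sinks one level per application of the new pattern until it meets the fmul, where the ordinary fusion rule turns the pair into an ffma.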

On 11/12/2018 02:30 PM, Jason Ekstrand wrote:
In general, you're not supposed to mess around with the precision of fma...
What we do in the Intel drivers is to leave fma split, apply operations,
and then we have a special mul+add fusion pass we run at the end.  Leaving
them split allows for exactly this kind of optimization without mixing up
those FMAs that are supposed to be kept fused and those generated by
mul+add fusion which can be split back apart and re-optimized.
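
As an illustration of why the precision of fma matters, the following Python sketch compares a split mul+add against an emulated fused multiply-add. It uses double precision and exact rational arithmetic as a stand-in for hardware fma; the helper fma_exact() is hypothetical and the input values are chosen only to make the difference visible.

```python
from fractions import Fraction


def fma_exact(a, b, c):
    """Emulate a fused multiply-add: exact a*b + c, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = a * b + c          # product is rounded to 1.0 before the add
fused = fma_exact(a, b, c)   # keeps the -2**-60 term of the product

print(unfused, fused)
```

The two results differ, so a pass that freely splits or re-fuses an ffma changes what the shader computes.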

On Mon, Nov 12, 2018 at 12:17 PM Jonathan Marek <jonat...@marek.ca> wrote:

This works by moving the fadd up across the ffma operations, so that it
can eventually be combined with an fmul. I'm not sure it works in all
cases, but it works in all the common cases.

Example:
     matrix * vec4(coord, 1.0)
is compiled as:
     fmul, ffma, ffma, fadd
and with this patch:
     ffma, ffma, ffma

Signed-off-by: Jonathan Marek <jonat...@marek.ca>
---
  src/compiler/nir/nir_opt_algebraic.py | 1 +
  1 file changed, 1 insertion(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index 8f4df891b8..82e10731a6 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -133,6 +133,7 @@ optimizations = [
     (('~fadd@64', a, ('fmul',         c , ('fadd', b, ('fneg', a)))), ('flrp', a, b, c), '!options->lower_flrp64'),
     (('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'),
     (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'),
+   (('~fadd', ('ffma', a, b, c), d), ('ffma', a, b, ('fadd', c, d))),

     (('fdot4', ('vec4', a, b,   c,   1.0), d), ('fdph',  ('vec3', a, b, c), d)),
     (('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)),
--
2.17.1

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

