On 2/13/19 1:29 PM, Eduardo Lima Mitev wrote:
> ir3 compiler has an integer multiply-add instruction (MAD_S24)
> that is used for different offset calculations in the backend.
> Since we intend to move some of these calculations to NIR, we need
> a new ALU op that can directly represent it.
> ---
>  src/compiler/nir/nir_opcodes.py | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index d32005846a6..abbb3627a33 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -892,3 +892,19 @@ dst.w = src3.x;
>  """)
>
>
> +# Freedreno-specific opcode that maps directly to ir3_MAD_S24.
> +# It is emitted by ir3_nir_lower_io_offsets pass when computing
> +# byte-offsets for image store and atomics.
> +#
> +# The nir_algebraic expression below is: get 23 bits of the
> +# two factors as unsigned and multiply them. If either of the
> +# two was negative, invert sign of the product. Then add it src2.
> +# @FIXME: I suspect there is a simpler expression for this.
> +triop("imad24_ir3", tint, """
> +unsigned f0 = ((unsigned) src0) & 0x7fffff;
> +unsigned f1 = ((unsigned) src1) & 0x7fffff;
> +dst = f0 * f1;

How about (((int)src0 << 8) >> 8) * (((int)src1 << 8) >> 8) + src2?

The trick is making sure the implementation matches what the hardware
does in all cases. My expression will produce different results than
yours for cases like 0xf01fffff * 2: mine gives 0x3ffffe, yours gives
-0x3ffffe. "Correct" depends entirely on what real hardware would
produce. If I had to guess, I would guess that the hardware would
produce 0x3ffffe, since it likely just ignores the upper 8 bits of the
sources.

> +if (src0 * src1 < 0)
> +   dst = -dst;
> +dst += src2;
> +""")
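
For anyone who wants to check the 0xf01fffff * 2 case without hardware, here is a rough Python model of the two candidate expressions. It is only a sketch: the helper names (to_s32, sext24, imad24_patch, imad24_sext) are made up for illustration, and it assumes 32-bit two's-complement wraparound and an arithmetic right shift for the C expressions. It says nothing about what MAD_S24 actually does on real hardware.

```python
MASK32 = 0xFFFFFFFF


def to_s32(x):
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def sext24(x):
    """Model ((int)x << 8) >> 8: keep the low 24 bits and sign-extend bit 23."""
    x &= 0xFFFFFF
    return x - (1 << 24) if x & 0x800000 else x


def imad24_patch(src0, src1, src2):
    """Expression from the patch: mask 23 bits, take the sign from src0 * src1."""
    f0 = src0 & 0x7FFFFF
    f1 = src1 & 0x7FFFFF
    dst = to_s32(f0 * f1)
    # Assumption: the 32-bit signed product wraps on overflow.
    if to_s32(to_s32(src0) * to_s32(src1)) < 0:
        dst = -dst
    return to_s32(dst + to_s32(src2))


def imad24_sext(src0, src1, src2):
    """Suggested expression: sign-extend both sources from 24 bits, then mad."""
    return to_s32(sext24(src0) * sext24(src1) + to_s32(src2))


if __name__ == "__main__":
    a, b, c = 0xF01FFFFF, 2, 0
    print(hex(imad24_patch(a, b, c)))  # expression from the patch
    print(hex(imad24_sext(a, b, c)))   # sign-extension variant
```

Running both over a table of inputs and comparing against values read back from a real device would settle which expression (if either) matches the instruction.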