On Tue, Oct 9, 2018 at 3:30 PM Ian Romanick <i...@freedesktop.org> wrote:
> On 10/09/2018 10:17 AM, Jason Ekstrand wrote: > > On Tue, Oct 9, 2018 at 11:32 AM Ian Romanick <i...@freedesktop.org > > <mailto:i...@freedesktop.org>> wrote: > > > > On 10/08/2018 02:14 PM, Jason Ekstrand wrote: > > > On Mon, Oct 8, 2018 at 3:46 PM Ian Romanick <i...@freedesktop.org > > <mailto:i...@freedesktop.org> > > > <mailto:i...@freedesktop.org <mailto:i...@freedesktop.org>>> wrote: > > > > > > On 10/05/2018 09:10 PM, Jason Ekstrand wrote: > > > > --- > > > > src/compiler/nir/nir_constant_expressions.py | 1 + > > > > src/compiler/nir/nir_opcodes.py | 43 > > > ++++++++++++++++++-- > > > > 2 files changed, 40 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/src/compiler/nir/nir_constant_expressions.py > > > b/src/compiler/nir/nir_constant_expressions.py > > > > index 118af9f7818..afc0739e8b2 100644 > > > > --- a/src/compiler/nir/nir_constant_expressions.py > > > > +++ b/src/compiler/nir/nir_constant_expressions.py > > > > @@ -79,6 +79,7 @@ template = """\ > > > > #include <math.h> > > > > #include "util/rounding.h" /* for _mesa_roundeven */ > > > > #include "util/half_float.h" > > > > +#include "util/bigmath.h" > > > > #include "nir_constant_expressions.h" > > > > > > > > /** > > > > diff --git a/src/compiler/nir/nir_opcodes.py > > > b/src/compiler/nir/nir_opcodes.py > > > > index 4ef4ecc6f22..209f0c5509b 100644 > > > > --- a/src/compiler/nir/nir_opcodes.py > > > > +++ b/src/compiler/nir/nir_opcodes.py > > > > @@ -443,12 +443,47 @@ binop("isub", tint, "", "src0 - src1") > > > > binop("fmul", tfloat, commutative + associative, "src0 * > src1") > > > > # low 32-bits of signed/unsigned integer multiply > > > > binop("imul", tint, commutative + associative, "src0 * > src1") > > > > + > > > > # high 32-bits of signed integer multiply > > > > -binop("imul_high", tint32, commutative, > > > > - "(int32_t)(((int64_t) src0 * (int64_t) src1) >> 32)") > > > > +binop("imul_high", tint, commutative, """ > > > > > > This will enable imul_high for all integer types (ditto for > > umul_high > > > below). A later patch adds lowering for 64-bit integer type. > > Will the > > > backend do the right thing for [iu]mul_high of 16- or 8-bit > types? > > > > > > > > > That's a good question. Looks like lower_integer_multiplication > > in the > > > back-end will do nothing whatsoever, and we'll emit an illegal > opcode > > > which will probably hang the GPU. For 8 and 16, it's easy enough > to > > > lower to a couple of conversions, a N*2-bit multiply, and a > > shift. It's > > > also not obvious where the cut-off point for the optimization is. > > > Certainly, it's better in 64-bits than doing the division > algorithm in > > > the shader and I think it's better for 32 but maybe not in 8 and > 16? > > > I'm not sure. I'm pretty sure my 32-bit benchmark gave positive > > results > > > (about 40-50% faster) but it was very noisy. > > > > > > I don't think anything allows 8 and 16-bit arithmetic right now. > > Still, > > > should probably fix it... > > > > Hm... if an extension adds GL or Vulkan support for 16-bit > arithmetic, I > > doubt it would add [iu]mul_high (e.g., GL_AMD_gpu_shader_int16). I'd > > expect every GPU would support a 16x16=32 multiplier. Would it be > > better to restrict this instruction to 32- and 64-bit and implement > the > > integer division optimization differently for 8- and 16-bit sources? > > > > > > We don't currently have a mechanism in NIR to restrict an instruction to > > a particular set of bit-widths. Maybe we should consider adding that? > > Oh. I thought it could already do that. There must be some subtlety of > tint vs. tint32 in this context that I don't understand. Can you > restrict it to one size only vs. any size? > Correct. tint32 means 32-bit integer and tint means any integer. We don't have a concept of 32-or-64-bit-integer. Could be added but meh. We're already making assumptions in various back-ends that we don't receive certain instructions in certain sizes. > > As far as the optimization goes, I see a few different options; > > > > 1) Add a few lines in opt_algebraic to do the lowering. We could even > > make it unconditional for now. > > 2) Add a couple tiny helpers that switch on bit-width in opt_idiv_const > > and do the lowering on-the-fly. > > 3) Just disable the pass for 8 and 16-bit for now. It's questionable > > whether it's actually worth emitting 4-6 instructions for a simple > > division at those bit sizes anyway. > > > > I think my order of preference is 3 > 1 > 2. Especially when you > > consider that we have no good way to test the pass for anything other > > than 32 and 64-bit right now. > > Yeah... I didn't intend to open a can of worms. :) Option 3 sounds fine > to me. > Sounds good to me. I'll adjust the last patch to just not enable it for 8 and 16-bit. Thanks for catching this! --Jason
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev