On 12/22/14, Ben Widawsky <benjamin.widaw...@intel.com> wrote: > From: Matt Turner <matts...@gmail.com> > > In order to do a full 32x32 integer multiply using the MUL/MACH macro, the > operation must be split into 2 SIMD8 operations. This is required even if > you > don't care about the high bits. My interpretation of the requirement is that > the > accumulator simply doesn't have enough bits to perform the 32x32 multiply. > > v2 by Ben: > Added real commit message > Squashed in some changes from Matt's simd16 branch > Change the way sechalf if used. > According to my reading of the docs, the sechalf operation should only > be > used for the mov instruction to obtain low bits. Any other usage is for > the > high results from the MUL/MACH macro
I was going to tell you that I strongly suspected that this was wrong and that we needed better tests to catch this case, but if the tests still pass after this commit, then we *definitely* need better tests because this has to be wrong since you're setting force_sechalf on both MOVs! :) > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86822 > Signed-off-by: Ben Widawsky <b...@bwidawsk.net> > --- > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 28 > +++++++++++++++++++--------- > 1 file changed, 19 insertions(+), 9 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > index 5b83c40..0d8cacb 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > @@ -774,15 +774,25 @@ fs_visitor::visit(ir_expression *ir) > else > emit(MUL(this->result, op[0], op[1])); > } else { > - if (brw->gen >= 7) > - no16("SIMD16 explicit accumulator operands unsupported\n"); > - > - struct brw_reg acc = retype(brw_acc_reg(dispatch_width), > - this->result.type); > - > - emit(MUL(acc, op[0], op[1])); > - emit(MACH(reg_null_d, op[0], op[1])); > - emit(MOV(this->result, fs_reg(acc))); > + unsigned width = brw->gen == 7 ? 8 : dispatch_width; > + fs_reg acc = fs_reg(retype(brw_acc_reg(width), > this->result.type)); > + fs_reg null = fs_reg(retype(brw_null_vec(width), > this->result.type)); > + > + if (brw->gen == 7 && dispatch_width == 16) { > + emit(MUL(acc, half(op[0], 0), half(op[1], 0))); > + emit(MACH(null, half(op[0], 0), half(op[1], 0))); > + fs_inst *mov = emit(MOV(half(this->result, 0), acc)); > + mov->force_sechalf = true; This would calculate the first 8 channels' results but then copy them to this->result using the second half's channel enables. Imagine that the first eight channels are enabled, but the last eight are disabled -- the MOV won't copy any data. > + > + emit(MUL(acc, half(op[0], 1), half(op[1], 1))); > + emit(MACH(null, half(op[0], 1), half(op[1], 1))); > + mov = emit(MOV(half(this->result, 1), acc)); > + mov->force_sechalf = true; And here, we're calculating the last eight channels using the first half's channel enable bits, so when the MOV executes it might be reading data from disabled channels. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev