On 12/22/14, Ben Widawsky <benjamin.widaw...@intel.com> wrote: > This patch uses the new QWORD type introduced on Gen8. This allows us to > perform > the operation without requiring the additional MACH. > > Similar to Gen7, it seems we must demote SIMD16 to 2 SIMD8s. On the bright > side, > we get the results in 3 instructions, and no MACH. MACH is undesirable > because > it requires the accumulator write flag, which can hinder optimization > passes. > > Signed-off-by: Ben Widawsky <b...@bwidawsk.net> > --- > src/mesa/drivers/dri/i965/brw_defines.h | 2 + > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 12 +++++ > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 73 > ++++++++++++++------------ > src/mesa/drivers/dri/i965/brw_shader.cpp | 2 + > 4 files changed, 56 insertions(+), 33 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > b/src/mesa/drivers/dri/i965/brw_defines.h > index 28e398d..102ba4a 100644 > --- a/src/mesa/drivers/dri/i965/brw_defines.h > +++ b/src/mesa/drivers/dri/i965/brw_defines.h > @@ -1100,6 +1100,8 @@ enum opcode { > * and number of SO primitives needed. > */ > GS_OPCODE_FF_SYNC_SET_PRIMITIVES, > + > + SHADER_OPCODE_MOV64, > }; > > enum brw_urb_write_flags { > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > index c652d65..3a15837 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > @@ -1619,6 +1619,18 @@ fs_generator::generate_code(const cfg_t *cfg, int > dispatch_width) > case BRW_OPCODE_MUL: > brw_MUL(p, dst, src[0], src[1]); > break; > + case SHADER_OPCODE_MOV64: > + /* This opcode is used to mov the result of a native SIMD16 (2 > SIMD8s) > + * mul into the dst register. > + */ > + assert(brw->gen >= 8); > + src[0].subnr = 4; > + src[0].type = dst.type; > + src[0] = stride(src[0], 8, 4, 2); > + assert(dst.type == BRW_REGISTER_TYPE_UD || > + dst.type == BRW_REGISTER_TYPE_D); > + brw_MOV(p, dst, src[0]); > + break; > case BRW_OPCODE_AVG: > brw_AVG(p, dst, src[0], src[1]); > break; > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > index de03618..98f1b0d 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > @@ -798,46 +798,53 @@ fs_visitor::visit(ir_expression *ir) > emit(MUL(this->result, op[0], op[1])); > } > break; > - case ir_binop_imul_high: { > - if (brw->gen >= 8) > - no16("SIMD16 explicit accumulator operands unsupported\n"); > + case ir_binop_imul_high: > + if (brw->gen >= 8) { > + /* Gen8 is able to do the full 32x32 multiply into a QWORD. > + * The docs say you cannot use direct addressing for a destination > of > + * more than 2 registers, which is the case in SIMD16. Therefore, > like > + * Gen7, the operation must be downgraded to two SIMD8 muls. > + * > + * Oddly, the results can be gathered with 1 mov operation, even > though > + * the docs suggest that shouldn't work. > + */
Does the simulator not complain about this? I think it definitely should. The docs are clear that you can't read from >2 consecutive registers. I suspect that again the piglit tests are insufficient. I think all of the piglit tests are calculating the same result in each channel. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev