On Wednesday, June 15, 2016 9:25:44 AM PDT Samuel Iglesias Gonsálvez wrote:
> From: Iago Toral Quiroga <ito...@igalia.com>
>
> From the Cherryview PRM, Volume 7, 3D Media GPGPU Engine,
> Register Region Restrictions:
>
>    "When source or destination is 64b (...), regioning in Align1
>     must follow these rules:
>
>     1. Source and destination horizontal stride must be aligned to
>        the same qword.
>     (...)"
>
> Cc: "12.0" <mesa-sta...@lists.freedesktop.org>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index b811953..c271e64 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -715,10 +715,37 @@ fs_visitor::nir_emit_alu(const fs_builder &bld,
> nir_alu_instr *instr)
>     case nir_op_u2f:
>        if (optimize_extract_to_float(instr, result))
>           return;
> +      inst = bld.MOV(result, op[0]);
> +      inst->saturate = instr->dest.saturate;
> +      break;
>
>     case nir_op_f2d:
>     case nir_op_i2d:
>     case nir_op_u2d:
> +      /* CHV PRM, vol07, 3D Media GPGPU Engine, Register Region Restrictions:
> +       *
> +       *    "When source or destination is 64b (...), regioning in Align1
> +       *     must follow these rules:
> +       *
> +       *     1. Source and destination horizontal stride must be aligned to
> +       *        the same qword.
> +       *     (...)"
> +       *
> +       * This means that 32-bit to 64-bit conversions need to have the 32-bit
> +       * data elements aligned to 64-bit. This restriction does not apply to
> +       * BDW and later.
> +       */
> +      if (devinfo->is_cherryview) {
> +         fs_reg tmp = bld.vgrf(result.type, 1);
> +         tmp = subscript(tmp, op[0].type, 0);
> +         inst = bld.MOV(tmp, op[0]);
> +         inst->regs_written =
> +            inst->dst.component_size(bld.dispatch_width()) / REG_SIZE;

As we discussed on #intel-gfx...this line isn't necessary.
fs_inst::init() initializes regs_written to:

   DIV_ROUND_UP(dst.component_size(exec_size), REG_SIZE);

where exec_size is initialized to dispatch_width in this case.  So the
default calculation for component_size() works out to:

   MAX2(channels * 2 [stride], 1) * type_sz(D) = channels * 2 * 4 = 64

while your new one is:

   MAX2(channels * 1 [stride], 1) * type_sz(DF) = channels * 1 * 8 = 64

So they're equivalent.

With that line removed, this patch is:
Reviewed-by: Kenneth Graunke <kenn...@whitecape.org>

> +         inst = bld.MOV(result, tmp);
> +         inst->saturate = instr->dest.saturate;
> +         break;
> +      }
> +      /* fallthrough */
>     case nir_op_d2f:
>     case nir_op_d2i:
>     case nir_op_d2u:
>