On Sat, 2019-02-16 at 09:02 -0600, Jason Ekstrand wrote: > On Tue, Feb 12, 2019 at 5:56 AM Iago Toral Quiroga <ito...@igalia.com > > wrote: > > Empirical testing shows that gen8 has a bug where MAD instructions > > with > > > > a half-float source starting at a non-zero offset fail to execute > > > > properly. > > > > > > > > This scenario usually happened in SIMD8 executions, where we used > > to > > > > pack vector components Y and W in the second half of SIMD registers > > > > (therefore, with a 16B offset). It looks like we are not currently > > doing > > > > this any more but this would handle the situation properly if we > > ever > > > > happen to produce code like this again. > > > > > > > > v2 (Jason): > > > > - Move this workaround to the lower_regioning pass as an > > additional case > > > > to has_invalid_src_region() > > > > - Do not apply the workaround if the stride of the source operand > > is 0, > > > > testing suggests the problem doesn't exist in that case. > > > > > > > > Reviewed-by: Topi Pohjolainen <topi.pohjolai...@intel.com> (v1) > > > > --- > > > > src/intel/compiler/brw_fs_lower_regioning.cpp | 39 +++++++++++++ > > ------ > > > > 1 file changed, 28 insertions(+), 11 deletions(-) > > > > > > > > diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp > > b/src/intel/compiler/brw_fs_lower_regioning.cpp > > > > index df50993dee6..7c70cfab535 100644 > > > > --- a/src/intel/compiler/brw_fs_lower_regioning.cpp > > > > +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp > > > > @@ -109,20 +109,37 @@ namespace { > > > > has_invalid_src_region(const gen_device_info *devinfo, const > > fs_inst *inst, > > > > unsigned i) > > > > { > > > > - if (is_unordered(inst)) { > > > > + if (is_unordered(inst)) > > > > return false; > > > > - } else { > > > > - const unsigned dst_byte_stride = inst->dst.stride * > > type_sz(inst->dst.type); > > > > - const unsigned src_byte_stride = inst->src[i].stride * > > > > - type_sz(inst->src[i].type); > > > > - const unsigned dst_byte_offset = reg_offset(inst->dst) % > > REG_SIZE; > > > > - const unsigned src_byte_offset = reg_offset(inst->src[i]) > > % REG_SIZE; > > > > > > > > - return has_dst_aligned_region_restriction(devinfo, inst) > > && > > > > - !is_uniform(inst->src[i]) && > > > > - (src_byte_stride != dst_byte_stride || > > > > - src_byte_offset != dst_byte_offset); > > > > + /* Empirical testing shows that Broadwell has a bug > > affecting half-float > > > > + * MAD instructions when any of its sources has a non-zero > > offset, such > > > > + * as: > > > > + * > > > > + * mad(8) g18<1>HF -g17<4,4,1>HF g14.8<4,4,1>HF g11<4,4,1>HF > > { align16 1Q }; > > > > + * > > > > + * We used to generate code like this for SIMD8 executions > > where we > > > > + * used to pack components Y and W of a vector at offset 16B > > of a SIMD > > > > + * register. The problem doesn't occur if the stride of the > > source is 0. > > > > + */ > > > > + if (devinfo->gen == 8 && > > > > + inst->opcode == BRW_OPCODE_MAD && > > > > + inst->src[i].type == BRW_REGISTER_TYPE_HF && > > > > + inst->src[i].offset > 0 && > > > > + inst->src[i].stride != 0) { > > The above assumes the register is a GRF. Perhaps we should make this > assumption explicit? Or you can use some of curro's helpers and add > another one to get the subreg offset. Also, the real problem here > isn't offset > 0, it's offset % REG_SIZE > 0. If we have an array of > 4 things, they'll be at offsets 0, 16, 32, and 48. We don't want an > offset of 32 triggering it. >
You're right. We already have a helper available that does what we want, reg_offset() in brw_ir_fs.h. I have this now: if (devinfo->gen == 8 && inst->opcode == BRW_OPCODE_MAD && inst->src[i].type == BRW_REGISTER_TYPE_HF && reg_offset(inst->src[i]) % REG_SIZE > 0 && inst- >src[i].stride != 0) { return true; } > > + return true; > > > > } > > > > + > > > > + const unsigned dst_byte_stride = inst->dst.stride * > > type_sz(inst->dst.type); > > > > + const unsigned src_byte_stride = inst->src[i].stride * > > > > + type_sz(inst->src[i].type); > > > > + const unsigned dst_byte_offset = reg_offset(inst->dst) % > > REG_SIZE; > > > > + const unsigned src_byte_offset = reg_offset(inst->src[i]) % > > REG_SIZE; > > > > + > > > > + return has_dst_aligned_region_restriction(devinfo, inst) && > > > > + !is_uniform(inst->src[i]) && > > > > + (src_byte_stride != dst_byte_stride || > > > > + src_byte_offset != dst_byte_offset); > > > > } > > > > > > > > /* > >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev