On Mon, 2017-01-09 at 15:41 -0800, Francisco Jerez wrote: > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > > > From: "Juan A. Suarez Romero" <jasua...@igalia.com> > > > > In IVB/VLV, for instructions dealing with DF, execsize will be > > duplicated in the final code. > > > > So take this in account when checking if instructions should be > > split. > > --- > > src/mesa/drivers/dri/i965/brw_fs.cpp | 13 ++++++++++++- > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > > b/src/mesa/drivers/dri/i965/brw_fs.cpp > > index 78f2124..cfce364 100644 > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > > @@ -4551,8 +4551,15 @@ static unsigned > > get_fpu_lowered_simd_width(const struct gen_device_info *devinfo, > > const fs_inst *inst) > > { > > + /* Note that in IVB/VLV for instructions that handles DF, we > > will duplicate > > + * the exec_size. So take this value for calculus purposes. > > + */ > > + unsigned exec_size = inst->exec_size; > > + if (devinfo->gen == 7 && !devinfo->is_haswell && inst- > > >exec_data_size() == 8) > > + exec_size *= 2; > > + > > NAK. This won't have any useful effect because the max_width value > doesn't actually have any influence in the regioning restriction > calculations below, it's only used to accumulate the intermediate > execution size limits derived from each hardware restriction. I > don't > think you need to change any of the restrictions below for the result > to > be correct in presence of exec size doubling.
Yes, you're right. Actually the motivation for the patch was to avoid ending up with a final exec_size > 8 (after doubling it in the generator) when we are generating a SIMD8 shader (or exec_size > 16 in SIMD16 shader). So if we have a exec_size == 8, and we are generating SIMD8 code, then we want to split in two instructions of exec_size==4, because at the end we will double the exec_size to 8. Same reasoning with SIMD16 code. This could be done as: @@ -5119,7 +5119,11 @@ fs_visitor::lower_simd_width() bool progress = false; foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { - const unsigned lower_width = get_lowered_simd_width(devinfo, inst); + unsigned lower_width = get_lowered_simd_width(devinfo, inst); + + if (devinfo->gen == 7 && !devinfo->is_haswell && + (get_exec_type_size(inst) == 8 || type_sz(inst->dst.type) == 8)) + lower_width = MIN2(lower_width, dispatch_width / 2); Now, this is all in theory. So far, no test has exposed this situation with this change of top of our branch, and the lower_width returned by get_lowered_simd_width() is always less than 8. But maybe this could happen in future, for instance, if we emit an instruction that writes more than 4 DF and has force_writemask_all == TRUE. > > > /* Maximum execution size representable in the instruction controls. */ > > - unsigned max_width = MIN2(32, inst->exec_size); > > + unsigned max_width = MIN2(32, exec_size); > > > > /* According to the PRMs: > > * "A. In Direct Addressing mode, a source cannot span more than 2 > > @@ -4656,6 +4663,10 @@ get_fpu_lowered_simd_width(const struct > > gen_device_info *devinfo, > > max_width = MIN2(max_width, channels_per_grf); > > } > > > > + /* If we have duplicated exec_size, then readjust max_width if > > required. */ > > + if (exec_size != inst->exec_size && max_width == exec_size) > > + max_width = inst->exec_size; > > + > > /* Only power-of-two execution sizes are representable in the > > instruction > > * control fields. > > */ > > -- > > 2.9.3 > > > > _______________________________________________ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev