On Tue, 2017-01-10 at 14:31 -0800, Francisco Jerez wrote: > "Juan A. Suarez Romero" <jasua...@igalia.com> writes: > > > On Mon, 2017-01-09 at 15:41 -0800, Francisco Jerez wrote: > > > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > > > > > > > From: "Juan A. Suarez Romero" <jasua...@igalia.com> > > > > > > > > In IVB/VLV, for instructions dealing with DF, execsize will be > > > > duplicated in the final code. > > > > > > > > So take this in account when checking if instructions should be > > > > split. > > > > --- > > > > src/mesa/drivers/dri/i965/brw_fs.cpp | 13 ++++++++++++- > > > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > > > > b/src/mesa/drivers/dri/i965/brw_fs.cpp > > > > index 78f2124..cfce364 100644 > > > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > > > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > > > > @@ -4551,8 +4551,15 @@ static unsigned > > > > get_fpu_lowered_simd_width(const struct gen_device_info *devinfo, > > > > const fs_inst *inst) > > > > { > > > > + /* Note that in IVB/VLV for instructions that handles DF, we > > > > will duplicate > > > > + * the exec_size. So take this value for calculus purposes. > > > > + */ > > > > + unsigned exec_size = inst->exec_size; > > > > + if (devinfo->gen == 7 && !devinfo->is_haswell && inst- > > > > > exec_data_size() == 8) > > > > > > > > + exec_size *= 2; > > > > + > > > > > > NAK. This won't have any useful effect because the max_width value > > > doesn't actually have any influence in the regioning restriction > > > calculations below, it's only used to accumulate the intermediate > > > execution size limits derived from each hardware restriction. I > > > don't > > > think you need to change any of the restrictions below for the result > > > to > > > be correct in presence of exec size doubling. > > > > > > > > Yes, you're right. > > > > Actually the motivation for the patch was to avoid ending up with a > > final exec_size > 8 (after doubling it in the generator) when we are > > generating a SIMD8 shader (or exec_size > 16 in SIMD16 shader). > > > > So if we have a exec_size == 8, and we are generating SIMD8 code, then > > we want to split in two instructions of exec_size==4, because at the > > end we will double the exec_size to 8. Same reasoning with SIMD16 code. > > > > You're fine with such execution sizes as long asyou don't violate the 2 > GRFs per source or destination region hardware limit (and a few other > crazy restrictions) which are only affected by the *effective* execution > size of the instruction and don't care at all about the units the > execution size is expressed in at the machine code level (e.g. the total > amount of data accessed by a source region of the instruction is the > same regardless of whether you consider it to be a sequence of N > elements of size S or a sequence of 2*N elements of size S/2). >
OK, I get it. Then, let's drop this patch. Thanks for the feedback. > > > > This could be done as: > > > > @@ -5119,7 +5119,11 @@ fs_visitor::lower_simd_width() > > bool progress = false; > > > > foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { > > - const unsigned lower_width = get_lowered_simd_width(devinfo, > > inst); > > + unsigned lower_width = get_lowered_simd_width(devinfo, inst); > > + > > + if (devinfo->gen == 7 && !devinfo->is_haswell && > > + (get_exec_type_size(inst) == 8 || type_sz(inst->dst.type) == > > 8)) > > + lower_width = MIN2(lower_width, dispatch_width / 2); > > > > That wouldn't do any good... Let's just drop this patch. > OK, let's drop the entire patch. > > > > Now, this is all in theory. So far, no test has exposed this > > situation with this change of top of our branch, and the lower_width > > returned by get_lowered_simd_width() is always less than 8. But maybe > > this could happen in future, for instance, if we emit an instruction > > that writes more than 4 DF and has force_writemask_all == TRUE. > > > > > > > > > /* Maximum execution size representable in the instruction > > > > controls. */ > > > > - unsigned max_width = MIN2(32, inst->exec_size); > > > > + unsigned max_width = MIN2(32, exec_size); > > > > > > > > /* According to the PRMs: > > > > * "A. In Direct Addressing mode, a source cannot span more than 2 > > > > @@ -4656,6 +4663,10 @@ get_fpu_lowered_simd_width(const struct > > > > gen_device_info *devinfo, > > > > max_width = MIN2(max_width, channels_per_grf); > > > > } > > > > > > > > + /* If we have duplicated exec_size, then readjust max_width if > > > > required. */ > > > > + if (exec_size != inst->exec_size && max_width == exec_size) > > > > + max_width = inst->exec_size; > > > > + > > > > /* Only power-of-two execution sizes are representable in the > > > > instruction > > > > * control fields. > > > > */ > > > > -- > > > > 2.9.3 > > > > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev