On Tue, 2017-01-10 at 22:07 -0800, Francisco Jerez wrote: > Hi Matt, > > Matt Turner <matts...@gmail.com> writes: > > > On Sun, Jan 8, 2017 at 10:53 PM, Matt Turner <matts...@gmail.com> > > wrote: > > > On 01/05, Samuel Iglesias Gonsálvez wrote: > > > > > > > > From: "Juan A. Suarez Romero" <jasua...@igalia.com> > > > > > > > > When dealing with DF uniforms with just 1 component, we set > > > > stride 0 to > > > > use the value along the operation. However, when duplicating > > > > the > > > > regioning parameters in IVB/VLV, we are violating the regioning > > > > restrictions. > > > > > > > > So instead of using the value with stride 0, we just duplicate > > > > it in a > > > > register, and then use the register instead, avoiding a DF with > > > > stride 0. > > > > --- > > > > src/mesa/drivers/dri/i965/brw_fs.cpp | 63 > > > > ++++++++++++++++++++++++++++++++++++ > > > > src/mesa/drivers/dri/i965/brw_fs.h | 1 + > > > > 2 files changed, 64 insertions(+) > > > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > > > > b/src/mesa/drivers/dri/i965/brw_fs.cpp > > > > index eb3b4aa..78f2124 100644 > > > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > > > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > > > > @@ -2168,6 +2168,62 @@ fs_visitor::lower_constant_loads() > > > > invalidate_live_intervals(); > > > > } > > > > > > > > +/** > > > > + * When dealing with double-precision floats (DF) in IVB/VLV, > > > > we need to > > > > + * duplicate the regioning parameters. This means that for a > > > > DF scalar > > > > + * (regioning <0,1,0>) we will end up using regioning <0,2,0>. > > > > But > > > > according > > > > + * to General Restrictions on Regioning Parameters (Ivy PRM, > > > > Vol. 4 Part > > > > 3, > > > > + * page 69), if VertStride = HorzStride = 0, Width must be 1 > > > > regardless > > > > of the > > > > + * value of ExecSize. So we would be violating the > > > > restriction. To > > > > overcome > > > > + * it, this lowering step duplicates the scalar in a couple of > > > > registers, > > > > + * reading it as two floats to avoid the restriction. > > > > > > > > > Huh, I would have thought that a <0,1,0>DF region would have done > > > what > > > we wanted, without the need to double any of the region > > > parameters. > > > > > > I haven't tested yet, so I'll play with it tomorrow and see if it > > > blows > > > up. > > > > Indeed, it looks like gX<0,2,1>DF works for scalar sources. > > > > The following patch seems to work for me in place of 04/22. > > > > I would expect if we had an instruction like > > > > add(16) gX<1>DF gY<8,8,1>DF gZ<0,2,1>DF > > > > that gX would write 2 registers and gY would read two registers, > > but > > gZ would read only one register -- and that would violate the > > restriction that an instruction that writes two registers must also > > read two registers. > > > > But it seems that we never generate a DF instruction with > > exec_size=16 > > because of > > > > 4554 /* Note that in IVB/VLV for instructions that handles > > DF, > > we will duplicate > > 4555 * the exec_size. So take this value for calculus > > purposes. > > 4556 */ > > 4557 unsigned exec_size = inst->exec_size; > > 4558 if (devinfo->gen == 7 && !devinfo->is_haswell && > > inst->exec_data_size() == 8) > > 4559 exec_size *= 2; > > > > That's not the reason they aren't generated. The hunk above had the > opposite of the effect they intended, and didn't double the execution > size value used by any regioning restriction calculation, it only > raised > the upper bound that the resulting lowered execution size is clamped > to. > That's about half of the reason I NAK'ed the patch. > > The reason they aren't generating any 16-wide double float > instructions > is that they would violate other regioning restrictions implemented > in > get_fpu_lowered_simd_width(), in particular the one that forces > non-force_writemask_all double float instructions to SIMD4 due to the > Gen7 exec mask decompression bug. > > > ... which I think Curro NAK'd elsewhere. > > > > I think what we want is to use 0,2,1 regions for scalar sources, > > and > > split exec_size=16 (i.e., 8 DFs) instructions when they use scalar > > sources in order to avoid violating the restriction. Otherwise, we > > should emit exec_size=16 DF instructions where possible. > > Agreed -- And thanks for testing it out on real hardware. We'll > definitely need a new restriction check in > get_fpu_lowered_simd_width() > that clamps to 4 the execution size of any instruction with a DF > scalar > source on pre-HSW hardware for the patch below to be fully correct. > That way the SIMD lowering pass will get rid of any compressed > instructions with scalar source (whether force_writemask_all or not) > and > we avoid tickling the Gen7 region decompression bug. >
OK, I'll apply Matt's patch and this change too. Thanks Matt and Curro for the patch/suggestions! Sam > > From 67bab74148e110171b8f3a999dc1d79616013108 Mon Sep 17 00:00:00 > > 2001 > > From: Matt Turner <matts...@gmail.com> > > Date: Tue, 10 Jan 2017 19:33:22 -0800 > > Subject: [PATCH] i965: Use <0,2,1> region for scalar DF sources on > > IVB/BYT. > > > > On HSW+, scalar DF sources can be accessed using the normal <0,1,0> > > region, but on IVB and BYT DF regions must be programmed in terms > > of > > floats. A <0,2,1> region accomplishes this. > > --- > > src/mesa/drivers/dri/i965/brw_eu_emit.c | 26 ++++++++++++++++++++- > > ----- > > 1 file changed, 20 insertions(+), 6 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c > > b/src/mesa/drivers/dri/i965/brw_eu_emit.c > > index 2843385..5f81b7a 100644 > > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c > > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c > > @@ -495,9 +495,16 @@ brw_set_src0(struct brw_codegen *p, brw_inst > > *inst, struct brw_reg reg) > > brw_inst_set_src0_width(devinfo, inst, BRW_WIDTH_1); > > brw_inst_set_src0_vstride(devinfo, inst, > > BRW_VERTICAL_STRIDE_0); > > } else { > > - brw_inst_set_src0_hstride(devinfo, inst, reg.hstride); > > - brw_inst_set_src0_width(devinfo, inst, reg.width); > > - brw_inst_set_src0_vstride(devinfo, inst, reg.vstride); > > + if (devinfo->gen == 7 && !devinfo->is_haswell && > > + reg.type == BRW_REGISTER_TYPE_DF && > > has_scalar_region(reg)) { > > + brw_inst_set_src0_vstride(devinfo, inst, > > BRW_VERTICAL_STRIDE_0); > > + brw_inst_set_src0_width(devinfo, inst, > > BRW_WIDTH_2); > > + brw_inst_set_src0_hstride(devinfo, inst, > > BRW_HORIZONTAL_STRIDE_1); > > + } else { > > + brw_inst_set_src0_vstride(devinfo, inst, > > reg.vstride); > > + brw_inst_set_src0_width(devinfo, inst, reg.width); > > + brw_inst_set_src0_hstride(devinfo, inst, > > reg.hstride); > > + } > > } > > } else { > > brw_inst_set_src0_da16_swiz_x(devinfo, inst, > > @@ -577,9 +584,16 @@ brw_set_src1(struct brw_codegen *p, brw_inst > > *inst, struct brw_reg reg) > > brw_inst_set_src1_width(devinfo, inst, BRW_WIDTH_1); > > brw_inst_set_src1_vstride(devinfo, inst, > > BRW_VERTICAL_STRIDE_0); > > } else { > > - brw_inst_set_src1_hstride(devinfo, inst, reg.hstride); > > - brw_inst_set_src1_width(devinfo, inst, reg.width); > > - brw_inst_set_src1_vstride(devinfo, inst, reg.vstride); > > + if (devinfo->gen == 7 && !devinfo->is_haswell && > > + reg.type == BRW_REGISTER_TYPE_DF && > > has_scalar_region(reg)) { > > + brw_inst_set_src1_vstride(devinfo, inst, > > BRW_VERTICAL_STRIDE_0); > > + brw_inst_set_src1_width(devinfo, inst, > > BRW_WIDTH_2); > > + brw_inst_set_src1_hstride(devinfo, inst, > > BRW_HORIZONTAL_STRIDE_1); > > + } else { > > + brw_inst_set_src1_vstride(devinfo, inst, > > reg.vstride); > > + brw_inst_set_src1_width(devinfo, inst, reg.width); > > + brw_inst_set_src1_hstride(devinfo, inst, > > reg.hstride); > > + } > > } > > } else { > > brw_inst_set_src1_da16_swiz_x(devinfo, inst, > > -- > > 2.7.3
signature.asc
Description: This is a digitally signed message part
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev