On Sun, Jan 8, 2017 at 10:53 PM, Matt Turner <matts...@gmail.com> wrote: > On 01/05, Samuel Iglesias Gonsálvez wrote: >> >> From: "Juan A. Suarez Romero" <jasua...@igalia.com> >> >> When dealing with DF uniforms with just 1 component, we set stride 0 to >> use the value along the operation. However, when duplicating the >> regioning parameters in IVB/VLV, we are violating the regioning >> restrictions. >> >> So instead of using the value with stride 0, we just duplicate it in a >> register, and then use the register instead, avoiding a DF with stride 0. >> --- >> src/mesa/drivers/dri/i965/brw_fs.cpp | 63 >> ++++++++++++++++++++++++++++++++++++ >> src/mesa/drivers/dri/i965/brw_fs.h | 1 + >> 2 files changed, 64 insertions(+) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >> b/src/mesa/drivers/dri/i965/brw_fs.cpp >> index eb3b4aa..78f2124 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >> @@ -2168,6 +2168,62 @@ fs_visitor::lower_constant_loads() >> invalidate_live_intervals(); >> } >> >> +/** >> + * When dealing with double-precision floats (DF) in IVB/VLV, we need to >> + * duplicate the regioning parameters. This means that for a DF scalar >> + * (regioning <0,1,0>) we will end up using regioning <0,2,0>. But >> according >> + * to General Restrictions on Regioning Parameters (Ivy PRM, Vol. 4 Part >> 3, >> + * page 69), if VertStride = HorzStride = 0, Width must be 1 regardless >> of the >> + * value of ExecSize. So we would be violating the restriction. To >> overcome >> + * it, this lowering step duplicates the scalar in a couple of registers, >> + * reading it as two floats to avoid the restriction. > > > Huh, I would have thought that a <0,1,0>DF region would have done what > we wanted, without the need to double any of the region parameters. > > I haven't tested yet, so I'll play with it tomorrow and see if it blows > up.
Indeed, it looks like gX<0,2,1>DF works for scalar sources. The following patch seems to work for me in place of 04/22. I would expect if we had an instruction like add(16) gX<1>DF gY<8,8,1>DF gZ<0,2,1>DF that gX would write 2 registers and gY would read two registers, but gZ would read only one register -- and that would violate the restriction that an instruction that writes two registers must also read two registers. But it seems that we never generate a DF instruction with exec_size=16 because of 4554 /* Note that in IVB/VLV for instructions that handles DF, we will duplicate 4555 * the exec_size. So take this value for calculus purposes. 4556 */ 4557 unsigned exec_size = inst->exec_size; 4558 if (devinfo->gen == 7 && !devinfo->is_haswell && inst->exec_data_size() == 8) 4559 exec_size *= 2; ... which I think Curro NAK'd elsewhere. I think what we want is to use 0,2,1 regions for scalar sources, and split exec_size=16 (i.e., 8 DFs) instructions when they use scalar sources in order to avoid violating the restriction. Otherwise, we should emit exec_size=16 DF instructions where possible.
From 67bab74148e110171b8f3a999dc1d79616013108 Mon Sep 17 00:00:00 2001 From: Matt Turner <matts...@gmail.com> Date: Tue, 10 Jan 2017 19:33:22 -0800 Subject: [PATCH] i965: Use <0,2,1> region for scalar DF sources on IVB/BYT. On HSW+, scalar DF sources can be accessed using the normal <0,1,0> region, but on IVB and BYT DF regions must be programmed in terms of floats. A <0,2,1> region accomplishes this. --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 26 ++++++++++++++++++++------ 1 file changed, 20 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 2843385..5f81b7a 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -495,9 +495,16 @@ brw_set_src0(struct brw_codegen *p, brw_inst *inst, struct brw_reg reg) brw_inst_set_src0_width(devinfo, inst, BRW_WIDTH_1); brw_inst_set_src0_vstride(devinfo, inst, BRW_VERTICAL_STRIDE_0); } else { - brw_inst_set_src0_hstride(devinfo, inst, reg.hstride); - brw_inst_set_src0_width(devinfo, inst, reg.width); - brw_inst_set_src0_vstride(devinfo, inst, reg.vstride); + if (devinfo->gen == 7 && !devinfo->is_haswell && + reg.type == BRW_REGISTER_TYPE_DF && has_scalar_region(reg)) { + brw_inst_set_src0_vstride(devinfo, inst, BRW_VERTICAL_STRIDE_0); + brw_inst_set_src0_width(devinfo, inst, BRW_WIDTH_2); + brw_inst_set_src0_hstride(devinfo, inst, BRW_HORIZONTAL_STRIDE_1); + } else { + brw_inst_set_src0_vstride(devinfo, inst, reg.vstride); + brw_inst_set_src0_width(devinfo, inst, reg.width); + brw_inst_set_src0_hstride(devinfo, inst, reg.hstride); + } } } else { brw_inst_set_src0_da16_swiz_x(devinfo, inst, @@ -577,9 +584,16 @@ brw_set_src1(struct brw_codegen *p, brw_inst *inst, struct brw_reg reg) brw_inst_set_src1_width(devinfo, inst, BRW_WIDTH_1); brw_inst_set_src1_vstride(devinfo, inst, BRW_VERTICAL_STRIDE_0); } else { - brw_inst_set_src1_hstride(devinfo, inst, reg.hstride); - brw_inst_set_src1_width(devinfo, inst, reg.width); - brw_inst_set_src1_vstride(devinfo, inst, reg.vstride); + if (devinfo->gen == 7 && !devinfo->is_haswell && + reg.type == BRW_REGISTER_TYPE_DF && has_scalar_region(reg)) { + brw_inst_set_src1_vstride(devinfo, inst, BRW_VERTICAL_STRIDE_0); + brw_inst_set_src1_width(devinfo, inst, BRW_WIDTH_2); + brw_inst_set_src1_hstride(devinfo, inst, BRW_HORIZONTAL_STRIDE_1); + } else { + brw_inst_set_src1_vstride(devinfo, inst, reg.vstride); + brw_inst_set_src1_width(devinfo, inst, reg.width); + brw_inst_set_src1_hstride(devinfo, inst, reg.hstride); + } } } else { brw_inst_set_src1_da16_swiz_x(devinfo, inst, -- 2.7.3
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev