The general idea is that with 32-bit swizzles we cannot address DF
components Z/W directly, so instead we select the region that starts
at the middle of the SIMD register and use X/Y swizzles.

The above, however, has the caveat that we can't do that without
violating register region restrictions unless we do some sort of
SIMD splitting.

Alternatively, we can accomplish what we need without SIMD splitting
by exploiting the gen7 hardware decompression bug for instructions
with a vstride=0. For example, an instruction like this:

mov(8) r2.x:DF r0.2<0>xyzw:DF

Activates the hardware bug and produces this region:

Component: x0   y0   z0   w0   x1   y1   z1   w1
Register:  r0.2 r0.3 r0.2 r0.3 r1.2 r1.3 r1.2 r1.3

Where r0.2 and r0.3 are r0.z:DF for the first vertex of the SIMD4x2
execution and r1.2 and r1.3 are the same for the second vertex.

Using this to our advantage we can select r0.z:DF by doing
r0.2<0,2,1>.xyxy and r0.w by doing r0.2<0,2,1>.zwzw without needing
to split the instruction.

This patch makes the swizzle translation pass handle Z/W swizzles by
turning them into X/Y respectively and setting subnr to point at the
middle of the register, together with a flag that indicates that we
want to use a vstride=0 with them. Then, when we convert to hardware
registers, we check for this flag and set the vstride accordingly.

Of course, this only works for gen7, but that is the only hardware
platform where we implement align16/fp64 at the moment.

v2: Fix subnr for FIXED_GRF (Samuel)

Signed-off-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com>
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 42 +++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index bfbbd96..ea1e530 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1861,11 +1861,32 @@ vec4_visitor::convert_to_hw_regs()
             unsigned width = REG_SIZE / 2 / MAX2(4, type_size);
             reg = brw_vecn_grf(width, src.nr + src.reg_offset, 0);
             reg.type = src.type;
+            reg.subnr = src.subnr * type_size;
             reg.swizzle = src.swizzle;
             reg.abs = src.abs;
             reg.negate = src.negate;
             if (type_size == 8) {
-               reg.vstride = BRW_VERTICAL_STRIDE_2;
+               if (src.force_vstride0) {
+                  /* We use subnr to select components Z/W of DF operands using
+                   * X/Y swizzles. To do this we also need to set the vertical
+                   * stride to 0 so we don't violate register region
+                   * restrictions.
+                   *
+                   * In gen7, setting the vertical stride to 0 on compressed
+                   * instructions exploits a gen7 hardware
+                   * decompression bug that allows us to select the second half
+                   * of a dvec4 for both vertices in a SIMD4x2 execution.
+                   *
+                   * FIXME: This only works for gen7. If we ever support
+                   * align16/fp64 in other hardware where we can't exploit this
+                   * bug we would also need to do appropriate SIMD splitting of
+                   * these instructions.
+                   */
+                  assert(devinfo->gen == 7);
+                  reg.vstride = BRW_VERTICAL_STRIDE_0;
+               } else {
+                  reg.vstride = BRW_VERTICAL_STRIDE_2;
+               }
             }
             break;
          }
@@ -2171,7 +2192,26 @@ vec4_visitor::expand_64bit_swizzle_to_32bit()
          /* This pass assumes that we have scalarized all DF instructions */
          assert(brw_is_single_value_swizzle(inst->src[arg].swizzle));
 
+         /* To gain access to Z/W components we need to use subnr to select
+          * the second half of the DF register and then use a X/Y swizzle to
+          * select Z/W respectively.
+          */
          unsigned swizzle = BRW_GET_SWZ(inst->src[arg].swizzle, 0);
+         if (swizzle >= 2) {
+            /* Uniforms work in units of a vec4, so to select the second
+             * half of a dvec3/4 uniform, increase reg_offset by one.
+             */
+            if (inst->src[arg].file != UNIFORM) {
+               inst->src[arg].subnr = 2;
+               /* Subnr must be in units of bytes for FIXED_GRF */
+               if (inst->src[arg].file == FIXED_GRF)
+                  inst->src[arg].subnr *= type_sz(inst->src[arg].type);
+               inst->src[arg].force_vstride0 = true;
+            } else {
+               inst->src[arg].reg_offset += 1;
+            }
+            swizzle -= 2;
+         }
          inst->src[arg].swizzle = BRW_SWIZZLE4(swizzle * 2, swizzle * 2 + 1,
                                                swizzle * 2, swizzle * 2 + 1);
          progress = true;
-- 
2.7.4
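
For anyone who wants to check the Z/W translation outside the driver, the
logic in expand_64bit_swizzle_to_32bit() can be modelled roughly as below.
This is only a sketch: SrcModel, RegFile, expand_df_swizzle() and
swizzle_string() are hypothetical stand-ins invented for illustration, not
the real Mesa types, and the model tracks just the fields this patch touches.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the driver's register file enum and source
// register; these are NOT the real Mesa definitions.
enum RegFile { VGRF, FIXED_GRF, UNIFORM };

struct SrcModel {
   RegFile file;
   unsigned type_size;    // bytes per component, 8 for DF
   unsigned subnr;
   unsigned reg_offset;
   bool force_vstride0;
   unsigned swizzle[4];   // 32-bit channel selects: 0=x 1=y 2=z 3=w
};

// Models the Z/W handling from the patch. df_component is the single DF
// component (0=X .. 3=W) read by an already scalarized instruction.
static void expand_df_swizzle(SrcModel &src, unsigned df_component)
{
   assert(df_component < 4);
   unsigned swizzle = df_component;

   if (swizzle >= 2) {
      if (src.file != UNIFORM) {
         // Point at the middle of the register and request vstride=0.
         src.subnr = 2;
         // FIXED_GRF expects subnr in bytes.
         if (src.file == FIXED_GRF)
            src.subnr *= src.type_size;
         src.force_vstride0 = true;
      } else {
         // Uniforms are laid out in vec4 units: step to the next vec4.
         src.reg_offset += 1;
      }
      swizzle -= 2;   // Z becomes X, W becomes Y
   }

   // Each DF component covers two consecutive 32-bit channels.
   src.swizzle[0] = swizzle * 2;
   src.swizzle[1] = swizzle * 2 + 1;
   src.swizzle[2] = swizzle * 2;
   src.swizzle[3] = swizzle * 2 + 1;
}

// Renders the 32-bit swizzle as text, e.g. "xyxy".
static std::string swizzle_string(const SrcModel &src)
{
   static const char names[] = "xyzw";
   std::string out;
   for (unsigned i = 0; i < 4; i++)
      out += names[src.swizzle[i]];
   return out;
}
```

Calling expand_df_swizzle() with component 2 (Z) on a VGRF source gives
subnr 2, the vstride=0 flag and an xyxy swizzle, which matches the
r0.2<0,2,1>.xyxy form from the commit message. The same call on a UNIFORM
source only bumps reg_offset and leaves the flag clear.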