On Mon, Aug 17, 2015 at 1:27 AM, Connor Abbott <cwabbo...@gmail.com> wrote: > On Mon, Aug 17, 2015 at 12:09 AM, Pohjolainen, Topi > <topi.pohjolai...@intel.com> wrote: >> On Fri, Aug 14, 2015 at 03:30:18PM -0700, Connor Abbott wrote: >>> In some cases, we need to emit ALU instructions with a certain stride >>> due to a HW limitation. When splitting that instruction, we need to >>> respect the original stride when creating the temporaries we load from >>> and store into. Otherwise, we'll reintroduce the problem we were trying >>> to work around. >>> >>> Signed-off-by: Connor Abbott <connor.w.abb...@intel.com> >>> --- >>> src/mesa/drivers/dri/i965/brw_fs.cpp | 14 ++++++++++---- >>> 1 file changed, 10 insertions(+), 4 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >>> b/src/mesa/drivers/dri/i965/brw_fs.cpp >>> index 812648f..386e9a2 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >>> @@ -2370,12 +2370,14 @@ fs_visitor::opt_register_renaming() >>> >>> if (depth == 0 && >>> inst->dst.file == GRF && >>> - alloc.sizes[inst->dst.reg] == inst->exec_size / 8 && >>> + alloc.sizes[inst->dst.reg] == >>> + inst->dst.component_size(inst->exec_size) && >>> !inst->is_partial_write()) { >>> if (remap[dst] == -1) { >>> remap[dst] = dst; >>> } else { >>> - remap[dst] = alloc.allocate(inst->exec_size / 8); >>> + remap[dst] = >>> + alloc.allocate(inst->dst.component_size(inst->exec_size)); >>> inst->dst.reg = remap[dst]; >>> progress = true; >>> } >>> @@ -4334,6 +4336,8 @@ fs_visitor::lower_simd_width() >>> * temporary passed as source to the lowered instruction. >>> */ >>> split_inst.src[j] = lbld.vgrf(inst->src[j].type, >>> src_size); >>> + split_inst.src[j].subreg_offset = >>> inst->src[j].subreg_offset; >> >> The fixes for the wider component size (64-bits) and the stride are clearly >> needed for doubles. I'm wondering though why the sub-register offset hasn't >> caused us any problems before. That change is not needed just for doubles, >> is it? > > I think we haven't noticed this before for two reasons: > > 1. It usually isn't needed for correctness. > 2. We don't really use subreg_offset that much anyways. > > The splitting code is moving the inputs into a bunch of temporary > registers, doing the split operation into a new bunch of temporaries, > and then moving those temporaries back into the destination register. > It's copying over the stride and subreg_offset fields from the > original inputs and outputs, so when all is said and done, it's > producing the same result and using the same sources. The difference > is that the split instructions themselves weren't copying over those > fields, meaning that they were always operating on packed registers > (stride of 1 and subreg_offset of 0). This is normally ok, except > where, due to a restriction, the split instruction *must* have the > same stride/offset as the original instruction. For fp64, DF->F > conversion must preserve the stride since we have to use a stride of 2 > for the output. The only thing I can think of that has to preserve the > subreg_offset is math instructions, since the source offset has to be > the same as the destination offset, but that should be a corner case > and I'm not sure whether we'll ever get in a situation where a math > instruction has a non-zero subreg_offset.
Actually, if we don't respect the subreg_offset, then both of the lowered instructions will have an offset of 0, which is totally fine... so I guess the subreg_offset part is solely about helping optimizations (and consistency). > > But although I don't think it's necessary for correctness, but I think > it should help with register coalescing. For example, take > pack/unpackDouble2x32. I've implemented it in terms of two strided > MOV's, which need to be split by this pass. Say we have something > like: > > uint hi = unpackDouble2x32(input).y; > > in SIMD16 mode, this turns into: > > mov(16) hi.1<2>:UD input:DF > > But this needs to be split, and after the splitting pass (without > fixing up the subreg_offsest like this patch does) and lowering away > all the LOAD_PAYLOAD's, it gets turned into something like: > > mov(8) temp1:DF input:DF 1Q > mov(8) temp2:DF input+1.0:DF 2Q > mov(8) temp3<2>:UD temp1:DF 1Q > mov(8) temp4<2>:UD temp2:DF 2Q > mov(8) hi+0.1<2>:UD temp3<2>:UD WE_all //The WE_all is due to a hack > in the SIMD lowering code > mov(8) hi+1.1<2>:UD temp4<2>:UD WE_all > > Copy prop should have no problem cleaning up the first two MOV's, but > the last two are more of an issue -- we currently don't have a pass > that can detect that you can offset all reads/writes of temp3 and > temp4 by 1 and coalesce temp3 and temp4 into hi and hi+1, and it would > be really hard to write something like that without SSA. But if we > pass through the offset while splitting it up, then things match up: > > mov(8) temp1:DF input:DF 1Q > mov(8) temp2:DF input+1.0:DF 2Q > mov(8) temp3+0.1<2>:UD temp1:DF 1Q > mov(8) temp4+0.1<2>:UD temp2:DF 2Q > mov(8) hi+0.1<2>:UD temp3+0.1<2>:UD WE_all > mov(8) hi+1.1<2>:UD temp4+0.1<2>:UD WE_all > > and we should be able to coalesce away all the extra MOV's. > > I should probably put some of this in the commit message... > > Also, I realized that this patch should really be split in two, since > the fixes to register renaming are independent of this -- whoops! > >> >>> + split_inst.src[j].stride = inst->src[j].stride; >>> emit_transpose(lbld.group(copy_width, 0), >>> split_inst.src[j], &src, 1, src_size, n); >>> } >>> @@ -4343,8 +4347,10 @@ fs_visitor::lower_simd_width() >>> /* Allocate enough space to hold the result of the lowered >>> * instruction and fix up the number of registers written. >>> */ >>> - split_inst.dst = dsts[i] = >>> - lbld.vgrf(inst->dst.type, dst_size); >>> + fs_reg dst = lbld.vgrf(inst->dst.type, dst_size); >>> + dst.stride = inst->dst.stride; >>> + dst.subreg_offset = inst->dst.subreg_offset; >>> + split_inst.dst = dsts[i] = dst; >>> split_inst.regs_written = >>> DIV_ROUND_UP(inst->regs_written * lower_width, >>> inst->exec_size); >>> -- >>> 2.4.3 >>> >>> _______________________________________________ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev