Connor Abbott <cwabbo...@gmail.com> writes: > On Tue, May 3, 2016 at 6:16 PM, Francisco Jerez <curroje...@riseup.net> wrote: >> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: >> >>> From: Connor Abbott <connor.w.abb...@intel.com> >>> >>> Work based on registers read/written instead of dispatch_width, so that >>> the interferences are added for 64-bit sources/destinations as well. >>> --- >>> src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 67 >>> ++++++++++++++++------- >>> 1 file changed, 48 insertions(+), 19 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp >>> b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp >>> index 2347cd5..f0e96f9 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp >>> @@ -541,6 +541,34 @@ setup_mrf_hack_interference(fs_visitor *v, struct >>> ra_graph *g, >>> } >>> } >>> >>> +static unsigned >>> +get_reg_node(const fs_reg& reg, unsigned first_payload_node) >>> +{ >>> + switch (reg.file) { >>> + case VGRF: >>> + return reg.nr; >>> + case ARF: >>> + case FIXED_GRF: >>> + return first_payload_node + reg.nr; >>> + case MRF: >>> + default: >>> + unreachable("unhandled register file"); >>> + } >>> + >>> + return 0; >>> +} >>> + >>> +static void >>> +add_reg_interference(struct ra_graph *g, const fs_reg& reg1, >>> + const fs_reg& reg2, unsigned first_payload_node) >>> +{ >>> + if ((reg1.file == VGRF || reg1.file == ARF || reg1.file == FIXED_GRF) && >>> + (reg2.file == VGRF || reg2.file == ARF || reg2.file == FIXED_GRF)) { >>> + ra_add_node_interference(g, get_reg_node(reg1, first_payload_node), >>> + get_reg_node(reg2, first_payload_node)); >>> + } >>> +} >>> + >>> bool >>> fs_visitor::assign_regs(bool allow_spilling) >>> { >>> @@ -643,26 +671,27 @@ fs_visitor::assign_regs(bool allow_spilling) >>> } >>> } >>> >>> - if (dispatch_width > 8) { >>> - /* In 16-wide dispatch we have an issue where a compressed >>> - * instruction is actually two instructions executed simultaneiously. >>> - * It's actually ok to have the source and destination registers be >>> - * the same. In this case, each instruction over-writes its own >>> - * source and there's no problem. The real problem here is if the >>> - * source and destination registers are off by one. Then you can end >>> - * up in a scenario where the first instruction over-writes the >>> - * source of the second instruction. Since the compiler doesn't know >>> - * about this level of granularity, we simply make the source and >>> - * destination interfere. >>> - */ >>> - foreach_block_and_inst(block, fs_inst, inst, cfg) { >>> - if (inst->dst.file != VGRF) >>> - continue; >>> + /* When instructions both read/write more than a single SIMD8 register, >>> we >>> + * have an issue where an instruction is actually two instructions >>> executed >>> + * simultaneiously. It's actually ok to have the source and destination >>> + * registers be the same. In this case, each instruction over-writes >>> its >>> + * own source and there's no problem. The real problem here is if the >>> + * source and destination registers are off by one. Then you can end >>> up in >>> + * a scenario where the first instruction over-writes the source of the >>> + * second instruction. Since the compiler doesn't know about this >>> level of >>> + * granularity, we simply make the source and destination interfere. >>> + */ >>> + foreach_block_and_inst(block, fs_inst, inst, cfg) { >>> + if (inst->dst.file != VGRF && >>> + inst->dst.file != ARF && inst->dst.file != FIXED_GRF) >>> + continue; >>> >>> - for (int i = 0; i < inst->sources; ++i) { >>> - if (inst->src[i].file == VGRF) { >>> - ra_add_node_interference(g, inst->dst.nr, inst->src[i].nr); >>> - } >>> + if (inst->regs_written <= 1) >>> + continue; >>> + >>> + for (int i = 0; i < inst->sources; ++i) { >>> + if (inst->regs_read(i) > 1) { >> >> What's the point of checking that regs_read(i) > 1? Isn't this >> supposedly a problem in particular when a source register is fully >> contained inside the first (one register) half of the destination? > > No, it's not. It's a problem when the first half of the destination > overlaps with the second register read by source. For example, > something like: > > add(8) g11:DF g10:DF g20:DF > > will be broken into: > > add(4) g11:DF g10:DF g20:DF > add(4) g12:DF g11:DF g21:DF > > which obviously doesn't do what you want. Note that something with a > source stride of 2 and a destination stride of 1, like: > > mov(8) g11:UD g10<2>:UD > > has the same problem, so checking that regs_read(i) > 1 is both a > necessary and sufficient condition for the problem to hold for some > choice of physical registers. > It doesn't look like you're answering the question? :P Let me rephrase it slightly: Under the assumption that this is indeed a problem for regs_read(i) > 1, why would it *not* be a problem for regs_read(i) == 1 if there still is overlap between the destination of the second compressed half and the single-register source?
>> >> Either way I have some evidence suggesting that the EU works around this >> problem in hardware on at least some generations by buffering the result >> From the first half of any compressed instruction while the second half >> executes -- Can you come up with a test case where this fixes a problem? >> On what hardware? > > I'd be very surprised if that were the case, since to the part of the > pipeline that handles register fetching and writeback, an instruction > like > > add(8) g11:DF g10:DF g20:DF > > isn't really that different from > > add(16) g11:F g10:F g20:F > Oh? > and both the existing workaround in this patch and a separate > workaround in liveness analysis for uniforms already exist to prevent > something like the second instruction from getting generated. Also, I > wouldn't have written this patch if it hadn't fixed something :) I can > try running piglit with this patch reverted tomorrow, if no one beats > me to it. > Another explanation would be that the original fix was working around a problem in the generator with some virtual instruction that wouldn't handle source/destination overlap correctly -- Or that the original problem was generation-specific. It would definitely be good to have some test-case reproducing the problem. >> >>> + add_reg_interference(g, inst->dst, inst->src[i], >>> first_payload_node); >>> } >>> } >>> } >>> -- >>> 2.5.0 >>> >>> _______________________________________________ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev