Francisco Jerez <curroje...@riseup.net> writes: > Iago Toral <ito...@igalia.com> writes: > >> On Fri, 2015-07-31 at 13:12 +0300, Francisco Jerez wrote: >>> Iago Toral <ito...@igalia.com> writes: >>> >>> > On Thu, 2015-07-30 at 17:08 +0300, Francisco Jerez wrote: >>> >> Iago Toral Quiroga <ito...@igalia.com> writes: >>> >> >>> >> > When we have code such as this: >>> >> > >>> >> > mov vgrf1.0.x:F, vgrf2.xxxx:F >>> >> > mov vgrf3.0.x:F, vgrf1.xxxx:F >>> >> > ... >>> >> > mov vgrf3.0.x:F, vgrf1.xxxx:F >>> >> > >>> >> > And vgrf1 is chosen for spilling, we can emit this: >>> >> > >>> >> > mov vgrf1.0.x:F, vgrf2.xxxx:F >>> >> > gen4_scratch_write hw_reg0:F, vgrf1.xxxx:D, 22D >>> >> > mov vgrf3.0.x:F, vgrf1.xxxx:F >>> >> > ... >>> >> > gen4_scratch_read vgrf4.0.x:F, 22D >>> >> > mov vgrf3.0.x:F, vgrf4.xxxx:F >>> >> > >>> >> > Instead of this: >>> >> > >>> >> > mov vgrf1.0.x:F, vgrf2.xxxx:F >>> >> > gen4_scratch_write hw_reg0:F, vgrf1.xxxx:D, 22D >>> >> > gen4_scratch_read vgrf4.0.x:F, 22D >>> >> > mov vgrf3.0.x:F, vgrf4.xxxx:F >>> >> > ... >>> >> > gen4_scratch_read vgrf5.0.x:F, 22D >>> >> > mov vgrf3.0.x:F, vgrf5.xxxx:F >>> >> > >>> >> > And save one scratch read while still preserving the benefits of >>> >> > spilling the register. >>> >> > >>> >> > In general, we avoid emitting scratch reads for as long as the next >>> >> > instruction >>> >> > keeps reading the spilled register. This should not harm the benefit of >>> >> > spilling the register because gains for register allocation only come >>> >> > when we >>> >> > have chunks of program code where the register is alive but not really >>> >> > used >>> >> > (because these are the points where we could effectively use that >>> >> > register for >>> >> > another purpose if we spilled it), so as long as consecutive >>> >> > instructions use >>> >> > that register we can avoid the scratch reads without losing anything. >>> >> > --- >>> >> > .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 37 >>> >> > +++++++++++++++++++++- >>> >> > 1 file changed, 36 insertions(+), 1 deletion(-) >>> >> > >>> >> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp >>> >> > b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp >>> >> > index cff5406..fd56dae 100644 >>> >> > --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp >>> >> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp >>> >> > @@ -340,11 +340,43 @@ vec4_visitor::spill_reg(int spill_reg_nr) >>> >> > unsigned int spill_offset = last_scratch++; >>> >> > >>> >> > /* Generate spill/unspill instructions for the objects being >>> >> > spilled. */ >>> >> > + vec4_instruction *spill_write_inst = NULL; >>> >> > foreach_block_and_inst(block, vec4_instruction, inst, cfg) { >>> >> > + /* We don't spill registers used for scratch */ >>> >> > + if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ || >>> >> > + inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE) >>> >> > + continue; >>> >> > + >>> >> > int scratch_reg = -1; >>> >> > + bool spill_reg_was_read = false; >>> >> > for (unsigned int i = 0; i < 3; i++) { >>> >> > if (inst->src[i].file == GRF && inst->src[i].reg == >>> >> > spill_reg_nr) { >>> >> > - if (scratch_reg == -1) { >>> >> > + if (!spill_reg_was_read) { >>> >> > + spill_reg_was_read = (!inst->predicate || >>> >> > + inst->opcode == BRW_OPCODE_SEL); >>> >> > + } >>> >> > + >>> >> > + /* If we are reading the spilled register right after >>> >> > writing >>> >> > + * to it we can skip the scratch read and use directly the >>> >> > + * register we used as source for the scratch write. For >>> >> > this >>> >> > + * to work we must check that: >>> >> > + * >>> >> > + * 1) The write is inconditional, that is, it is not >>> >> > predicated or >>> >> > + * it is a SEL. >>> >> > + * 2) All the channels that we read have been written in >>> >> > that >>> >> > + * last write instruction. >>> >> > + * >>> >> > + * We keep doing this for as long as the next instruction >>> >> > + * keeps reading the spilled register and break as soon >>> >> > as we >>> >> > + * find an instruction that doesn't. >>> >> > + */ >>> >> > + if (spill_write_inst && >>> >> > + (!spill_write_inst->predicate || >>> >> > + spill_write_inst->opcode == BRW_OPCODE_SEL) && >>> >> > + ((brw_mask_for_swizzle(inst->src[i].swizzle) & >>> >> > + ~spill_write_inst->dst.writemask) == 0)) { >>> >> > + scratch_reg = spill_write_inst->dst.reg; >>> >> > + } else if (scratch_reg == -1) { >>> >> >>> >> One suggestion: You could factor out the rather complex caching logic >>> >> into a separate function (e.g. 'bool can_reuse_scratch_for_source(const >>> >> vec4_instruction *, unsigned i, unsigned scratch_reg)'). The function >>> >> would simply compare scratch_reg with the sources of the current >>> >> instruction (up to src) and the sources and destination of the previous >>> >> non-scratch_read/write instruction. If there's a match it would check >>> >> that the regioning is compatible with the i-th source and return true in >>> >> that case. This would have several benefits: >>> > >>> > I think this might need to be a bit more complex. The previous inst's >>> > src[i] might read only a subset of the channels that where loaded into >>> > scratch_reg so comparing only against that can lead us to think that we >>> > can't reuse scratch_reg when in fact we can. >>> > >>> > I think the process should be more like we loop back looking at the prev >>> > instruction for as long as the previous instruction reads the >>> > scratch_ref until we find the one that writes to scratch_reg (which must >>> > be a scratch read). At that point we check that the writemask in that >>> > instruction is compatible with our swizzle on src[i], in which case we >>> > can reuse scratch_reg. Does this make sense to you? >>> > >>> The thing is that there may be no scratch read in the program yet if >>> you're being called from evaluate_spill_costs(). I think you want to >>> iterate for as long as the previous instruction reads scratch_reg but >>> doesn't write it -- If after the end of the loop you end up at an >>> instruction that writes, fine, you take the writemask from it, otherwise >>> assume XYZW. >>> >>> Yes, we should probably change spill_reg() to read whole vec4s at once >>> when unspilling a register, reading a subset of the channels doesn't >>> save any memory bandwidth and makes the optimization you're implementing >>> less likely to succeed. >>> >>> > In this case, if we have a successful match, we probably want to keep a >>> > reference to that instruction so that for the next src of the current >>> > instruction (or for the next instruction in our program) we don't have >>> > to repeat that process. >>> > >>> >>> Keep in mind that the assumption behind this optimization is that a >>> given spill candidate is only re-used for a very short sequence of >>> consecutive instructions (otherwise it might in fact be harmful), so the >>> amount of CPU cycles you can save by doing that is likely to be tiny in >>> practice -- Feel free to do it if it's trivial but I wouldn't trade it >>> for any additional complexity in the spill_reg() and >>> evaluate_spill_costs() loops. >>> >>> One last concern I had about this series has to do with the possibility >>> of it leading to an infinite loop in the allocate-spill loop -- I guess >>> that at least in theory it would be possible for a register used *only* >>> in a consecutive sequence of instructions to score quite high if the >>> sequence is part of a region of high register pressure. If the register >>> allocator decides it's the best candidate we have a problem because this >>> heuristic will prevent us from making any progress. I guess the easiest >>> way around this would be to have an array of counters in >>> evaluate_spill_costs() keeping track of the number of individual >>> instructions each register had to be (un)spilled for. At the end of the >>> loop you'd check if it's less or equal to one and in that case mark it >>> as no_spill. (Note that in theory this situation is possible even >>> without your changes AFAICT, just far less likely) >> >> I think this can't happen. If a register is only used in consecutive >> instructions and is selected for spilling, we will at least emit one >> scratch read or write for it (in the first instruction that uses it). >> The code in evaluate_spill_costs will then mark as no_spill any register >> used in a scratch read/write, which will make this register >> automatically not selectable for spilling after the first time it is >> selected. >> > Ah, good point, disregard my comment in that case. > In fact it seems really silly that we emit a scratch write for a spilled variable that is never read again, it might be worth forcing no_spill for those as I suggested anyway in order to avoid that. That said I don't see any reason to change that now because as you say the infinite loop shouldn't happen in practice, so it can always be done as a follow-up.
>>> > Iago >>> > >>> > >>> >> - It would keep ::spill_reg() simple and the caching heuristic factored >>> >> out. The only thing spill_reg() would still need to take care of is >>> >> keep track of the last spilling temporary (e.g. as you're doing it >>> >> now with the scratch_reg variable) and pass it to >>> >> can_reuse_scratch_for_source(), but it would no longer be necessary >>> >> to reset it to -1 when reuse is not possible. >>> >> >>> >> - It would allow you to use a single implementation of the caching >>> >> policy to implement both spill_reg() and evaluate_spill_costs(), what >>> >> would make sure that things don't go out of sync in the likely case >>> >> that we want to change the caching heuristic in the future. >>> >> >>> >> > scratch_reg = alloc.allocate(1); >>> >> > src_reg temp = inst->src[i]; >>> >> > temp.reg = scratch_reg; >>> >> > @@ -357,6 +389,9 @@ vec4_visitor::spill_reg(int spill_reg_nr) >>> >> > >>> >> > if (inst->dst.file == GRF && inst->dst.reg == spill_reg_nr) { >>> >> > emit_scratch_write(block, inst, spill_offset); >>> >> > + spill_write_inst = inst; >>> >> > + } else if (spill_write_inst && !spill_reg_was_read) { >>> >> > + spill_write_inst = NULL; >>> >> > } >>> >> > } >>> >> > >>> >> > -- >>> >> > 1.9.1
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev