Iago Toral <ito...@igalia.com> writes: > On Thu, 2015-07-30 at 17:08 +0300, Francisco Jerez wrote: >> Iago Toral Quiroga <ito...@igalia.com> writes: >> >> > When we have code such as this: >> > >> > mov vgrf1.0.x:F, vgrf2.xxxx:F >> > mov vgrf3.0.x:F, vgrf1.xxxx:F >> > ... >> > mov vgrf3.0.x:F, vgrf1.xxxx:F >> > >> > And vgrf1 is chosen for spilling, we can emit this: >> > >> > mov vgrf1.0.x:F, vgrf2.xxxx:F >> > gen4_scratch_write hw_reg0:F, vgrf1.xxxx:D, 22D >> > mov vgrf3.0.x:F, vgrf1.xxxx:F >> > ... >> > gen4_scratch_read vgrf4.0.x:F, 22D >> > mov vgrf3.0.x:F, vgrf4.xxxx:F >> > >> > Instead of this: >> > >> > mov vgrf1.0.x:F, vgrf2.xxxx:F >> > gen4_scratch_write hw_reg0:F, vgrf1.xxxx:D, 22D >> > gen4_scratch_read vgrf4.0.x:F, 22D >> > mov vgrf3.0.x:F, vgrf4.xxxx:F >> > ... >> > gen4_scratch_read vgrf5.0.x:F, 22D >> > mov vgrf3.0.x:F, vgrf5.xxxx:F >> > >> > And save one scratch read while still preserving the benefits of >> > spilling the register. >> > >> > In general, we avoid emitting scratch reads for as long as the next >> > instruction >> > keeps reading the spilled register. This should not harm the benefit of >> > spilling the register because gains for register allocation only come when >> > we >> > have chunks of program code where the register is alive but not really used >> > (because these are the points where we could effectively use that register >> > for >> > another purpose if we spilled it), so as long as consecutive instructions >> > use >> > that register we can avoid the scratch reads without losing anything. >> > --- >> > .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 37 >> > +++++++++++++++++++++- >> > 1 file changed, 36 insertions(+), 1 deletion(-) >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp >> > b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp >> > index cff5406..fd56dae 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp >> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp >> > @@ -340,11 +340,43 @@ vec4_visitor::spill_reg(int spill_reg_nr) >> > unsigned int spill_offset = last_scratch++; >> > >> > /* Generate spill/unspill instructions for the objects being spilled. >> > */ >> > + vec4_instruction *spill_write_inst = NULL; >> > foreach_block_and_inst(block, vec4_instruction, inst, cfg) { >> > + /* We don't spill registers used for scratch */ >> > + if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ || >> > + inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE) >> > + continue; >> > + >> > int scratch_reg = -1; >> > + bool spill_reg_was_read = false; >> > for (unsigned int i = 0; i < 3; i++) { >> > if (inst->src[i].file == GRF && inst->src[i].reg == >> > spill_reg_nr) { >> > - if (scratch_reg == -1) { >> > + if (!spill_reg_was_read) { >> > + spill_reg_was_read = (!inst->predicate || >> > + inst->opcode == BRW_OPCODE_SEL); >> > + } >> > + >> > + /* If we are reading the spilled register right after writing >> > + * to it we can skip the scratch read and use directly the >> > + * register we used as source for the scratch write. For this >> > + * to work we must check that: >> > + * >> > + * 1) The write is inconditional, that is, it is not >> > predicated or >> > + * it is a SEL. >> > + * 2) All the channels that we read have been written in that >> > + * last write instruction. >> > + * >> > + * We keep doing this for as long as the next instruction >> > + * keeps reading the spilled register and break as soon as we >> > + * find an instruction that doesn't. >> > + */ >> > + if (spill_write_inst && >> > + (!spill_write_inst->predicate || >> > + spill_write_inst->opcode == BRW_OPCODE_SEL) && >> > + ((brw_mask_for_swizzle(inst->src[i].swizzle) & >> > + ~spill_write_inst->dst.writemask) == 0)) { >> > + scratch_reg = spill_write_inst->dst.reg; >> > + } else if (scratch_reg == -1) { >> >> One suggestion: You could factor out the rather complex caching logic >> into a separate function (e.g. 'bool can_reuse_scratch_for_source(const >> vec4_instruction *, unsigned i, unsigned scratch_reg)'). The function >> would simply compare scratch_reg with the sources of the current >> instruction (up to src) and the sources and destination of the previous >> non-scratch_read/write instruction. If there's a match it would check >> that the regioning is compatible with the i-th source and return true in >> that case. This would have several benefits: > > I am not sure I get the part about checking regioning with the i-th > source. inst->src[i] does not reference scratch_reg (at least not if we > got here from spill_reg) so it cannot overlap with anything that > references scratch_reg... I am going to send a v3 with the > implementation of can_use_scratch_for_source in a few minutes, so if I > am just missing the point I guess it is probably be easier if you just > reply to that patch. > I think you did get the point, what I meant was to check all sources of the current instruction less than i, which seems to be exactly what you've implemented in v3.
>> - It would keep ::spill_reg() simple and the caching heuristic factored >> out. The only thing spill_reg() would still need to take care of is >> keep track of the last spilling temporary (e.g. as you're doing it >> now with the scratch_reg variable) and pass it to >> can_reuse_scratch_for_source(), but it would no longer be necessary >> to reset it to -1 when reuse is not possible. >> >> - It would allow you to use a single implementation of the caching >> policy to implement both spill_reg() and evaluate_spill_costs(), what >> would make sure that things don't go out of sync in the likely case >> that we want to change the caching heuristic in the future. >> >> > scratch_reg = alloc.allocate(1); >> > src_reg temp = inst->src[i]; >> > temp.reg = scratch_reg; >> > @@ -357,6 +389,9 @@ vec4_visitor::spill_reg(int spill_reg_nr) >> > >> > if (inst->dst.file == GRF && inst->dst.reg == spill_reg_nr) { >> > emit_scratch_write(block, inst, spill_offset); >> > + spill_write_inst = inst; >> > + } else if (spill_write_inst && !spill_reg_was_read) { >> > + spill_write_inst = NULL; >> > } >> > } >> > >> > -- >> > 1.9.1
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev