Iago Toral Quiroga <ito...@igalia.com> writes: > When we have code such as this: > > mov vgrf1.0.x:F, vgrf2.xxxx:F > mov vgrf3.0.x:F, vgrf1.xxxx:F > ... > mov vgrf3.0.x:F, vgrf1.xxxx:F > > And vgrf1 is chosen for spilling, we can emit this: > > mov vgrf1.0.x:F, vgrf2.xxxx:F > gen4_scratch_write hw_reg0:F, vgrf1.xxxx:D, 22D > mov vgrf3.0.x:F, vgrf1.xxxx:F > ... > gen4_scratch_read vgrf4.0.x:F, 22D > mov vgrf3.0.x:F, vgrf4.xxxx:F > > Instead of this: > > mov vgrf1.0.x:F, vgrf2.xxxx:F > gen4_scratch_write hw_reg0:F, vgrf1.xxxx:D, 22D > gen4_scratch_read vgrf4.0.x:F, 22D > mov vgrf3.0.x:F, vgrf4.xxxx:F > ... > gen4_scratch_read vgrf5.0.x:F, 22D > mov vgrf3.0.x:F, vgrf5.xxxx:F > > And save one scratch read while still preserving the benefits of > spilling the register. > > In general, we avoid emitting scratch reads for as long as the next > instruction > keeps reading the spilled register. This should not harm the benefit of > spilling the register because gains for register allocation only come when we > have chunks of program code where the register is alive but not really used > (because these are the points where we could effectively use that register for > another purpose if we spilled it), so as long as consecutive instructions use > that register we can avoid the scratch reads without losing anything. > --- > .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 37 > +++++++++++++++++++++- > 1 file changed, 36 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp > index cff5406..fd56dae 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp > @@ -340,11 +340,43 @@ vec4_visitor::spill_reg(int spill_reg_nr) > unsigned int spill_offset = last_scratch++; > > /* Generate spill/unspill instructions for the objects being spilled. */ > + vec4_instruction *spill_write_inst = NULL; > foreach_block_and_inst(block, vec4_instruction, inst, cfg) { > + /* We don't spill registers used for scratch */ > + if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ || > + inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE) > + continue; > + > int scratch_reg = -1; > + bool spill_reg_was_read = false; > for (unsigned int i = 0; i < 3; i++) { > if (inst->src[i].file == GRF && inst->src[i].reg == spill_reg_nr) { > - if (scratch_reg == -1) { > + if (!spill_reg_was_read) { > + spill_reg_was_read = (!inst->predicate || > + inst->opcode == BRW_OPCODE_SEL); > + } > + > + /* If we are reading the spilled register right after writing > + * to it we can skip the scratch read and use directly the > + * register we used as source for the scratch write. For this > + * to work we must check that: > + * > + * 1) The write is inconditional, that is, it is not predicated > or > + * it is a SEL. > + * 2) All the channels that we read have been written in that > + * last write instruction. > + * > + * We keep doing this for as long as the next instruction > + * keeps reading the spilled register and break as soon as we > + * find an instruction that doesn't. > + */ > + if (spill_write_inst && > + (!spill_write_inst->predicate || > + spill_write_inst->opcode == BRW_OPCODE_SEL) && > + ((brw_mask_for_swizzle(inst->src[i].swizzle) & > + ~spill_write_inst->dst.writemask) == 0)) { > + scratch_reg = spill_write_inst->dst.reg; > + } else if (scratch_reg == -1) {
One suggestion: You could factor out the rather complex caching logic into a separate function (e.g. 'bool can_reuse_scratch_for_source(const vec4_instruction *, unsigned i, unsigned scratch_reg)'). The function would simply compare scratch_reg with the sources of the current instruction (up to src) and the sources and destination of the previous non-scratch_read/write instruction. If there's a match it would check that the regioning is compatible with the i-th source and return true in that case. This would have several benefits: - It would keep ::spill_reg() simple and the caching heuristic factored out. The only thing spill_reg() would still need to take care of is keep track of the last spilling temporary (e.g. as you're doing it now with the scratch_reg variable) and pass it to can_reuse_scratch_for_source(), but it would no longer be necessary to reset it to -1 when reuse is not possible. - It would allow you to use a single implementation of the caching policy to implement both spill_reg() and evaluate_spill_costs(), what would make sure that things don't go out of sync in the likely case that we want to change the caching heuristic in the future. > scratch_reg = alloc.allocate(1); > src_reg temp = inst->src[i]; > temp.reg = scratch_reg; > @@ -357,6 +389,9 @@ vec4_visitor::spill_reg(int spill_reg_nr) > > if (inst->dst.file == GRF && inst->dst.reg == spill_reg_nr) { > emit_scratch_write(block, inst, spill_offset); > + spill_write_inst = inst; > + } else if (spill_write_inst && !spill_reg_was_read) { > + spill_write_inst = NULL; > } > } > > -- > 1.9.1
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev