Francisco Jerez <curroje...@riseup.net> writes: > Iago Toral Quiroga <ito...@igalia.com> writes: > >> When we have code such as this: >> >> mov vgrf1.0.x:F, vgrf2.xxxx:F >> mov vgrf3.0.x:F, vgrf1.xxxx:F >> ... >> mov vgrf3.0.x:F, vgrf1.xxxx:F >> >> And vgrf1 is chosen for spilling, we can emit this: >> >> mov vgrf1.0.x:F, vgrf2.xxxx:F >> gen4_scratch_write hw_reg0:F, vgrf1.xxxx:D, 22D >> mov vgrf3.0.x:F, vgrf1.xxxx:F >> ... >> gen4_scratch_read vgrf4.0.x:F, 22D >> mov vgrf3.0.x:F, vgrf4.xxxx:F >> >> Instead of this: >> >> mov vgrf1.0.x:F, vgrf2.xxxx:F >> gen4_scratch_write hw_reg0:F, vgrf1.xxxx:D, 22D >> gen4_scratch_read vgrf4.0.x:F, 22D >> mov vgrf3.0.x:F, vgrf4.xxxx:F >> ... >> gen4_scratch_read vgrf5.0.x:F, 22D >> mov vgrf3.0.x:F, vgrf5.xxxx:F >> >> And save one scratch read while still preserving the benefits of >> spilling the register. >> >> In general, we avoid emitting scratch reads for as long as the next >> instruction >> keeps reading the spilled register. This should not harm the benefit of >> spilling the register because gains for register allocation only come when we >> have chunks of program code where the register is alive but not really used >> (because these are the points where we could effectively use that register >> for >> another purpose if we spilled it), so as long as consecutive instructions use >> that register we can avoid the scratch reads without losing anything. >> --- >> .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 37 >> +++++++++++++++++++++- >> 1 file changed, 36 insertions(+), 1 deletion(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp >> b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp >> index cff5406..fd56dae 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp >> @@ -340,11 +340,43 @@ vec4_visitor::spill_reg(int spill_reg_nr) >> unsigned int spill_offset = last_scratch++; >> >> /* Generate spill/unspill instructions for the objects being spilled. */ >> + vec4_instruction *spill_write_inst = NULL; >> foreach_block_and_inst(block, vec4_instruction, inst, cfg) { >> + /* We don't spill registers used for scratch */ >> + if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ || >> + inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE) >> + continue; >> + >> int scratch_reg = -1; >> + bool spill_reg_was_read = false; >> for (unsigned int i = 0; i < 3; i++) { >> if (inst->src[i].file == GRF && inst->src[i].reg == spill_reg_nr) { >> - if (scratch_reg == -1) { >> + if (!spill_reg_was_read) { >> + spill_reg_was_read = (!inst->predicate || >> + inst->opcode == BRW_OPCODE_SEL); >> + } >> + >> + /* If we are reading the spilled register right after writing >> + * to it we can skip the scratch read and use directly the >> + * register we used as source for the scratch write. For this >> + * to work we must check that: >> + * >> + * 1) The write is inconditional, that is, it is not predicated >> or >> + * it is a SEL. >> + * 2) All the channels that we read have been written in that >> + * last write instruction. >> + * >> + * We keep doing this for as long as the next instruction >> + * keeps reading the spilled register and break as soon as we >> + * find an instruction that doesn't. >> + */ >> + if (spill_write_inst && >> + (!spill_write_inst->predicate || >> + spill_write_inst->opcode == BRW_OPCODE_SEL) && >> + ((brw_mask_for_swizzle(inst->src[i].swizzle) & >> + ~spill_write_inst->dst.writemask) == 0)) { >> + scratch_reg = spill_write_inst->dst.reg; >> + } else if (scratch_reg == -1) { > > One suggestion: You could factor out the rather complex caching logic > into a separate function (e.g. 'bool can_reuse_scratch_for_source(const > vec4_instruction *, unsigned i, unsigned scratch_reg)'). The function > would simply compare scratch_reg with the sources of the current > instruction (up to src) and the sources and destination of the previous
Huh, of course "up to i" is what I intended to write. > non-scratch_read/write instruction. If there's a match it would check > that the regioning is compatible with the i-th source and return true in > that case. This would have several benefits: > > - It would keep ::spill_reg() simple and the caching heuristic factored > out. The only thing spill_reg() would still need to take care of is > keep track of the last spilling temporary (e.g. as you're doing it > now with the scratch_reg variable) and pass it to > can_reuse_scratch_for_source(), but it would no longer be necessary > to reset it to -1 when reuse is not possible. > > - It would allow you to use a single implementation of the caching > policy to implement both spill_reg() and evaluate_spill_costs(), what > would make sure that things don't go out of sync in the likely case > that we want to change the caching heuristic in the future. > In case it's not obvious from my previous explanation evaluate_spill_costs() could invoke the function I proposed like "can_reuse_scratch_for_source(inst, i, inst->src[i].reg)" to find out if the register would be cached by spill_reg(). >> scratch_reg = alloc.allocate(1); >> src_reg temp = inst->src[i]; >> temp.reg = scratch_reg; >> @@ -357,6 +389,9 @@ vec4_visitor::spill_reg(int spill_reg_nr) >> >> if (inst->dst.file == GRF && inst->dst.reg == spill_reg_nr) { >> emit_scratch_write(block, inst, spill_offset); >> + spill_write_inst = inst; >> + } else if (spill_write_inst && !spill_reg_was_read) { >> + spill_write_inst = NULL; >> } >> } >> >> -- >> 1.9.1
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev