On Thu, 2015-07-30 at 16:33 +0300, Francisco Jerez wrote: > Francisco Jerez <curroje...@riseup.net> writes: > > > Iago Toral Quiroga <ito...@igalia.com> writes: > > > >> Previous patches made it so that we do not need to unspill the same vgrf > >> with every instruction as long as these instructions come right after > >> the register was spilled or unspilled. This means that actually spilling > >> the register is now cheaper in these scenarios, so adjust the spilling > >> cost calculation accordingly. > >> --- > >> .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 61 > >> +++++++++++++++++++--- > >> 1 file changed, 55 insertions(+), 6 deletions(-) > >> > >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp > >> b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp > >> index 9a69fac..652a68c 100644 > >> --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp > >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp > >> @@ -269,6 +269,21 @@ vec4_visitor::evaluate_spill_costs(float > >> *spill_costs, bool *no_spill) > >> { > >> float loop_scale = 1.0; > >> > >> + /* If a spilled register is written (or unspilled) and then > >> immediately read, > >> + * we will avoid to emit scratch reads for that register for as long as > >> + * consecutive instructions keep reading that register. This makes > >> spilling > >> + * the register cheaper, so we need to account for this here. Since any > >> + * instruction can only have at most 3 src operands we can only have > >> up to > >> + * 3 cached registers (i.e. registers that we do not need to unspill > >> if they > >> + * are read in a consecutive instruction) at any given time. > >> + * > >> + * We keep two lists, cached[0] has the registers that are cached > >> before > >> + * the current instruction, and cached[1] has the registers that are > >> + * cached after the current instruction. We copy cached[1] into > >> cached[0] > >> + * after processing each instruction. > >> + */ > >> + int cached[2][3] = { -1, -1, -1, -1, -1, -1 }; > >> + > > > > Wouldn't it be easier to drop the array and just compare the sources of > > each instruction with the previous one? > > > >> for (unsigned i = 0; i < this->alloc.count; i++) { > >> spill_costs[i] = 0.0; > >> no_spill[i] = alloc.sizes[i] != 1; > >> @@ -279,18 +294,52 @@ vec4_visitor::evaluate_spill_costs(float > >> *spill_costs, bool *no_spill) > >> * loops run 10 times. > >> */ > >> foreach_block_and_inst(block, vec4_instruction, inst, cfg) { > >> + memset(cached[1], -1, sizeof(cached[1])); > >> + > >> for (unsigned int i = 0; i < 3; i++) { > >> - if (inst->src[i].file == GRF) { > >> - spill_costs[inst->src[i].reg] += loop_scale; > >> - assert(!inst->src[i].reladdr); > >> - } > >> + if (inst->src[i].file == GRF) { > >> + /* Don't increase spilling cost if src[i] isn't going > >> + * to be unspilled > >> + */ > >> + int slot; > >> + for (slot = 0; slot < 3; slot++) { > >> + if (inst->src[i].reg == cached[0][slot]) > > > > I guess this should take the reg_offset and writemask of the cached > > register into account. I suggest you use backend_reg::in_range() to > > check if there is overlap. > > > > It also doesn't look like this handles the case in which the same > register is unspilled several times for the same instruction correctly, > as you've implemented it in PATCH 1.
True, I'll fix that and have a go at your other suggestions as well. Iago > >> + break; > >> + } > >> + if (slot == 3) { > >> + /* Not cached, we will unspill src[i] */ > >> + spill_costs[inst->src[i].reg] += loop_scale; > >> + assert(!inst->src[i].reladdr); > >> + } else { > >> + /* It is cached. Since it is read in this instruction it > >> + * stays cached for the next. > >> + */ > >> + cached[1][slot] = inst->src[i].reg; > >> + } > >> + } > >> } > >> > >> if (inst->dst.file == GRF) { > >> - spill_costs[inst->dst.reg] += loop_scale; > >> - assert(!inst->dst.reladdr); > >> + int slot; > >> + /* Mark this reg as cached because if the next instruction reads > >> it > >> + * we won't unspill it. > >> + */ > >> + for (slot = 0; slot < 3; slot++) { > >> + if (cached[1][slot] == -1) { > >> + cached[1][slot] = inst->dst.reg; > >> + break; > >> + } > >> + } > >> + assert(slot < 3); > >> + spill_costs[inst->dst.reg] += loop_scale; > >> + assert(!inst->dst.reladdr); > >> } > >> > >> + /* Cached registers after this instruction are the cached registers > >> + * for the next instruction > >> + */ > >> + memcpy(cached[0], cached[1], sizeof(cached[0])); > >> + > >> switch (inst->opcode) { > >> > >> case BRW_OPCODE_DO: > >> -- > >> 1.9.1 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev