Iago Toral Quiroga <ito...@igalia.com> writes: > Previous patches made it so that we do not need to unspill the same vgrf > with every instruction as long as these instructions come right after > the register was spilled or unspilled. This means that actually spilling > the register is now cheaper in these scenarios, so adjust the spilling > cost calculation accordingly. > --- > .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 61 > +++++++++++++++++++--- > 1 file changed, 55 insertions(+), 6 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp > index 9a69fac..652a68c 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp > @@ -269,6 +269,21 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, > bool *no_spill) > { > float loop_scale = 1.0; > > + /* If a spilled register is written (or unspilled) and then immediately > read, > + * we will avoid to emit scratch reads for that register for as long as > + * consecutive instructions keep reading that register. This makes > spilling > + * the register cheaper, so we need to account for this here. Since any > + * instruction can only have at most 3 src operands we can only have up to > + * 3 cached registers (i.e. registers that we do not need to unspill if > they > + * are read in a consecutive instruction) at any given time. > + * > + * We keep two lists, cached[0] has the registers that are cached before > + * the current instruction, and cached[1] has the registers that are > + * cached after the current instruction. We copy cached[1] into cached[0] > + * after processing each instruction. > + */ > + int cached[2][3] = { -1, -1, -1, -1, -1, -1 }; > +
Wouldn't it be easier to drop the array and just compare the sources of each instruction with the previous one? > for (unsigned i = 0; i < this->alloc.count; i++) { > spill_costs[i] = 0.0; > no_spill[i] = alloc.sizes[i] != 1; > @@ -279,18 +294,52 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, > bool *no_spill) > * loops run 10 times. > */ > foreach_block_and_inst(block, vec4_instruction, inst, cfg) { > + memset(cached[1], -1, sizeof(cached[1])); > + > for (unsigned int i = 0; i < 3; i++) { > - if (inst->src[i].file == GRF) { > - spill_costs[inst->src[i].reg] += loop_scale; > - assert(!inst->src[i].reladdr); > - } > + if (inst->src[i].file == GRF) { > + /* Don't increase spilling cost if src[i] isn't going > + * to be unspilled > + */ > + int slot; > + for (slot = 0; slot < 3; slot++) { > + if (inst->src[i].reg == cached[0][slot]) I guess this should take the reg_offset and writemask of the cached register into account. I suggest you use backend_reg::in_range() to check if there is overlap. > + break; > + } > + if (slot == 3) { > + /* Not cached, we will unspill src[i] */ > + spill_costs[inst->src[i].reg] += loop_scale; > + assert(!inst->src[i].reladdr); > + } else { > + /* It is cached. Since it is read in this instruction it > + * stays cached for the next. > + */ > + cached[1][slot] = inst->src[i].reg; > + } > + } > } > > if (inst->dst.file == GRF) { > - spill_costs[inst->dst.reg] += loop_scale; > - assert(!inst->dst.reladdr); > + int slot; > + /* Mark this reg as cached because if the next instruction reads it > + * we won't unspill it. > + */ > + for (slot = 0; slot < 3; slot++) { > + if (cached[1][slot] == -1) { > + cached[1][slot] = inst->dst.reg; > + break; > + } > + } > + assert(slot < 3); > + spill_costs[inst->dst.reg] += loop_scale; > + assert(!inst->dst.reladdr); > } > > + /* Cached registers after this instruction are the cached registers > + * for the next instruction > + */ > + memcpy(cached[0], cached[1], sizeof(cached[0])); > + > switch (inst->opcode) { > > case BRW_OPCODE_DO: > -- > 1.9.1
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev