vec4: Don't emit scratch reads for a spilled register we have just written

Iago Toral Thu, 30 Jul 2015 23:31:24 -0700

On Thu, 2015-07-30 at 17:14 +0300, Francisco Jerez wrote:
> Francisco Jerez <[email protected]> writes:
> 
> > Iago Toral Quiroga <[email protected]> writes:
> >
> >> When we have code such as this:
> >>
> >> mov vgrf1.0.x:F, vgrf2.xxxx:F
> >> mov vgrf3.0.x:F, vgrf1.xxxx:F
> >> ...
> >> mov vgrf3.0.x:F, vgrf1.xxxx:F
> >>
> >> And vgrf1 is chosen for spilling, we can emit this:
> >>
> >> mov vgrf1.0.x:F, vgrf2.xxxx:F
> >> gen4_scratch_write hw_reg0:F, vgrf1.xxxx:D, 22D
> >> mov vgrf3.0.x:F, vgrf1.xxxx:F
> >> ...
> >> gen4_scratch_read vgrf4.0.x:F, 22D
> >> mov vgrf3.0.x:F, vgrf4.xxxx:F
> >>
> >> Instead of this:
> >>
> >> mov vgrf1.0.x:F, vgrf2.xxxx:F
> >> gen4_scratch_write hw_reg0:F, vgrf1.xxxx:D, 22D
> >> gen4_scratch_read vgrf4.0.x:F, 22D
> >> mov vgrf3.0.x:F, vgrf4.xxxx:F
> >> ...
> >> gen4_scratch_read vgrf5.0.x:F, 22D
> >> mov vgrf3.0.x:F, vgrf5.xxxx:F
> >>
> >> And save one scratch read while still preserving the benefits of
> >> spilling the register.
> >>
> >> In general, we avoid emitting scratch reads for as long as the next 
> >> instruction
> >> keeps reading the spilled register. This should not harm the benefit of
> >> spilling the register because gains for register allocation only come when 
> >> we
> >> have chunks of program code where the register is alive but not really used
> >> (because these are the points where we could effectively use that register 
> >> for
> >> another purpose if we spilled it), so as long as consecutive instructions 
> >> use
> >> that register we can avoid the scratch reads without losing anything.
> >> ---
> >>  .../drivers/dri/i965/brw_vec4_reg_allocate.cpp     | 37 
> >> +++++++++++++++++++++-
> >>  1 file changed, 36 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
> >> b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> >> index cff5406..fd56dae 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> >> @@ -340,11 +340,43 @@ vec4_visitor::spill_reg(int spill_reg_nr)
> >>     unsigned int spill_offset = last_scratch++;
> >>  
> >>     /* Generate spill/unspill instructions for the objects being spilled. 
> >> */
> >> +   vec4_instruction *spill_write_inst = NULL;
> >>     foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
> >> +      /* We don't spill registers used for scratch */
> >> +      if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ ||
> >> +          inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
> >> +         continue;
> >> +
> >>        int scratch_reg = -1;
> >> +      bool spill_reg_was_read = false;
> >>        for (unsigned int i = 0; i < 3; i++) {
> >>           if (inst->src[i].file == GRF && inst->src[i].reg == 
> >> spill_reg_nr) {
> >> -            if (scratch_reg == -1) {
> >> +            if (!spill_reg_was_read) {
> >> +               spill_reg_was_read = (!inst->predicate ||
> >> +                                     inst->opcode == BRW_OPCODE_SEL);
> >> +            }
> >> +
> >> +            /* If we are reading the spilled register right after writing
> >> +             * to it we can skip the scratch read and use directly the
> >> +             * register we used as source for the scratch write. For this
> >> +             * to work we must check that:
> >> +             *
> >> +             * 1) The write is inconditional, that is, it is not 
> >> predicated or
> >> +             *    it is a SEL.
> >> +             * 2) All the channels that we read have been written in that
> >> +             *    last write instruction.
> >> +             *
> >> +             * We keep doing this for as long as the next instruction
> >> +             * keeps reading the spilled register and break as soon as we
> >> +             * find an instruction that doesn't.
> >> +             */
> >> +            if (spill_write_inst &&
> >> +                (!spill_write_inst->predicate ||
> >> +                 spill_write_inst->opcode == BRW_OPCODE_SEL) &&
> >> +                ((brw_mask_for_swizzle(inst->src[i].swizzle) &
> >> +                 ~spill_write_inst->dst.writemask) == 0)) {
> >> +               scratch_reg = spill_write_inst->dst.reg;
> >> +            } else if (scratch_reg == -1) {
> >
> > One suggestion: You could factor out the rather complex caching logic
> > into a separate function (e.g. 'bool can_reuse_scratch_for_source(const
> > vec4_instruction *, unsigned i, unsigned scratch_reg)').  The function
> > would simply compare scratch_reg with the sources of the current
> > instruction (up to src) and the sources and destination of the previous
> 
> Huh, of course "up to i" is what I intended to write.


Up to i? Shouldn't we just check source i only? The way I understand
this is that we would have a loop for each source in the current
instruction and call this function for each one to see if it is cached.

> > non-scratch_read/write instruction.  If there's a match it would check
> > that the regioning is compatible with the i-th source and return true in
> > that case.  This would have several benefits:
> >
> >  - It would keep ::spill_reg() simple and the caching heuristic factored
> >    out.  The only thing spill_reg() would still need to take care of is
> >    keep track of the last spilling temporary (e.g. as you're doing it
> >    now with the scratch_reg variable) and pass it to
> >    can_reuse_scratch_for_source(), but it would no longer be necessary
> >    to reset it to -1 when reuse is not possible.
> >
> >  - It would allow you to use a single implementation of the caching
> >    policy to implement both spill_reg() and evaluate_spill_costs(), what
> >    would make sure that things don't go out of sync in the likely case
> >    that we want to change the caching heuristic in the future.
> >
> In case it's not obvious from my previous explanation
> evaluate_spill_costs() could invoke the function I proposed like
> "can_reuse_scratch_for_source(inst, i, inst->src[i].reg)" to find out if
> the register would be cached by spill_reg().

Sounds like a good idea, I'll give it a try. Thanks for having a look at
the patches by the way.

Iago

> >>                 scratch_reg = alloc.allocate(1);
> >>                 src_reg temp = inst->src[i];
> >>                 temp.reg = scratch_reg;
> >> @@ -357,6 +389,9 @@ vec4_visitor::spill_reg(int spill_reg_nr)
> >>  
> >>        if (inst->dst.file == GRF && inst->dst.reg == spill_reg_nr) {
> >>           emit_scratch_write(block, inst, spill_offset);
> >> +         spill_write_inst = inst;
> >> +      } else if (spill_write_inst && !spill_reg_was_read) {
> >> +         spill_write_inst = NULL;
> >>        }
> >>     }
> >>  
> >> -- 
> >> 1.9.1


_______________________________________________
mesa-dev mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 3/6] i965/vec4: Don't emit scratch reads for a spilled register we have just written

Reply via email to