Re: Handling labels in delay-slot scheduling

Jeff Law Thu, 18 Nov 2010 10:21:30 -0800

On 11/18/10 10:31, Tom de Vries wrote:

I'm working on improving delay-slot scheduling and would appreciate advice on a
problem I encountered.

Oh boy....

The problem is: how to add support for placing a CODE_LABEL on an instruction in
a delay slot?
My impression is that this is not supported currently. One way to implement this
would be to allow labels in the sequence insns which represent the delay slots.
Another way could be to keep some state external to the rtl representation that
indicates the presence of a label.

It's not currently supported. While you can certainly add the label to the
SEQUENCE, I suspect other surgery will be necessary.

To illustrate why I think that would be useful, let's look at 2 related examples
of MIPS code, for which delay slot filling is currently not done.

Note: The MIPS has a single delay slot, possibly annulling (annulling
jumps are called branch likely insns for MIPS).

The first example looks like this:
...
    beq    $2,$0,$L5
    nop
    lw    $3,4($4)
    addiu    $2,$2,1
    ...
$L5:
    addiu    $2,$2,1
        ...
...
where the beq owns the target thread $L5, in other words the beq is the only
way into $L5. Note that the beq also owns the fall-through thread (starting at
the lw insn).
The duplicate insn 'addiu $2,$2,1' can be hoisted into the delay slot. This
already happens when branch likely insns are enabled. The mechanism works as
follows: first the code is transformed into:
...
    beql    $2,$0,$L5
    addiu    $2,$2,1
    lw    $3,4($4)
    addiu    $2,$2,1
        ...
$L5:
    ...
...
using an annulling jump (beql).

and only then into:
...
    beq    $2,$0,$L5
    addiu    $2,$2,1
    lw    $3,4($4)
        ...
$L5:
    ...
...
by try_merge_delay_insns.

Right. The case where the branch owns the target is clearly much easier.

A problem with newer MIPSes is that the branch likely instruction has a
performance penalty, and is deprecated. However, if we disable the branch likely
instruction, the transformation above no longer happens.

There's been a penalty for these kinds of instructions on every target I've
worked on -- largely because the "nullification" occurs in the last stage of
the pipeline. That's why we generally try to avoid multi-cycle insns in
nullified delay slots. Anyway, back to the topic at hand...

I wrote some code that detects in this case the duplicate, and implements the
transformation by deleting the insn in the fallthrough thread and importing the
other insn into the delay slot. This transformation happens independently from
branch likely insns, and it happens in a single step.
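
As an illustration only, the single-step transformation described above can be
modelled on a toy instruction list. This is a minimal sketch, not the code in
reorg.c: the names Insn, independent and hoist_duplicate are hypothetical, each
insn carries hand-written read/write register sets, memory is modelled as a
pseudo-register "mem", and the branch is assumed to own both threads.

```python
from typing import List, NamedTuple, Optional, Tuple


class Insn(NamedTuple):
    text: str            # assembly text
    writes: frozenset    # registers written (hand-annotated, toy model)
    reads: frozenset     # registers read; "mem" stands in for memory


def independent(a: Insn, b: Insn) -> bool:
    """True if a and b may be reordered: no RAW, WAR or WAW hazard."""
    return not (a.writes & (b.reads | b.writes) or b.writes & a.reads)


def hoist_duplicate(
    fallthrough: List[Insn], target: List[Insn]
) -> Optional[Tuple[Insn, List[Insn], List[Insn]]]:
    """Try to move the first target insn into the delay slot.

    Assumes the branch owns both threads. Succeeds only if an identical
    insn exists in the fall-through thread and can be moved to its head.
    Returns (delay_insn, new_fallthrough, new_target), or None.
    """
    if not target:
        return None
    cand = target[0]
    for i, insn in enumerate(fallthrough):
        if insn == cand:
            # Delete the fall-through copy and the target copy; the
            # insn now lives once, in the delay slot.
            return cand, fallthrough[:i] + fallthrough[i + 1:], target[1:]
        if not independent(insn, cand):
            return None  # cand cannot be moved above this insn
    return None


# First example from the mail: lw then addiu on fall-through, addiu at $L5.
lw = Insn("lw $3,4($4)", frozenset({"$3"}), frozenset({"$4", "mem"}))
addiu = Insn("addiu $2,$2,1", frozenset({"$2"}), frozenset({"$2"}))
result = hoist_duplicate([lw, addiu], [addiu])
```

The candidate may write a register the branch reads ($2 here), because the
branch condition is evaluated before the delay slot insn executes.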

However, that doesn't work for the second example:
...
    beq    $3,$0,$L14
    nop
$L7:
    andi    $2,$2,0xffff
    ...
    bne    $3,$0,$L7
    nop
$L14:
    andi    $2,$2,0xffff
    ...
...
What is different from the first example is that here the beq owns neither the
fall-through thread ($L7) nor the target thread ($L14). Same for the bne. In the
first example, the jump owns both threads.
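
To make the ownership argument concrete, here is a minimal sketch of an
ownership test on a toy, pre-reorg instruction stream (no delay slots shown).
The tuple encoding and the function names owns_fallthrough and owns_target are
assumptions made for illustration; they are not taken from GCC. A branch is
taken to own a thread when nothing else can enter it.

```python
# Toy stream items: ("branch", target), ("jump", target),
# ("label", name) or ("insn", text). "jump" is unconditional.

def owns_fallthrough(prog, i):
    """The branch at index i owns its fall-through thread if no label
    (i.e. no other entry point) sits directly after it."""
    return i + 1 < len(prog) and prog[i + 1][0] != "label"


def owns_target(prog, i):
    """The branch at index i owns its target thread if it is the only
    reference to the label and control cannot fall into the label."""
    target = prog[i][1]
    refs = sum(
        1 for item in prog
        if item[0] in ("branch", "jump") and item[1] == target
    )
    pos = prog.index(("label", target))
    # Conservative: anything but an unconditional jump may fall through.
    falls_in = pos > 0 and prog[pos - 1][0] != "jump"
    return refs == 1 and not falls_in


# Second example from the mail (elided parts left out).
example2 = [
    ("branch", "$L14"),              # beq $3,$0,$L14
    ("label", "$L7"),
    ("insn", "andi $2,$2,0xffff"),
    ("branch", "$L7"),               # bne $3,$0,$L7
    ("label", "$L14"),
    ("insn", "andi $2,$2,0xffff"),
]

# A hypothetical stream in which the branch owns both threads.
owned = [
    ("branch", "$Lx"),
    ("insn", "a"),
    ("jump", "$Ly"),
    ("label", "$Lx"),
    ("insn", "b"),
    ("label", "$Ly"),
]
```

In example2 both checks fail for the beq (index 0) and for the bne (index 3),
which matches the description above.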

We can think of this transformation:
...
    beq    $3,$0,$L14new
$L7:
    andi    $2,$2,0xffff
    ...
    bne    $3,$0,$L7
    nop
    andi    $2,$2,0xffff
$L14new:

Could you instead make it:

    beq    $3,$0,$L14a
    andi    $2,$2,0xffff
$L7:
    andi    $2,$2,0xffff
    ...
    bne    $3,$0,$L7
    nop
$L14:
    andi    $2,$2,0xffff
$L14a:
    ...

[ Copy the insn from the L14 target into the delay slot of first branch. ]

Step #2

    beq    $3,$0,$L14a
    andi    $2,$2,0xffff
$L7:
    andi    $2,$2,0xffff
$L7a:
    ...
    bne    $3,$0,$L7a
    andi    $2,$2,0xffff
$L14:
    andi    $2,$2,0xffff
$L14a:
    ...

Same transformation copying the insn from the L7 target into the delay slot of
the second branch.

Then after reorg has completed (so you don't have to teach reorg about code
labels in sequences), squish the redundant insns together and insert the code
label into the SEQUENCE, resulting in:


    beq    $3,$0,$L14a
$L7:
    andi    $2,$2,0xffff
$L7a:
    ...
    bne    $3,$0,$L7a
$L14:
    andi    $2,$2,0xffff
$L14a:
    ...

You'd still have to deal with fallout of code labels in sequences post-reorg,
so maybe it's not that big of a win to delay having the code label appear in
the sequence until after reorg.c has completed.
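
For illustration, the post-reorg "squish" step can be sketched on a plain list
of assembly lines. This is a toy model under stated assumptions, not a GCC
pass: the name squish and the BRANCHES set are hypothetical, every branch is
assumed to be followed by its delay-slot insn, and the slot insn is assumed to
be a copy placed there by the two copying steps above. It is not a
general-purpose optimization.

```python
BRANCHES = {"beq", "bne"}  # toy set, just enough for the example


def is_label(line):
    return line.endswith(":")


def is_branch(line):
    parts = line.split()
    return bool(parts) and parts[0] in BRANCHES


def squish(prog):
    """Fold a delay-slot insn into an identical insn that follows it
    behind one or more labels, so the labelled insn becomes the slot."""
    out = []
    i = 0
    while i < len(prog):
        out.append(prog[i])
        if is_branch(prog[i]) and i + 1 < len(prog):
            slot = prog[i + 1]
            j = i + 2
            while j < len(prog) and is_label(prog[j]):
                j += 1
            # Require at least one label in between and an exact match.
            if j > i + 2 and j < len(prog) and prog[j] == slot:
                out.extend(prog[i + 2:j + 1])  # labels + surviving insn
                i = j + 1                      # slot copy is dropped
                continue
        i += 1
    return out


# State after step #2 in the mail.
step2 = [
    "beq $3,$0,$L14a",
    "andi $2,$2,0xffff",
    "$L7:",
    "andi $2,$2,0xffff",
    "$L7a:",
    "...",
    "bne $3,$0,$L7a",
    "andi $2,$2,0xffff",
    "$L14:",
    "andi $2,$2,0xffff",
    "$L14a:",
    "...",
]
final = squish(step2)
```

In the result each branch is immediately followed by a label and then the
single remaining andi, which is the "label inside the SEQUENCE" situation that
the rest of the compiler would then have to cope with.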

The other question I'd ask is what's the real penalty these days in not
filling the slots? I know that on later out-of-order PA chips filling slots
was barely worth the effort; I guess it's still profitable on the low-end
embedded MIPS chips?


jeff
