Hi Jeff,
However, that doesn't work for the second example:
...
beq $3,$0,$L14
nop
$L7:
andi $2,$2,0xffff
...
bne $3,$0,$L7
nop
$L14:
andi $2,$2,0xffff
...
...
What is different from the first example is that here the beq owns neither
the fall-through thread ($L7) nor the target thread ($L14). The same holds
for the bne. In the first example, the jump owns both threads.
We can think of this transformation:
...
beq $3,$0,$L14new
$L7:
andi $2,$2,0xffff
...
bne $3,$0,$L7
nop
andi $2,$2,0xffff
$L14new:
Could you instead make it:
beq $3,$0,$L14a
andi $2,$2,0xffff
$L7:
andi $2,$2,0xffff
...
bne $3,$0,$L7
nop
$L14:
andi $2,$2,0xffff
$L14a:
...
[ Copy the insn from the L14 target into the delay slot of first
branch. ]
That is indeed possible in this specific example, because executing
'andi $2,$2,0xffff' once more does not change the value of $2, but that
does not always work (for instance, not for addi $2,$2,1). This might be
an OK intermediate solution though, thanks for the idea.
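To make the distinction concrete, here is a minimal Python sketch (not GCC code) that models the two instructions and checks whether executing one a second time changes the result. The helper names andi, addi and safe_to_repeat are made up for illustration, and the addi model ignores the overflow trap and assumes a non-negative immediate.

```python
MASK32 = 0xFFFFFFFF  # model registers as 32-bit unsigned values


def andi(value, imm):
    """Model of `andi rt,rs,imm`: AND with the zero-extended 16-bit immediate."""
    return value & (imm & 0xFFFF)


def addi(value, imm):
    """Simplified model of `addi rt,rs,imm`: wraps to 32 bits, no overflow trap."""
    return (value + imm) & MASK32


def safe_to_repeat(op, imm, samples):
    """True if applying op twice equals applying it once for every sample."""
    return all(op(op(v, imm), imm) == op(v, imm) for v in samples)


# Hypothetical sample register values, chosen only for illustration.
SAMPLES = [0, 1, 0xFFFF, 0x10000, 0x12345678, 0xFFFFFFFF]

andi_ok = safe_to_repeat(andi, 0xFFFF, SAMPLES)  # masking twice is harmless
addi_ok = safe_to_repeat(addi, 1, SAMPLES)       # incrementing twice is not

print("andi $2,$2,0xffff repeatable:", andi_ok)
print("addi $2,$2,1 repeatable:", addi_ok)
```

A sample-based check like this only illustrates the point; a real pass would need a proof that the insn is idempotent (or a comparison of the duplicated insns) rather than testing a few values.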
Step #2
beq $3,$0,$L14a
andi $2,$2,0xffff
$L7:
andi $2,$2,0xffff
$L7a:
...
bne $3,$0,$L7a
andi $2,$2,0xffff
$L14:
andi $2,$2,0xffff
$L14a:
...
Same transformation copying the insn from the L7 target into the delay
slot of the second branch.
Then after reorg has completed (so you don't have to teach reorg about
code labels in sequences), squish the redundant insns together and
insert the code label into the SEQUENCE resulting in
beq $3,$0,$L14a
$L7:
andi $2,$2,0xffff
$L7a:
...
bne $3,$0,$L7a
$L14:
andi $2,$2,0xffff
$L14a:
...
You'd still have to deal with fallout of code labels in sequences
post-reorg, so maybe it's not that big of a win to delay having the
code label appear in the sequence until after reorg.c has completed.
Right.
The other question I'd ask is what's the real penalty these days in
not filling the slots? I know that on later out-of-order PA chips
filling slots was barely worth the effort. I guess it's still
profitable on the low-end embedded MIPS chips?
About the penalty, I don't really know. But since the optimization both
fills delay slots and removes duplicate code, it looks like a good idea
to me.
Thanks,
- Tom