On Thu, 2013-06-20 at 03:16 -0700, Stephen Clarke wrote:
> > why (and where) did ivopts decide to move the post-increments above the
> > usages in the first loop?
>
> It looks rather like the transformation described next to
> tree-ssa-loop-ivopts.c/adjust_iv_update_pos.
Yes, that looks like the place. Unfortunately, at that point everything
is controlled by the cost of the various IV choices, and I think that is
what I need to change. Just to explain what is happening: the MIPS
ProAptiv chip has what is known as 'memory bonding', where two sequential
4-byte loads can be handled as one 8-byte load (if the alignment is
right). So I want to keep all the loads in one sequence, with no
intervening instructions, to improve the chances of memory bonding
happening. With a memcpy-style loop and loop unrolling, the IV
optimization changes my ideal loop:
.L4:
        addiu $8,$8,8
        lw $14,0($7)
        lw $13,4($7)
        lw $12,8($7)
        lw $11,12($7)
        lw $15,16($7)
        lw $24,20($7)
        lw $5,24($7)
        lw $25,28($7)
        addiu $7,$7,32
        sw $14,0($3)
        sw $13,4($3)
        sw $12,8($3)
        sw $11,12($3)
        sw $15,16($3)
        sw $24,20($3)
        sw $5,24($3)
        sw $25,28($3)
        bne $8,$6,.L4
        addiu $3,$3,32
into this:
.L4:
        addiu $8,$8,8
        lw $13,0($7)
        lw $12,4($7)
        lw $14,8($7)
        lw $15,12($7)
        lw $24,16($7)
        lw $5,20($7)
        lw $25,24($7)
        addiu $7,$7,32
        sw $13,0($3)
        sw $12,4($3)
        sw $14,8($3)
        sw $15,12($3)
        sw $24,16($3)
        sw $5,20($3)
        sw $25,24($3)
        addiu $3,$3,32
        lw $4,-4($7)
        bne $8,$6,.L4
        sw $4,-4($3)
So I lose one of my bonding opportunities because of the lw/sw that
are done after incrementing registers $7 and $3. I think what I need to
fix this is a target-specific way to modify the cost calculations before
the IVs are chosen.
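
As a rough illustration of the lost opportunity, the sketch below counts
candidate load pairs in the two loop bodies above. The pairing rule is an
assumption made for this example, not a description of the ProAptiv
hardware: two loads pair only if they are adjacent in the instruction
stream, use the same base register, have offsets 4 bytes apart, and the
first offset is a multiple of 8 (which also assumes the base register is
8-byte aligned). Stores are treated as ordinary instructions, since only
load bonding is described above. The names insn, count_load_pairs,
ideal_pairs and transformed_pairs are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* One instruction in the loop body, reduced to what the model needs. */
struct insn {
    int is_load;   /* 1 for lw, 0 for anything else (addiu, sw, bne) */
    int base;      /* base register number, only meaningful for lw */
    int offset;    /* byte offset, only meaningful for lw */
};

#define LW(off) { 1, 7, (off) }   /* lw rX,off($7) */
#define NOLOAD  { 0, 0, 0 }       /* addiu, sw or bne */

/* Count pairs under the ASSUMED rule: adjacent loads, same base,
   offsets off and off+4, with off a multiple of 8. Pairs do not overlap. */
static int count_load_pairs(const struct insn *seq, size_t n)
{
    int pairs = 0;
    size_t i = 0;

    assert(seq != NULL);
    while (i + 1 < n) {
        const struct insn *a = &seq[i];
        const struct insn *b = &seq[i + 1];

        if (a->is_load && b->is_load && a->base == b->base
            && b->offset == a->offset + 4 && a->offset % 8 == 0) {
            pairs++;
            i += 2;     /* both loads are consumed by this pair */
        } else {
            i++;
        }
    }
    return pairs;
}

/* First loop above: eight consecutive loads from $7. */
static int ideal_pairs(void)
{
    static const struct insn seq[] = {
        NOLOAD,                                   /* addiu $8,$8,8 */
        LW(0), LW(4), LW(8), LW(12),
        LW(16), LW(20), LW(24), LW(28),
        NOLOAD,                                   /* addiu $7,$7,32 */
        NOLOAD, NOLOAD, NOLOAD, NOLOAD,           /* eight sw */
        NOLOAD, NOLOAD, NOLOAD, NOLOAD,
        NOLOAD,                                   /* bne */
        NOLOAD                                    /* addiu $3,$3,32 */
    };
    return count_load_pairs(seq, sizeof seq / sizeof seq[0]);
}

/* Second loop above: seven loads, then the eighth after the increments. */
static int transformed_pairs(void)
{
    static const struct insn seq[] = {
        NOLOAD,                                   /* addiu $8,$8,8 */
        LW(0), LW(4), LW(8), LW(12),
        LW(16), LW(20), LW(24),
        NOLOAD,                                   /* addiu $7,$7,32 */
        NOLOAD, NOLOAD, NOLOAD, NOLOAD,           /* seven sw */
        NOLOAD, NOLOAD, NOLOAD,
        NOLOAD,                                   /* addiu $3,$3,32 */
        LW(-4),                                   /* lw $4,-4($7) */
        NOLOAD,                                   /* bne */
        NOLOAD                                    /* sw $4,-4($3) */
    };
    return count_load_pairs(seq, sizeof seq / sizeof seq[0]);
}
```

Under these assumptions ideal_pairs() finds four load pairs and
transformed_pairs() finds three: the load at offset 24 has lost its
partner, which now sits after the addiu as lw $4,-4($7).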
Steve Ellcey