I have a question involving ivopts and PR 48814, which was a fix for
the post increment operation.  Prior to the fix for PR 48814, MIPS
would generate this loop for strcmp (C code from glibc):

$L4:
        lbu     $3,0($4)
        lbu     $2,0($5)
        addiu   $4,$4,1
        beq     $3,$0,$L7
        addiu   $5,$5,1    # This is a branch delay slot
        beq     $3,$2,$L4
        subu    $2,$3,$2   # This is a branch delay slot (only used after loop)
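
For reference, the C loop being compiled looks roughly like the sketch below.  This is written from memory of the generic glibc strcmp rather than copied from the tree, and the name my_strcmp is only an assumption used here to avoid clashing with the libc symbol.  In the assembly, $4 and $5 correspond to s1 and s2, and $3 and $2 to c1 and c2.

```c
/* Sketch of a generic byte-at-a-time strcmp loop.
   my_strcmp is a hypothetical name, not the glibc symbol.  */
int my_strcmp(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *) p1;
    const unsigned char *s2 = (const unsigned char *) p2;
    unsigned char c1, c2;

    do {
        c1 = *s1++;              /* load byte, then post-increment s1 */
        c2 = *s2++;              /* load byte, then post-increment s2 */
        if (c1 == '\0')          /* end of first string: leave the loop */
            return c1 - c2;
    } while (c1 == c2);          /* keep going while the bytes match */

    return c1 - c2;              /* the subu that is only needed after the loop */
}
```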


With the current top-of-tree we now generate:

        addiu   $4,$4,1
$L8:
        lbu     $3,-1($4)
        addiu   $5,$5,1
        beq     $3,$0,$L7
        lbu     $2,-1($5)  # This is a branch delay slot
        beq     $3,$2,$L8
        addiu   $4,$4,1    # This is a branch delay slot

        subu    $2,$3,$2   # Done only once now after exiting loop.

The main problem with the new loop is that the beq comparing $2 and $3
comes right after the load of $2, so the branch can stall while it waits
for the load to complete.  The ideal code would probably be:

        addiu   $4,$4,1
$L8:
        lbu     $3,-1($4)
        lbu     $2,0($5)
        beq     $3,$0,$L7
        addiu   $5,$5,1    # This is a branch delay slot
        beq     $3,$2,$L8
        addiu   $4,$4,1    # This is a branch delay slot

        subu    $2,$3,$2   # Done only once now after exiting loop.

Here we load $2 earlier (using a 0 offset instead of a -1 offset) and
then do the increment of $5 after using it in the load.  The problem
is that this isn't something that can just be done in the instruction
scheduler, because we are changing one of the instructions (to modify the
offset) in addition to rearranging them, and I don't think the instruction
scheduler supports that.

It looks like it is the ivopts code that decided to increment the registers
first and use the -1 offsets in the loads afterwards, instead of using 0
offsets and then incrementing the registers after the loads, but I can't
figure out how or why ivopts made that decision.
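
To make the two shapes concrete, here is a source-level sketch of both addressing choices.  The function names are hypothetical, and this only illustrates that the two forms compute the same result; it does not show how ivopts actually represents or costs its candidates, and the compiler is free to turn either form into the other.

```c
/* Variant A: load at offset 0, then increment the pointers.
   cmp_offset0 is a hypothetical name used only for this sketch.  */
int cmp_offset0(const unsigned char *s1, const unsigned char *s2)
{
    unsigned char c1, c2;

    do {
        c1 = s1[0];
        c2 = s2[0];
        s1++;
        s2++;
        if (c1 == '\0')
            break;
    } while (c1 == c2);

    return c1 - c2;
}

/* Variant B: increment the pointers first, then load at offset -1.
   cmp_offset_m1 is likewise a hypothetical name.  */
int cmp_offset_m1(const unsigned char *s1, const unsigned char *s2)
{
    unsigned char c1, c2;

    do {
        s1++;
        s2++;
        c1 = s1[-1];
        c2 = s2[-1];
        if (c1 == '\0')
            break;
    } while (c1 == c2);

    return c1 - c2;
}
```

Variant A matches the old loop and the load of $2 in the ideal loop; variant B matches what top-of-tree generates now.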

Does anyone have any ideas on how I could 'fix' GCC to make it generate
the ideal code?  Is there some way to do it in the instruction scheduler?
Is there some way to modify ivopts to fix this by modifying the cost
analysis somehow?  Could I (partially) undo the fix for PR 48814?
According to the final comment in that bugzilla report, the change is
really only needed for C11 and it does degrade the optimizer, so could we
go back to the old behaviour for C89/C99?  The code in ivopts has changed
enough since the patch was applied that I couldn't immediately see how to
do that in the ToT sources.

Steve Ellcey
sell...@imgtec.com
