I have a question involving ivopts and PR 48814, which was a fix for the handling of the post-increment operation. Prior to the fix for PR 48814, GCC would generate this MIPS loop for strcmp (C code from glibc):

    $L4: lbu   $3,0($4)
         lbu   $2,0($5)
         addiu $4,$4,1
         beq   $3,$0,$L7
         addiu $5,$5,1    # This is a branch delay slot
         beq   $3,$2,$L4
         subu  $2,$3,$2   # This is a branch delay slot (only used after loop)

With the current top-of-tree we now generate:

         addiu $4,$4,1
    $L8: lbu   $3,-1($4)
         addiu $5,$5,1
         beq   $3,$0,$L7
         lbu   $2,-1($5)  # This is a branch delay slot
         beq   $3,$2,$L8
         addiu $4,$4,1    # This is a branch delay slot
         subu  $2,$3,$2   # Done only once now after exiting loop.

The main problem with the new loop is that the beq comparing $2 and $3 comes right after the load of $2, so it can stall while waiting for the load to complete. The ideal code would probably be:

         addiu $4,$4,1
    $L8: lbu   $3,-1($4)
         lbu   $2,0($5)
         beq   $3,$0,$L7
         addiu $5,$5,1    # This is a branch delay slot
         beq   $3,$2,$L8
         addiu $4,$4,1    # This is a branch delay slot
         subu  $2,$3,$2   # Done only once now after exiting loop.

Here $2 is loaded earlier (using a 0 offset instead of a -1 offset), and $5 is incremented after it has been used by the load. The problem is that the instruction scheduler can't simply do this on its own. Besides reordering the instructions, it would have to change one of them (adjusting the offset), and I don't think the scheduler supports that.

It looks like it is the ivopts code that decided to increment the registers first and use -1 offsets in the subsequent loads, instead of using 0 offsets and incrementing the registers after the loads. However, I can't figure out how or why ivopts made that decision.

Does anyone have any ideas on how I could 'fix' GCC to make it generate the ideal code? Is there some way to do it in the instruction scheduler? Is there some way to modify ivopts, perhaps by changing its cost analysis, to fix this? Could I (partially) undo the fix for PR 48814? According to the final comment in that Bugzilla report, the change is really only needed for C11 and does degrade the optimizer. Could we go back to the old behaviour for C89/C99?
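For reference, here is a minimal C sketch of the two addressing forms being compared. It assumes a loop modelled on glibc's generic strcmp; the exact glibc source may differ between versions. `strcmp_post` loads each byte at offset 0 and increments the pointer afterwards, as the old MIPS loop did. `strcmp_pre` is a hypothetical hand-written model of the top-of-tree loop: it increments first, then loads at offset -1. Both function names, and the `check_equivalent` helper, are made up for illustration.

```c
#include <assert.h>

/* Post-increment form, modelled on glibc's generic strcmp loop:
   load at offset 0, then advance the pointer (the old MIPS code). */
int strcmp_post (const char *p1, const char *p2)
{
  const unsigned char *s1 = (const unsigned char *) p1;
  const unsigned char *s2 = (const unsigned char *) p2;
  unsigned char c1, c2;

  do
    {
      c1 = *s1++;
      c2 = *s2++;
      if (c1 == '\0')
        return c1 - c2;
    }
  while (c1 == c2);

  return c1 - c2;
}

/* Hypothetical model of the top-of-tree code: advance the pointer
   first, then load at offset -1 (the "lbu $3,-1($4)" form). */
int strcmp_pre (const char *p1, const char *p2)
{
  const unsigned char *s1 = (const unsigned char *) p1;
  const unsigned char *s2 = (const unsigned char *) p2;
  unsigned char c1, c2;

  do
    {
      s1++;
      c1 = s1[-1];
      s2++;
      c2 = s2[-1];
      if (c1 == '\0')
        return c1 - c2;
    }
  while (c1 == c2);

  return c1 - c2;
}

/* Assert that both forms return the same value for one input pair. */
void check_equivalent (const char *a, const char *b)
{
  assert (strcmp_post (a, b) == strcmp_pre (a, b));
}
```

The two functions compute identical results. So the choice between a 0 offset after the increment and a -1 offset before it is presumably left to the ivopts cost model, not dictated by how the source is written.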
The ivopts code has changed enough since the PR 48814 patch was applied that I couldn't immediately see how to undo it in the ToT sources.

Steve Ellcey
sell...@imgtec.com