On Fri, Dec 4, 2015 at 10:48 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Wed, Dec 2, 2015 at 5:11 AM, Steve Ellcey <sell...@imgtec.com> wrote:
>>
>> I have a question involving ivopts and PR 48814, which was a fix for
>> the post increment operation.  Prior to the fix for PR 48814, MIPS
>> would generate this loop for strcmp (C code from glibc):
>>
>> $L4:
>>         lbu     $3,0($4)
>>         lbu     $2,0($5)
>>         addiu   $4,$4,1
>>         beq     $3,$0,$L7
>>         addiu   $5,$5,1         # This is a branch delay slot
>>         beq     $3,$2,$L4
>>         subu    $2,$3,$2        # This is a branch delay slot (only used after loop)
>>
>>
>> With the current top-of-tree we now generate:
>>
>>         addiu   $4,$4,1
>> $L8:
>>         lbu     $3,-1($4)
>>         addiu   $5,$5,1
>>         beq     $3,$0,$L7
>>         lbu     $2,-1($5)       # This is a branch delay slot
>>         beq     $3,$2,$L8
>>         addiu   $4,$4,1         # This is a branch delay slot
>>
>>         subu    $2,$3,$2        # Done only once now after exiting loop.
>>
>> The main problem with the new loop is that the beq comparing $2 and $3
>> is right before the load of $2 so there can be a delay due to the time
>> that the load takes.  The ideal code would probably be:
>>
>>         addiu   $4,$4,1
>> $L8:
>>         lbu     $3,-1($4)
>>         lbu     $2,0($5)        # This is a branch delay slot
>>         beq     $3,$0,$L7
>>         addiu   $5,$5,1
>>         beq     $3,$2,$L8
>>         addiu   $4,$4,1         # This is a branch delay slot
>>
>>         subu    $2,$3,$2        # Done only once now after exiting loop.
>>
>> Where we load $2 earlier (using a 0 offset instead of a -1 offset) and
>> then do the increment of $5 after using it in the load.  The problem
>> is that this isn't something that can just be done in the instruction
>> scheduler because we are changing one of the instructions (to modify the
>> offset) in addition to rearranging them and I don't think the instruction
>> scheduler supports that.
> Hmm, I think Bernd introduced sched_flag !DONT_BREAK_DEPENDENCIES to
> resolve dependence by modifying address expression.  I think this is
> the same problem, what's needed is to model dependence using that
> framework.  Maybe delay slot is special here?
>
>>
>> It looks like it is the ivopts code that decided to increment the registers
>> first and use the -1 offsets in the loads after instead of using 0 offsets
>> and then incrementing the offsets after the loads but I can't figure out
>> how or why ivopts made that decision.
>>
>> Does anyone have any ideas on how I could 'fix' GCC to make it generate
>> the ideal code?  Is there some way to do it in the instruction scheduler?
>> Is there some way to modify ivopts to fix this by modifying the cost
> It's likely IVO just picks the first candidate when it runs into a
> tie.  Could you please post preprocessed source code so that I can
> have a look?  I am not familiar with glibc.  Thanks.

Oh, I saw the example in another thread of yours.

Thanks,
bin
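
For readers following the thread without the glibc source at hand, here is a minimal C sketch of the two addressing shapes being compared. It is modeled on a generic byte-wise strcmp loop and is not claimed to be the exact glibc code or the exact input GCC saw; the function names cmp_load_then_inc and cmp_inc_then_load are made up for illustration. The first form loads at offset 0 and then bumps the pointers (the shape of the older MIPS loop); the second bumps the pointers first and loads at offset -1 (the shape of the current top-of-tree loop). Both compute the same result, which is why the choice between them comes down to cost modeling and scheduling rather than correctness.

```c
/* Hypothetical illustration only: names and structure are assumptions,
   loosely following a generic byte-wise strcmp loop.  */

/* Form 1: load at offset 0, then increment the pointers.  */
int cmp_load_then_inc(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *) p1;
    const unsigned char *s2 = (const unsigned char *) p2;
    unsigned char c1, c2;

    do {
        c1 = s1[0];             /* 0-offset load */
        c2 = s2[0];             /* 0-offset load */
        s1++;
        s2++;
        if (c1 == '\0')
            break;
    } while (c1 == c2);

    return c1 - c2;             /* computed once, after the loop */
}

/* Form 2: increment the pointers first, then load at offset -1.  */
int cmp_inc_then_load(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *) p1;
    const unsigned char *s2 = (const unsigned char *) p2;
    unsigned char c1, c2;

    do {
        s1++;
        s2++;
        c1 = s1[-1];            /* -1-offset load */
        c2 = s2[-1];            /* -1-offset load */
        if (c1 == '\0')
            break;
    } while (c1 == c2);

    return c1 - c2;             /* computed once, after the loop */
}
```

The "ideal" loop described in the quoted message mixes the two: one pointer uses the -1 form and the other uses the 0 form, so that the second load can be issued earlier relative to the compare.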