On Fri, Dec 4, 2015 at 11:00 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Fri, Dec 4, 2015 at 10:48 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: >> On Wed, Dec 2, 2015 at 5:11 AM, Steve Ellcey <sell...@imgtec.com> wrote: >>> >>> I have a question involving ivopts and PR 48814, which was a fix for >>> the post increment operation. Prior to the fix for PR 48814, MIPS >>> would generate this loop for strcmp (C code from glibc): >>> >>> $L4: >>> lbu $3,0($4) >>> lbu $2,0($5) >>> addiu $4,$4,1 >>> beq $3,$0,$L7 >>> addiu $5,$5,1 # This is a branch delay slot >>> beq $3,$2,$L4 >>> subu $2,$3,$2 # This is a branch delay slot (only used after >>> loop) >>> >>> >>> With the current top-of-tree we now generate: >>> >>> addiu $4,$4,1 >>> $L8: >>> lbu $3,-1($4) >>> addiu $5,$5,1 >>> beq $3,$0,$L7 >>> lbu $2,-1($5) # This is a branch delay slot >>> beq $3,$2,$L8 >>> addiu $4,$4,1 # This is a branch delay slot >>> >>> subu $2,$3,$2 # Done only once now after exiting loop. >>> >>> The main problem with the new loop is that the beq comparing $2 and $3 >>> is right before the load of $2 so there can be a delay due to the time >>> that the load takes. The ideal code would probably be: >>> >>> addiu $4,$4,1 >>> $L8: >>> lbu $3,-1($4) >>> lbu $2,0($5) # This is a branch delay slot >>> beq $3,$0,$L7 >>> addiu $5,$5,1 >>> beq $3,$2,$L8 >>> addiu $4,$4,1 # This is a branch delay slot >>> >>> subu $2,$3,$2 # Done only once now after exiting loop. >>> >>> Where we load $2 earlier (using a 0 offset instead of a -1 offset) and >>> then do the increment of $5 after using it in the load. The problem >>> is that this isn't something that can just be done in the instruction >>> scheduler because we are changing one of the instructions (to modify the >>> offset) in addition to rearranging them and I don't think the instruction >>> scheduler supports that. >> Hmm, I think Bernd introduced sched_flag !DONT_BREAK_DEPENDENCIES to >> resolve dependence by modifying address expression. I think this is >> the same problem, what's needed is to model dependence using that >> framework. Maybe delay slot is special here? >> >>> >>> It looks like is the ivopts code that decided to increment the registers >>> first and use the -1 offsets in the loads after instead of using 0 offsets >>> and then incrementing the offsets after the loads but I can't figure out >>> how or why ivopts made that decision. >>> >>> Does anyone have any ideas on how I could 'fix' GCC to make it generate >>> the ideal code? Is there some way to do it in the instruction scheduler? >>> Is there some way to modify ivopts to fix this by modifying the cost >> It's likely IVO just peaks the first candidate when it runs into a >> tie. Could you please post preprocessed source code so that I can >> have a look? I am not familiar with glibc. Thanks. > Oh, I saw the example in another thread of yours. > Dump before IVO is as below:
<bb 3>: # s1_1 = PHI <p1_4(D)(2), s1_6(6)> # s2_2 = PHI <p2_5(D)(2), s2_9(6)> s1_6 = s1_1 + 1; c1_8 = *s1_1; s2_9 = s2_2 + 1; c2_10 = *s2_2; if (c1_8 == 0) goto <bb 4>; else goto <bb 5>; And the iv candidates are as: candidate 1 (important) var_before ivtmp.6 var_after ivtmp.6 incremented before exit test type unsigned int base (unsigned int) p1_4(D) step 1 base object (void *) p1_4(D) candidate 2 (important) original biv type const unsigned char * base (const unsigned char *) p1_4(D) step 1 base object (void *) p1_4(D) candidate 3 (important) var_before ivtmp.7 var_after ivtmp.7 incremented before exit test type unsigned int base (unsigned int) p2_5(D) step 1 base object (void *) p2_5(D) candidate 4 (important) original biv type const unsigned char * base (const unsigned char *) p2_5(D) step 1 base object (void *) p2_5(D) Generally GCC would choose normal candidates {1, 3} and insert increment before exit condition. This is expected in this case. But when there is applicable original candidates {2, 4}, GCC would prefer these in order to achieve better debugging. Also as I suspected, [reg] and [reg-1] have same address cost on mips, that's why GCC makes current decision. Thanks, bin