On Fri, Dec 4, 2015 at 11:00 AM, Bin.Cheng <[email protected]> wrote:
> On Fri, Dec 4, 2015 at 10:48 AM, Bin.Cheng <[email protected]> wrote:
>> On Wed, Dec 2, 2015 at 5:11 AM, Steve Ellcey <[email protected]> wrote:
>>>
>>> I have a question involving ivopts and PR 48814, which was a fix for
>>> the post increment operation. Prior to the fix for PR 48814, MIPS
>>> would generate this loop for strcmp (C code from glibc):
>>>
>>> $L4:
>>> lbu $3,0($4)
>>> lbu $2,0($5)
>>> addiu $4,$4,1
>>> beq $3,$0,$L7
>>> addiu $5,$5,1 # This is a branch delay slot
>>> beq $3,$2,$L4
>>> subu $2,$3,$2 # This is a branch delay slot (only used after
>>> loop)
>>>
>>>
>>> With the current top-of-tree we now generate:
>>>
>>> addiu $4,$4,1
>>> $L8:
>>> lbu $3,-1($4)
>>> addiu $5,$5,1
>>> beq $3,$0,$L7
>>> lbu $2,-1($5) # This is a branch delay slot
>>> beq $3,$2,$L8
>>> addiu $4,$4,1 # This is a branch delay slot
>>>
>>> subu $2,$3,$2 # Done only once now after exiting loop.
>>>
>>> The main problem with the new loop is that the beq comparing $2 and $3
>>> is right before the load of $2 so there can be a delay due to the time
>>> that the load takes. The ideal code would probably be:
>>>
>>> addiu $4,$4,1
>>> $L8:
>>> lbu $3,-1($4)
>>> lbu $2,0($5) # This is a branch delay slot
>>> beq $3,$0,$L7
>>> addiu $5,$5,1
>>> beq $3,$2,$L8
>>> addiu $4,$4,1 # This is a branch delay slot
>>>
>>> subu $2,$3,$2 # Done only once now after exiting loop.
>>>
>>> Where we load $2 earlier (using a 0 offset instead of a -1 offset) and
>>> then do the increment of $5 after using it in the load. The problem
>>> is that this isn't something that can just be done in the instruction
>>> scheduler because we are changing one of the instructions (to modify the
>>> offset) in addition to rearranging them and I don't think the instruction
>>> scheduler supports that.
>> Hmm, I think Bernd introduced sched_flag !DONT_BREAK_DEPENDENCIES to
>> resolve dependence by modifying address expression. I think this is
>> the same problem, what's needed is to model dependence using that
>> framework. Maybe delay slot is special here?
>>
>>>
>>> It looks like is the ivopts code that decided to increment the registers
>>> first and use the -1 offsets in the loads after instead of using 0 offsets
>>> and then incrementing the offsets after the loads but I can't figure out
>>> how or why ivopts made that decision.
>>>
>>> Does anyone have any ideas on how I could 'fix' GCC to make it generate
>>> the ideal code? Is there some way to do it in the instruction scheduler?
>>> Is there some way to modify ivopts to fix this by modifying the cost
>> It's likely IVO just peaks the first candidate when it runs into a
>> tie. Could you please post preprocessed source code so that I can
>> have a look? I am not familiar with glibc. Thanks.
> Oh, I saw the example in another thread of yours.
>
Dump before IVO is as below:
<bb 3>:
# s1_1 = PHI <p1_4(D)(2), s1_6(6)>
# s2_2 = PHI <p2_5(D)(2), s2_9(6)>
s1_6 = s1_1 + 1;
c1_8 = *s1_1;
s2_9 = s2_2 + 1;
c2_10 = *s2_2;
if (c1_8 == 0)
goto <bb 4>;
else
goto <bb 5>;
And the iv candidates are as:
candidate 1 (important)
var_before ivtmp.6
var_after ivtmp.6
incremented before exit test
type unsigned int
base (unsigned int) p1_4(D)
step 1
base object (void *) p1_4(D)
candidate 2 (important)
original biv
type const unsigned char *
base (const unsigned char *) p1_4(D)
step 1
base object (void *) p1_4(D)
candidate 3 (important)
var_before ivtmp.7
var_after ivtmp.7
incremented before exit test
type unsigned int
base (unsigned int) p2_5(D)
step 1
base object (void *) p2_5(D)
candidate 4 (important)
original biv
type const unsigned char *
base (const unsigned char *) p2_5(D)
step 1
base object (void *) p2_5(D)
Generally GCC would choose normal candidates {1, 3} and insert
increment before exit condition. This is expected in this case. But
when there is applicable original candidates {2, 4}, GCC would prefer
these in order to achieve better debugging. Also as I suspected,
[reg] and [reg-1] have same address cost on mips, that's why GCC makes
current decision.
Thanks,
bin