On Fri, Dec 4, 2015 at 10:48 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Wed, Dec 2, 2015 at 5:11 AM, Steve Ellcey <sell...@imgtec.com> wrote:
>>
>> I have a question involving ivopts and PR 48814, which was a fix for
>> the post increment operation.  Prior to the fix for PR 48814, MIPS
>> would generate this loop for strcmp (C code from glibc):
>>
>> $L4:
>>         lbu     $3,0($4)
>>         lbu     $2,0($5)
>>         addiu   $4,$4,1
>>         beq     $3,$0,$L7
>>         addiu   $5,$5,1    # This is a branch delay slot
>>         beq     $3,$2,$L4
>>         subu    $2,$3,$2   # This is a branch delay slot (only used after loop)
>>
>>
>> With the current top-of-tree we now generate:
>>
>>         addiu   $4,$4,1
>> $L8:
>>         lbu     $3,-1($4)
>>         addiu   $5,$5,1
>>         beq     $3,$0,$L7
>>         lbu     $2,-1($5)  # This is a branch delay slot
>>         beq     $3,$2,$L8
>>         addiu   $4,$4,1    # This is a branch delay slot
>>
>>         subu    $2,$3,$2   # Done only once now after exiting loop.
>>
>> The main problem with the new loop is that the beq comparing $2 and $3
>> comes right after the load of $2, so it can stall while waiting for
>> the load to complete.  The ideal code would probably be:
>>
>>         addiu   $4,$4,1
>> $L8:
>>         lbu     $3,-1($4)
>>         lbu     $2,0($5)
>>         beq     $3,$0,$L7
>>         addiu   $5,$5,1    # This is a branch delay slot
>>         beq     $3,$2,$L8
>>         addiu   $4,$4,1    # This is a branch delay slot
>>
>>         subu    $2,$3,$2   # Done only once now after exiting loop.
>>
>> Where we load $2 earlier (using a 0 offset instead of a -1 offset) and
>> then do the increment of $5 after using it in the load.  The problem
>> is that this isn't something that can just be done in the instruction
>> scheduler because we are changing one of the instructions (to modify the
>> offset) in addition to rearranging them and I don't think the instruction
>> scheduler supports that.
> Hmm, I think Bernd introduced sched_flag !DONT_BREAK_DEPENDENCIES to
> resolve dependences by modifying the address expression.  I think this
> is the same problem; what's needed is to model the dependence using
> that framework.  Maybe the delay slot is special here?
>
>>
>> It looks like it is the ivopts code that decided to increment the
>> registers first and use -1 offsets in the loads afterwards, instead of
>> using 0 offsets and incrementing the registers after the loads, but I
>> can't figure out how or why ivopts made that decision.
>>
>> Does anyone have any ideas on how I could 'fix' GCC to make it generate
>> the ideal code?  Is there some way to do it in the instruction scheduler?
>> Is there some way to modify ivopts to fix this by modifying the cost
> It's likely IVO just picks the first candidate when it runs into a
> tie.  Could you please post preprocessed source code so that I can
> have a look?  I am not familiar with glibc.  Thanks.
Oh, I saw the example in another thread of yours.

Thanks,
bin
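
For reference, here is a minimal C sketch of the two loop shapes discussed
above.  It is not the glibc source and not what ivopts literally emits; the
function names are made up for illustration.  The first function loads at
offset 0 and then increments the pointers (the pre-PR 48814 shape), the
second increments first and loads at offset -1 (the current shape).  Both
return the same result; only the addressing differs.

```c
/* Hypothetical illustration only: load at offset 0, then bump the pointers. */
int cmp_load_then_increment(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *)p1;
    const unsigned char *s2 = (const unsigned char *)p2;
    unsigned char c1, c2;

    do {
        c1 = s1[0];          /* lbu $3,0($4) */
        c2 = s2[0];          /* lbu $2,0($5) */
        s1++;
        s2++;
        if (c1 == '\0')
            break;
    } while (c1 == c2);

    return c1 - c2;          /* subu, only needed after the loop */
}

/* Hypothetical illustration only: bump the pointers, then load at offset -1. */
int cmp_increment_then_load(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *)p1;
    const unsigned char *s2 = (const unsigned char *)p2;
    unsigned char c1, c2;

    do {
        s1++;
        c1 = s1[-1];         /* lbu $3,-1($4) */
        s2++;
        c2 = s2[-1];         /* lbu $2,-1($5) */
        if (c1 == '\0')
            break;
    } while (c1 == c2);

    return c1 - c2;
}
```

Moving a load across its pointer increment means rewriting the offset
(0 versus -1), which is the instruction change that a plain reordering
scheduler cannot make on its own.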
