Manuel Lauss <[EMAIL PROTECTED]> writes:
> On Sun, Nov 23, 2008 at 05:14:27PM +0000, Richard Sandiford wrote:
>> Richard Sandiford <[EMAIL PROTECTED]> writes:
>> > Manuel Lauss <[EMAIL PROTECTED]> writes:
>> >> Admittedly my understanding of mips assembly is not yet very advanced, am 
>> >> I
>> >> missing something or is this a bug?
>> >
>> > Well, it's a missed optimisation, certainly.  Fortunately,
>> > it's conceptually fairly easy to fix.
>> 
>> FWIW, I've attached a patch against GCC 4.4 below.  Tested on
>> mipsisa64-elfoabi.
>> 
>> I don't think it's appropriate to apply the patch at this stage
>> in the development cycle, so I'll hold it back for GCC 4.5.
>> I'll add some testcases then too.
>
> I tested this with 4.3.2 and it works very well.

Great!  Thanks for testing it.

> One minor thing though if you don't mind.  Consider this C snippet:
>
> -------------- 8< ------------------------- >8 ----------------
>
> #define SYS_CNTRL_M21   (1 << 19)
> #define SYS_COUNTER_CNTRL 0xb1900014
> #define SYS_RTCMATCH2   0xb1900054
> #define SYS_RTCREAD     0xb1900058
>
> int au1x_rtcmatch2_set_next_event(unsigned long delta, void *x)
> {
>   unsigned long next;
>
>   next = *(unsigned long *)(SYS_RTCREAD) + delta;
>
>   /* wait for register access */
>   while (*(unsigned long *)(SYS_COUNTER_CNTRL) & SYS_CNTRL_M21)
>     ;
>   *(unsigned long *)(SYS_RTCMATCH2) = next;
>   asm volatile ("sync");
> }
>
> -------------- 8< ------------------------- >8 -------------
>
> My patched gcc generates this with -O2 (-Os is actually worse):
>
> 0x00000028 <au1x_rtcmatch2_set_next_event+0>:   lui     v0,0xb190
> 0x0000002c <au1x_rtcmatch2_set_next_event+4>:   lw      v1,88(v0)
> 0x00000030 <au1x_rtcmatch2_set_next_event+8>:   lui     a2,0xb190
> 0x00000034 <au1x_rtcmatch2_set_next_event+12>:  lui     a1,0x8
> 0x00000038 <au1x_rtcmatch2_set_next_event+16>:  lw      v0,20(a2)
> 0x0000003c <au1x_rtcmatch2_set_next_event+20>:  and     v0,v0,a1
> 0x00000040 <au1x_rtcmatch2_set_next_event+24>:  bnez    v0,0x38 
> <au1x_rtcmatch2_set_next_event+16>
> 0x00000044 <au1x_rtcmatch2_set_next_event+28>:  lui     v0,0xb190
> 0x00000048 <au1x_rtcmatch2_set_next_event+32>:  addu    v1,a0,v1
> 0x0000004c <au1x_rtcmatch2_set_next_event+36>:  sw      v1,84(v0)
> 0x00000050 <au1x_rtcmatch2_set_next_event+40>:  sync
> 0x00000054 <au1x_rtcmatch2_set_next_event+44>:  jr      ra
> 0x00000058 <au1x_rtcmatch2_set_next_event+48>:  nop
>
> IMO gcc's register usage is a bit odd;  when the 'addu' at 0x48
> is moved to 0x30 (as 'addu a0,a0,v1'), v1 could be reused and the
> two additional 'lui v0,0xb190' become unecessary, saving 2 insns.
> (This code is called fairly often, so I do care about performance ;-))

Hmm, GCC 4.4 suffers from the same problem.  Before CSE1 we have:

     R1 := 0xb1900000
     load R1+88
     ...
loop:
     ...
     R2 := 0xb1900000
     load R2+20
     ...
     R3 := 0xb1900000
     store R3+84

After CSE we have:

     R1 := 0xb1900000
     load R1+88
     ...
loop:
     ...
     R2 := 0xb1900000
     load R2+20
     ...
     store R2+84           <--- A

where the important thing is that A uses R2 instead of R1.
Loop hoisting converts this to:

     R1 := 0xb1900000
     load R1+88
     ...
     R4 := 0xb1900000
loop:
     ...
     R2 := R4
     load R4+20
     ...
     store R2+84

where the loop still has an assigment to R2, even though it is
now only used after the loop.  Then the GCSE "bypass" pass comes
along and replaces this with:

     R1 := 0xb1900000
     load R1+88
     ...
     R4 := 0xb1900000
loop:
     ...
     R2 := 0xb1900000
     load R4+20
     ...
     store R2+84           <--- A

As a workaround, -fno-gcse produces good code for this testcase with 4.4.
Maybe it will for 4.3 too.  It might well make other functions worse though.

Richard

Reply via email to