Manuel Lauss <[EMAIL PROTECTED]> writes: > On Sun, Nov 23, 2008 at 05:14:27PM +0000, Richard Sandiford wrote: >> Richard Sandiford <[EMAIL PROTECTED]> writes: >> > Manuel Lauss <[EMAIL PROTECTED]> writes: >> >> Admittedly my understanding of mips assembly is not yet very advanced, am >> >> I >> >> missing something or is this a bug? >> > >> > Well, it's a missed optimisation, certainly. Fortunately, >> > it's conceptually fairly easy to fix. >> >> FWIW, I've attached a patch against GCC 4.4 below. Tested on >> mipsisa64-elfoabi. >> >> I don't think it's appropriate to apply the patch at this stage >> in the development cycle, so I'll hold it back for GCC 4.5. >> I'll add some testcases then too. > > I tested this with 4.3.2 and it works very well.
Great! Thanks for testing it. > One minor thing though if you don't mind. Consider this C snippet: > > -------------- 8< ------------------------- >8 ---------------- > > #define SYS_CNTRL_M21 (1 << 19) > #define SYS_COUNTER_CNTRL 0xb1900014 > #define SYS_RTCMATCH2 0xb1900054 > #define SYS_RTCREAD 0xb1900058 > > int au1x_rtcmatch2_set_next_event(unsigned long delta, void *x) > { > unsigned long next; > > next = *(unsigned long *)(SYS_RTCREAD) + delta; > > /* wait for register access */ > while (*(unsigned long *)(SYS_COUNTER_CNTRL) & SYS_CNTRL_M21) > ; > *(unsigned long *)(SYS_RTCMATCH2) = next; > asm volatile ("sync"); > } > > -------------- 8< ------------------------- >8 ------------- > > My patched gcc generates this with -O2 (-Os is actually worse): > > 0x00000028 <au1x_rtcmatch2_set_next_event+0>: lui v0,0xb190 > 0x0000002c <au1x_rtcmatch2_set_next_event+4>: lw v1,88(v0) > 0x00000030 <au1x_rtcmatch2_set_next_event+8>: lui a2,0xb190 > 0x00000034 <au1x_rtcmatch2_set_next_event+12>: lui a1,0x8 > 0x00000038 <au1x_rtcmatch2_set_next_event+16>: lw v0,20(a2) > 0x0000003c <au1x_rtcmatch2_set_next_event+20>: and v0,v0,a1 > 0x00000040 <au1x_rtcmatch2_set_next_event+24>: bnez v0,0x38 > <au1x_rtcmatch2_set_next_event+16> > 0x00000044 <au1x_rtcmatch2_set_next_event+28>: lui v0,0xb190 > 0x00000048 <au1x_rtcmatch2_set_next_event+32>: addu v1,a0,v1 > 0x0000004c <au1x_rtcmatch2_set_next_event+36>: sw v1,84(v0) > 0x00000050 <au1x_rtcmatch2_set_next_event+40>: sync > 0x00000054 <au1x_rtcmatch2_set_next_event+44>: jr ra > 0x00000058 <au1x_rtcmatch2_set_next_event+48>: nop > > IMO gcc's register usage is a bit odd; when the 'addu' at 0x48 > is moved to 0x30 (as 'addu a0,a0,v1'), v1 could be reused and the > two additional 'lui v0,0xb190' become unecessary, saving 2 insns. > (This code is called fairly often, so I do care about performance ;-)) Hmm, GCC 4.4 suffers from the same problem. Before CSE1 we have: R1 := 0xb1900000 load R1+88 ... loop: ... R2 := 0xb1900000 load R2+20 ... R3 := 0xb1900000 store R3+84 After CSE we have: R1 := 0xb1900000 load R1+88 ... loop: ... R2 := 0xb1900000 load R2+20 ... store R2+84 <--- A where the important thing is that A uses R2 instead of R1. Loop hoisting converts this to: R1 := 0xb1900000 load R1+88 ... R4 := 0xb1900000 loop: ... R2 := R4 load R4+20 ... store R2+84 where the loop still has an assigment to R2, even though it is now only used after the loop. Then the GCSE "bypass" pass comes along and replaces this with: R1 := 0xb1900000 load R1+88 ... R4 := 0xb1900000 loop: ... R2 := 0xb1900000 load R4+20 ... store R2+84 <--- A As a workaround, -fno-gcse produces good code for this testcase with 4.4. Maybe it will for 4.3 too. It might well make other functions worse though. Richard