https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67288

            Bug ID: 67288
           Summary: [4.9 regression] non optimal simple function (useless
                    additional shift/remove/shift/add)
           Product: gcc
           Version: 4.9.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: regression
          Assignee: unassigned at gcc dot gnu.org
          Reporter: christophe.le...@c-s.fr
  Target Milestone: ---

The following function (from the Linux kernel, compiled with -O2) produced good
assembly with GCC 4.8.3. With GCC 4.9.3 it contains several unnecessary
instructions:

#define L1_CACHE_BYTES 16 /* L1 cache line size on this target */
#define L1_CACHE_SHIFT 4  /* log2(L1_CACHE_BYTES) */

#define mb()   __asm__ __volatile__ ("sync" : : : "memory")

static inline void dcbf(void *addr)
{
        __asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory");
}

void flush_dcache_range(unsigned long start, unsigned long stop)
{
        void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
        unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
        unsigned int i;

        for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
                dcbf(addr);
        if (i)
                mb();
}

Result with GCC 4.9.3: (15 insns)

c000d970 <flush_dcache_range>:
c000d970:       54 63 00 36     rlwinm  r3,r3,0,0,27
c000d974:       38 84 00 0f     addi    r4,r4,15
c000d978:       7c 83 20 50     subf    r4,r3,r4
c000d97c:       54 89 e1 3f     rlwinm. r9,r4,28,4,31
c000d980:       4d 82 00 20     beqlr   
c000d984:       55 24 20 36     rlwinm  r4,r9,4,0,27
c000d988:       39 24 ff f0     addi    r9,r4,-16
c000d98c:       55 29 e1 3e     rlwinm  r9,r9,28,4,31
c000d990:       39 29 00 01     addi    r9,r9,1
c000d994:       7d 29 03 a6     mtctr   r9
c000d998:       7c 00 18 ac     dcbf    0,r3
c000d99c:       38 63 00 10     addi    r3,r3,16
c000d9a0:       42 00 ff f8     bdnz    c000d998 <flush_dcache_range+0x28>
c000d9a4:       7c 00 04 ac     sync    
c000d9a8:       4e 80 00 20     blr

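The following sequence is useless (shift left 4 bits, subtract 16, shift right
4 bits, add 1). At this point r9 already holds size >> 4, which is nonzero
because of the beqlr, and the sequence recomputes that same value before it is
moved into CTR: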
c000d984:       55 24 20 36     rlwinm  r4,r9,4,0,27
c000d988:       39 24 ff f0     addi    r9,r4,-16
c000d98c:       55 29 e1 3e     rlwinm  r9,r9,28,4,31
c000d990:       39 29 00 01     addi    r9,r9,1



Result with GCC 4.8.3, without the redundant sequence: (11 insns)

c000d894 <flush_dcache_range>:
c000d894:       54 63 00 36     rlwinm  r3,r3,0,0,27
c000d898:       38 84 00 0f     addi    r4,r4,15
c000d89c:       7d 23 20 50     subf    r9,r3,r4
c000d8a0:       55 29 e1 3f     rlwinm. r9,r9,28,4,31
c000d8a4:       4d 82 00 20     beqlr   
c000d8a8:       7d 29 03 a6     mtctr   r9
c000d8ac:       7c 00 18 ac     dcbf    0,r3
c000d8b0:       38 63 00 10     addi    r3,r3,16
c000d8b4:       42 00 ff f8     bdnz    c000d8ac <flush_dcache_range+0x18>
c000d8b8:       7c 00 04 ac     sync    
c000d8bc:       4e 80 00 20     blr
