https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86680

--- Comment #8 from Florian La Roche <florian.laroche at googlemail dot com> ---
I've found something the compiler optimized quite nicely:
(Good for the compiler, but I'd be happy to stay with the original code
that was much easier to read for humans.)



extern unsigned long __bss_start[];
extern unsigned long __bss_end[];
//extern unsigned long __bss_size;

void clear_bss(void)
{
    unsigned long *bss = __bss_start;
    unsigned long i, end = __bss_end - __bss_start;
    //unsigned long i = __bss_size;
    for (i = 0; i < end; i += sizeof (unsigned long))
        *bss++ = 0UL;
}




On aarch64 this compiles to the following code:
0000000000000000 <clear_bss>:
   0:   90000001        adrp    x1, 0 <__bss_end>
   4:   90000002        adrp    x2, 0 <__bss_start>
   8:   f9400021        ldr     x1, [x1]
   c:   f9400042        ldr     x2, [x2]
  10:   cb020021        sub     x1, x1, x2
  14:   9343fc21        asr     x1, x1, #3
  18:   b40000c1        cbz     x1, 30 <clear_bss+0x30>
  1c:   d2800000        mov     x0, #0x0                        // #0
  20:   f822681f        str     xzr, [x0, x2]
  24:   91002000        add     x0, x0, #0x8
  28:   eb00003f        cmp     x1, x0
  2c:   54ffffa8        b.hi    20 <clear_bss+0x20>  // b.pmore
  30:   d65f03c0        ret
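
For reference, here is roughly what the loop in the listing does, written back as C. This is only a hand-written sketch based on my reading of the disassembly, not compiler output. The helper name clear_words and its pointer parameters are made up for illustration, so that it can be run without the linker symbols. If I read it right, the compiler keeps a single register as both the loop counter and the byte offset: the limit is an element count (the asr #3) while the offset advances by sizeof (unsigned long) per store, so only about one in every sizeof (unsigned long) words of the range is actually written.

```c
#include <stddef.h>

/* Hypothetical helper: mirrors the generated loop, with the linker
   symbols replaced by ordinary pointer parameters.
   Returns the number of words that were stored. */
static size_t clear_words(unsigned long *start, unsigned long *end)
{
    size_t count = (size_t)(end - start);  /* sub + asr #3: element count */
    size_t off = 0;                        /* mov x0, #0: byte offset and counter */
    size_t stores = 0;

    if (count == 0)                        /* cbz x1 */
        return 0;
    do {
        /* str xzr, [x0, x2]: store zero at start + byte offset */
        *(unsigned long *)((char *)start + off) = 0UL;
        off += sizeof(unsigned long);      /* add x0, x0, #8 */
        stores++;
    } while (count > off);                 /* cmp x1, x0; b.hi */
    return stores;
}
```

Calling it on a 16-word buffer should store only to the first few words and leave the rest untouched, which matches the source loop as written above.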


Jakub, your example code also resulted in fairly large code
(but I've only tested it with 8.0.1, not the newest release).


Thanks a lot,
best regards,

Florian La Roche
