[Bug c/103579] New: [gcc-10.3.1, gcc-12.0.0] -fno-builtin-memset and -fno-builtin have different effect on optimization

dimitri.gorokhovik at free dot fr via Gcc-bugs Mon, 06 Dec 2021 02:25:14 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103579


            Bug ID: 103579
           Summary: [gcc-10.3.1, gcc-12.0.0] -fno-builtin-memset and
                    -fno-builtin have different effect on optimization
           Product: gcc
           Version: og10 (devel/omp/gcc-10)
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dimitri.gorokhovik at free dot fr
  Target Milestone: ---

Consider the following code:

void *memset(void *s, int c, unsigned long nb)
{
  for (char *ss = s; nb > 0; nb--)
    *ss++ = c;

  return s;
}

Compiling it using the toolchain:
aarch64-none-elf-gcc (GNU Toolchain for the A-profile Architecture 10.3-2021.07
(arm-10.29)) 10.3.1 20210621 

(This is a toolchain downloadable on Arm Ltd's site.)

as follows:
-O2 -fno-builtin-memset  -S test-1.c  -o test-1.S

Produces:
        .arch armv8.2-a+crc+fp16+rcpc+dotprod
        .file   "test-1.c"
        .text
        .align  2
        .p2align 4,,15
        .global memset
        .type   memset, %function
memset:
        stp     x29, x30, [sp, -32]!
        mov     x29, sp
        str     x19, [sp, 16]
        mov     x19, x0
        cbz     x2, .L4
        and     w1, w1, 255
        bl      memset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.L4:
        mov     x0, x19
        ldr     x19, [sp, 16]
        ldp     x29, x30, [sp], 32
        ret
        .size   memset, .-memset
        .ident  "GCC: (GNU Toolchain for the A-profile Architecture 10.3-2021.07 (arm-10.29)) 10.3.1 20210621"


i.e., the optimization replaced the 'for' loop with a call to memset (which in
this specific case, where the function being compiled is memset itself, leads
to endless recursion).



However, the same code compiled with:
-O2 -fno-builtin  -S test-1.c  -o test-1.S

produces:
        .arch armv8.2-a+crc+fp16+rcpc+dotprod
        .file   "test-1.c"
        .text
        .align  2
        .p2align 4,,15
        .global memset
        .type   memset, %function
memset:
        add     x4, x0, x2
        and     w1, w1, 255
        mov     x3, x0
        cbz     x2, .L7
        .p2align 3,,7
.L3:
        strb    w1, [x3], 1
        cmp     x3, x4
        bne     .L3
.L7:
        ret
        .size   memset, .-memset
        .ident  "GCC: (GNU Toolchain for the A-profile Architecture 10.3-2021.07 (arm-10.29)) 10.3.1 20210621"


i.e. in this case the loop has not been replaced.

In addition:
1. '-fno-builtin-memset' has no impact on this sample code: the call to memset
is generated with or without it. Adding or omitting '-fno-builtin', by
contrast, has an observable effect.

2. The same result can be obtained using gcc (GCC) 12.0.0 20210812
(experimental) for x86_64.


Gripes:
-- The GCC Manual doesn't specify whether '-fno-builtin' or
'-fno-builtin-{name}' should have any impact on the optimization passes; it
only speaks about prototypes. It would be great to have all the effects stated
clearly. (It seems logical that, if we don't want builtins, then the
optimization passes should not use them.) My case is a bare-metal toolchain,
where a memset implementation might not be provided at all.

-- Such a difference in impact between '-fno-builtin' and
'-fno-builtin-memset' is certainly unexpected and leads to loss of time. Can
one have the same behavior between '-fno-builtin' and
'-fno-builtin-{memset,memcpy}', and likewise for all other builtins that
optimization passes substitute into generated code (today and in the future)?
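
For reference, here is a possible per-function workaround, given only as a
sketch. It assumes that the substitution shown above is done by GCC's loop
distribution pass (controlled by '-ftree-loop-distribute-patterns') and that
turning the pass off for one function through the 'optimize' function
attribute keeps the loop as written. I have not verified this with the
toolchains named in this report. The names 'my_memset' and
'NO_LOOP_TO_LIBCALL' are hypothetical; in a bare-metal C library the function
would be called memset.

```c
#include <stddef.h>

/* Assumption: the loop-to-memset substitution comes from GCC's loop
   distribution pass, so disabling that pass for this one function keeps
   the loop as written. NO_LOOP_TO_LIBCALL is a hypothetical macro name.
   The attribute is GCC-specific, hence the guard. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
    __attribute__((optimize("no-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL
#endif

/* Hypothetical name; a bare-metal libc would name this memset. */
NO_LOOP_TO_LIBCALL
void *my_memset(void *s, int c, size_t nb)
{
    unsigned char *ss = s;

    /* Plain byte-by-byte fill, same logic as the sample above. */
    for (; nb > 0; nb--)
        *ss++ = (unsigned char)c;

    return s;
}
```

Passing '-fno-tree-loop-distribute-patterns' on the command line should have
the same effect for the whole translation unit, under the same assumption.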
