https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103579
Bug ID: 103579
Summary: [gcc-10.3.1, gcc-12.0.0] -fno-builtin-memset and -fno-builtin have different effect on optimization
Product: gcc
Version: og10 (devel/omp/gcc-10)
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: dimitri.gorokhovik at free dot fr
Target Milestone: ---

Consider the following code:

    void *memset(void *s, int c, unsigned long nb)
    {
      for (char *ss = s; nb > 0; nb--)
        *ss++ = c;
      return s;
    }

Compiling it using the toolchain:

    aarch64-none-elf-gcc (GNU Toolchain for the A-profile Architecture 10.3-2021.07 (arm-10.29)) 10.3.1 20210621

(This is a toolchain downloadable on Arm Ltd's site.)

as follows:

    -O2 -fno-builtin-memset -S test-1.c -o test-1.S

Produces:

            .arch armv8.2-a+crc+fp16+rcpc+dotprod
            .file   "test-1.c"
            .text
            .align  2
            .p2align 4,,15
            .global memset
            .type   memset, %function
    memset:
            stp     x29, x30, [sp, -32]!
            mov     x29, sp
            str     x19, [sp, 16]
            mov     x19, x0
            cbz     x2, .L4
            and     w1, w1, 255
            bl      memset
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    .L4:
            mov     x0, x19
            ldr     x19, [sp, 16]
            ldp     x29, x30, [sp], 32
            ret
            .size   memset, .-memset
            .ident  "GCC: (GNU Toolchain for the A-profile Architecture 10.3-2021.07 (arm-10.29)) 10.3.1 20210621"

i.e., the optimization replaced the 'for' loop by a call to the builtin memset (which in this specific case leads to endless recursion).

However, same code compiled with:

    -O2 -fno-builtin -S test-1.c -o test-1.S

produces:

            .arch armv8.2-a+crc+fp16+rcpc+dotprod
            .file   "test-1.c"
            .text
            .align  2
            .p2align 4,,15
            .global memset
            .type   memset, %function
    memset:
            add     x4, x0, x2
            and     w1, w1, 255
            mov     x3, x0
            cbz     x2, .L7
            .p2align 3,,7
    .L3:
            strb    w1, [x3], 1
            cmp     x3, x4
            bne     .L3
    .L7:
            ret
            .size   memset, .-memset
            .ident  "GCC: (GNU Toolchain for the A-profile Architecture 10.3-2021.07 (arm-10.29)) 10.3.1 20210621"

i.e. in this case the loop has not been replaced.

In addition:

1. '-fno-builtin-memset' has no impact on this sample code (call to memset is always generated).
   Omitting '-fno-builtin' has an observable effect.

2. The same result can be obtained using gcc (GCC) 12.0.0 20210812 (experimental) for x86_64.

Gripes:

-- The GCC manual doesn't specify whether the use of '-fno-builtin' or of '-fno-builtin-{name}' should have an impact on the optimization passes; it only speaks about prototypes. It would be great to have all the effects stated clearly. (It seems logical that, if we don't want builtins, then the optimization passes should not use them.) My case is a bare-metal toolchain, where a memset implementation might not be provided at all.

-- Such a difference in impact between '-fno-builtin' and '-fno-builtin-memset' is certainly unexpected and leads to loss of time. Can one have the same behavior between '-fno-builtin' and '-fno-builtin-{memset,memcpy}', and likewise for all other builtins that optimization passes substitute into generated code (today and in the future)?
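
A possible workaround, offered as a sketch and not as a confirmed fix: assuming the substitution comes from GCC's loop-distribution pattern recognition (the pass controlled by '-ftree-loop-distribute-patterns'; this report does not establish that), the pass can be switched off for the one function with the 'optimize' function attribute, or for the whole file with '-fno-tree-loop-distribute-patterns'. The names NO_LOOP_TO_LIBCALL and memset_noidiom below are hypothetical; the function is renamed only so that the example can be linked next to a libc, and a bare-metal build would call it memset.

```c
/* Sketch only. Assumption: the loop-to-memset substitution is done by the
   loop-distribution pattern pass, so disabling that pass for this function
   keeps the byte-store loop as written. */
#if defined(__GNUC__) && !defined(__clang__)
# define NO_LOOP_TO_LIBCALL \
    __attribute__ ((__optimize__ ("-fno-tree-loop-distribute-patterns")))
#else
# define NO_LOOP_TO_LIBCALL /* hypothetical macro; no-op on other compilers */
#endif

/* Hypothetical name; same body as the memset in the report. */
NO_LOOP_TO_LIBCALL
void *memset_noidiom(void *s, int c, unsigned long nb)
{
  for (char *ss = s; nb > 0; nb--)
    *ss++ = c;
  return s;
}
```

Whether the loop really survives should be checked in the generated assembly (no 'bl memset' inside the function). The GCC manual describes the 'optimize' attribute as intended for debugging purposes, so the per-file command-line option may be the more robust choice for a production build.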