https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345
Branko Drevensek <branko.drevensek at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |branko.drevensek at gmail dot com

Mark Rutland <mark at kernel dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mark at kernel dot org

--- Comment #8 from Branko Drevensek <branko.drevensek at gmail dot com> ---
Turning function alignment off for size optimization assumes that function alignment is only an optimization, while for some architectures it might be a requirement for certain functions, such as interrupt handlers on RISC-V.

This makes it impossible to have those functions aligned using this switch/attribute regardless of the optimization level selected, as -Os will cause the alignment setting to be ignored.

--- Comment #9 from Mark Rutland <mark at kernel dot org> ---
This appears to be one case of several where GCC drops the alignment specified by `-falign-functions=N`. I'm commenting with the other cases here rather than creating new tickets on the assumption that's preferable.

Dropping the alignment specified by `-falign-functions=N` is a functional issue for the arm64 Linux kernel port affecting our 'ftrace' tracing mechanism. I see this with GCC 12.1.0 (and have not tested other versions), and LLVM seems to always respect the alignment specified by `-falign-functions=N`.

The arm64 Linux kernel port needs to use `-falign-functions=8` along with `-fpatchable-function-entry=N,2` to place a naturally-aligned 8-byte literal at the start of functions. There's some detail of that at:

  https://lore.kernel.org/lkml/20230109135828.879136-1-mark.rutl...@arm.com/

As noted earlier in this ticket, GCC does not seem to respect `-falign-functions=N` when using `-Os`.
For my use-case we can work around the issue by not passing `-Os`, and I have one patch to do so, but this is not ideal:

  https://lore.kernel.org/lkml/20230109135828.879136-3-mark.rutl...@arm.com/

In addition, GCC seems to drop alignment for cold functions, whether those are marked as cold explicitly or when determined by some interprocedural analysis. I've noted this on LKML at:

  https://lore.kernel.org/lkml/Y77%2FqVgvaJidFpYt@FVFF77S0Q05N/

... the below summary is a copy-paste of that:

For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-cold.c
| #define __cold \
|         __attribute__((cold))
|
| #define EXPORT_FUNC_PTR(func) \
|         typeof((func)) *__ptr_##func = (func)
|
| __cold
| void cold_func_a(void) { }
|
| __cold
| void cold_func_b(void) { }
|
| __cold
| void cold_func_c(void) { }
|
| static __cold
| void static_cold_func_a(void) { }
| EXPORT_FUNC_PTR(static_cold_func_a);
|
| static __cold
| void static_cold_func_b(void) { }
| EXPORT_FUNC_PTR(static_cold_func_b);
|
| static __cold
| void static_cold_func_c(void) { }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-cold.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-cold.o
|
| test-cold.o:     file format elf64-littleaarch64
|
|
| Disassembly of section .text:
|
| 0000000000000000 <static_cold_func_a>:
|    0:   d65f03c0        ret
|
| 0000000000000004 <static_cold_func_b>:
|    4:   d65f03c0        ret
|
| 0000000000000008 <static_cold_func_c>:
|    8:   d65f03c0        ret
|
| 000000000000000c <cold_func_a>:
|    c:   d65f03c0        ret
|
| 0000000000000010 <cold_func_b>:
|   10:   d65f03c0        ret
|
| 0000000000000014 <cold_func_c>:
|   14:   d65f03c0        ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-cold.o
|
| test-cold.o:     file format elf64-littleaarch64
|
| Sections:
| Idx Name             Size      VMA               LMA               File off  Algn
|   0 .text            00000018  0000000000000000  0000000000000000  00000040  2**2
|                      CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data            00000018  0000000000000000  0000000000000000  00000058  2**3
|                      CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss             00000000  0000000000000000  0000000000000000  00000070  2**0
|                      ALLOC
|   3 .comment         00000013  0000000000000000  0000000000000000  00000070  2**0
|                      CONTENTS, READONLY
|   4 .note.GNU-stack  00000000  0000000000000000  0000000000000000  00000083  2**0
|                      CONTENTS, READONLY
|   5 .eh_frame        00000090  0000000000000000  0000000000000000  00000088  2**3
|                      CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

In simple cases, alignment *can* be restored if an explicit function attribute is used. For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-aligned-cold.c
| #define __aligned(n) \
|         __attribute__((aligned(n)))
|
| #define __cold \
|         __attribute__((cold)) __aligned(16)
|
| #define EXPORT_FUNC_PTR(func) \
|         typeof((func)) *__ptr_##func = (func)
|
| __cold
| void cold_func_a(void) { }
|
| __cold
| void cold_func_b(void) { }
|
| __cold
| void cold_func_c(void) { }
|
| static __cold
| void static_cold_func_a(void) { }
| EXPORT_FUNC_PTR(static_cold_func_a);
|
| static __cold
| void static_cold_func_b(void) { }
| EXPORT_FUNC_PTR(static_cold_func_b);
|
| static __cold
| void static_cold_func_c(void) { }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-aligned-cold.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-aligned-cold.o
|
| test-aligned-cold.o:     file format elf64-littleaarch64
|
|
| Disassembly of section .text:
|
| 0000000000000000 <static_cold_func_a>:
|    0:   d65f03c0        ret
|    4:   d503201f        nop
|    8:   d503201f        nop
|    c:   d503201f        nop
|
| 0000000000000010 <static_cold_func_b>:
|   10:   d65f03c0        ret
|   14:   d503201f        nop
|   18:   d503201f        nop
|   1c:   d503201f        nop
|
| 0000000000000020 <static_cold_func_c>:
|   20:   d65f03c0        ret
|   24:   d503201f        nop
|   28:   d503201f        nop
|   2c:   d503201f        nop
|
| 0000000000000030 <cold_func_a>:
|   30:   d65f03c0        ret
|   34:   d503201f        nop
|   38:   d503201f        nop
|   3c:   d503201f        nop
|
| 0000000000000040 <cold_func_b>:
|   40:   d65f03c0        ret
|   44:   d503201f        nop
|   48:   d503201f        nop
|   4c:   d503201f        nop
|
| 0000000000000050 <cold_func_c>:
|   50:   d65f03c0        ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-aligned-cold.o
|
| test-aligned-cold.o:     file format elf64-littleaarch64
|
| Sections:
| Idx Name             Size      VMA               LMA               File off  Algn
|   0 .text            00000054  0000000000000000  0000000000000000  00000040  2**4
|                      CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data            00000018  0000000000000000  0000000000000000  00000098  2**3
|                      CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss             00000000  0000000000000000  0000000000000000  000000b0  2**0
|                      ALLOC
|   3 .comment         00000013  0000000000000000  0000000000000000  000000b0  2**0
|                      CONTENTS, READONLY
|   4 .note.GNU-stack  00000000  0000000000000000  0000000000000000  000000c3  2**0
|                      CONTENTS, READONLY
|   5 .eh_frame        00000090  0000000000000000  0000000000000000  000000c8  2**3
|                      CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

Unfortunately it appears that some interprocedural analysis determines that if a callee is only called/referenced from cold callers, the callee is marked as cold, and the alignment it would have got from the command line option is dropped. If it's given an explicit alignment attribute, the alignment is retained.
For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-aligned-cold-caller.c
| #define noinline \
|         __attribute__((noinline))
|
| #define __aligned(n) \
|         __attribute__((aligned(n)))
|
| #define __cold \
|         __attribute__((cold)) __aligned(16)
|
| #define EXPORT_FUNC_PTR(func) \
|         typeof((func)) *__ptr_##func = (func)
|
| static noinline void callee_a(void)
| {
|         asm volatile("// callee_a\n" ::: "memory");
| }
|
| static noinline void callee_b(void)
| {
|         asm volatile("// callee_b\n" ::: "memory");
| }
|
| static noinline void callee_c(void)
| {
|         asm volatile("// callee_c\n" ::: "memory");
| }
| __cold
| void cold_func_a(void) { callee_a(); }
|
| __cold
| void cold_func_b(void) { callee_b(); }
|
| __cold
| void cold_func_c(void) { callee_c(); }
|
| static __cold
| void static_cold_func_a(void) { callee_a(); }
| EXPORT_FUNC_PTR(static_cold_func_a);
|
| static __cold
| void static_cold_func_b(void) { callee_b(); }
| EXPORT_FUNC_PTR(static_cold_func_b);
|
| static __cold
| void static_cold_func_c(void) { callee_c(); }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-aligned-cold-caller.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-aligned-cold-caller.o
|
| test-aligned-cold-caller.o:     file format elf64-littleaarch64
|
|
| Disassembly of section .text:
|
| 0000000000000000 <callee_a>:
|    0:   d65f03c0        ret
|
| 0000000000000004 <callee_b>:
|    4:   d65f03c0        ret
|
| 0000000000000008 <callee_c>:
|    8:   d65f03c0        ret
|    c:   d503201f        nop
|
| 0000000000000010 <static_cold_func_a>:
|   10:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   14:   910003fd        mov     x29, sp
|   18:   97fffffa        bl      0 <callee_a>
|   1c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   20:   d65f03c0        ret
|   24:   d503201f        nop
|   28:   d503201f        nop
|   2c:   d503201f        nop
|
| 0000000000000030 <static_cold_func_b>:
|   30:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   34:   910003fd        mov     x29, sp
|   38:   97fffff3        bl      4 <callee_b>
|   3c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   40:   d65f03c0        ret
|   44:   d503201f        nop
|   48:   d503201f        nop
|   4c:   d503201f        nop
|
| 0000000000000050 <static_cold_func_c>:
|   50:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   54:   910003fd        mov     x29, sp
|   58:   97ffffec        bl      8 <callee_c>
|   5c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   60:   d65f03c0        ret
|   64:   d503201f        nop
|   68:   d503201f        nop
|   6c:   d503201f        nop
|
| 0000000000000070 <cold_func_a>:
|   70:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   74:   910003fd        mov     x29, sp
|   78:   97ffffe2        bl      0 <callee_a>
|   7c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   80:   d65f03c0        ret
|   84:   d503201f        nop
|   88:   d503201f        nop
|   8c:   d503201f        nop
|
| 0000000000000090 <cold_func_b>:
|   90:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   94:   910003fd        mov     x29, sp
|   98:   97ffffdb        bl      4 <callee_b>
|   9c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   a0:   d65f03c0        ret
|   a4:   d503201f        nop
|   a8:   d503201f        nop
|   ac:   d503201f        nop
|
| 00000000000000b0 <cold_func_c>:
|   b0:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   b4:   910003fd        mov     x29, sp
|   b8:   97ffffd4        bl      8 <callee_c>
|   bc:   a8c17bfd        ldp     x29, x30, [sp], #16
|   c0:   d65f03c0        ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-aligned-cold-caller.o
|
| test-aligned-cold-caller.o:     file format elf64-littleaarch64
|
| Sections:
| Idx Name             Size      VMA               LMA               File off  Algn
|   0 .text            000000c4  0000000000000000  0000000000000000  00000040  2**4
|                      CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data            00000018  0000000000000000  0000000000000000  00000108  2**3
|                      CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss             00000000  0000000000000000  0000000000000000  00000120  2**0
|                      ALLOC
|   3 .comment         00000013  0000000000000000  0000000000000000  00000120  2**0
|                      CONTENTS, READONLY
|   4 .note.GNU-stack  00000000  0000000000000000  0000000000000000  00000133  2**0
|                      CONTENTS, READONLY
|   5 .eh_frame        00000110  0000000000000000  0000000000000000  00000138  2**3
|                      CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
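
For illustration only, the observation above (an explicit alignment attribute is retained even where the command-line alignment is dropped) suggests that a callee reached only from cold callers would need its own attribute. The following is a minimal sketch under that assumption, not code from the kernel patches or from the tests above. The names FUNC_ALIGN, ALIGNED_FUNC, COLD_FUNC, NOINLINE, addr_is_aligned() and report_misaligned() are hypothetical, and the function-pointer-to-integer cast relies on implementation-defined behaviour that GCC and Clang support:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names for illustration; not taken from the kernel. */
#define FUNC_ALIGN	16
#define ALIGNED_FUNC	__attribute__((aligned(FUNC_ALIGN)))
#define COLD_FUNC	__attribute__((cold)) ALIGNED_FUNC
#define NOINLINE	__attribute__((noinline))

/* Non-zero if addr is a multiple of align; align must be a power of two. */
static int addr_is_aligned(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}

static volatile int calls;

/*
 * Only reachable from a cold caller, so it carries an explicit alignment
 * attribute rather than relying on -falign-functions=N.
 */
static NOINLINE ALIGNED_FUNC void callee(void)
{
	calls++;
}

COLD_FUNC void cold_func(void)
{
	callee();
}

/* Print each function's address and return how many are misaligned. */
int report_misaligned(void)
{
	void (*funcs[])(void) = { callee, cold_func };
	int bad = 0;

	for (size_t i = 0; i < sizeof(funcs) / sizeof(funcs[0]); i++) {
		uintptr_t addr = (uintptr_t)funcs[i];
		int ok = addr_is_aligned(addr, FUNC_ALIGN);

		printf("%p %s\n", (void *)addr, ok ? "aligned" : "MISALIGNED");
		bad += !ok;
	}
	return bad;
}
```

Calling report_misaligned() from a test program gives a runtime check that complements the objdump inspection above. Dropping ALIGNED_FUNC from callee() would be the way to see whether a given compiler version still applies the command-line alignment to it.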