https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67435
--- Comment #12 from Maxim Egorushkin <maxim.yegorushkin at gmail dot com> ---
gcc-13 and gcc-14 no longer align the last byte of a loop to the last byte of an L1i-cache-line when compiling with `-march=native -mtune=native` on Zen3 and Zen4 CPUs.

I remember gcc-11 or gcc-12 aligning to the last byte of an L1i-cache-line with `-march=native -mtune=native` on Zen3, which made me read the AMD CPU optimization manuals.

See "Software Optimization Guide for AMD EPYC 7003 Processors", section 2.8.3 "Loop Alignment":

> For the processor loop alignment is not usually a significant issue. However, for hot loops, some further knowledge of trade-offs can be helpful. Since the processor can read an aligned 64-byte fetch block every cycle, aligning the end of the loop to the last byte of a 64-byte cache line is the best thing to do, if possible.
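
To make the arithmetic behind the quoted guideline concrete, here is a minimal sketch of how much padding would have to precede a loop so that its last byte falls on the last byte of a 64-byte block. The helper name `pad_to_end_align` and its parameters are hypothetical and only illustrate the calculation; this is not how GCC implements loop alignment.

```c
#include <assert.h>
#include <stdint.h>

enum { FETCH_BLOCK = 64 }; /* 64-byte fetch block, per the quoted guide */

/* Hypothetical helper (illustration only): number of padding bytes to
   insert before a loop that would otherwise start at address `start`
   and occupy `size` bytes, so that the loop's last byte lands on the
   last byte of a 64-byte block. */
static unsigned pad_to_end_align(uint64_t start, uint64_t size)
{
    assert(size > 0);
    uint64_t end = start + size; /* address one past the loop's last byte */
    /* The loop is end-aligned exactly when `end` is a multiple of 64. */
    return (unsigned)((FETCH_BLOCK - end % FETCH_BLOCK) % FETCH_BLOCK);
}
```

For example, under these assumptions a 10-byte loop that would start at 0x1000 needs 54 bytes of padding: it then starts at 0x1036 and its last byte sits at 0x103f, the final byte of that 64-byte line. Note that this differs from aligning the loop start, which is what a plain power-of-two alignment directive does.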