https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67435

Maxim Egorushkin <maxim.yegorushkin at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maxim.yegorushkin at gmail dot com

--- Comment #11 from Maxim Egorushkin <maxim.yegorushkin at gmail dot com> ---
I have been looking for an option to align only specific loops to a specific
boundary.

In particular, I often have nested loops where the innermost loops are the
hottest and require 64-byte L1i cache line alignment, while the outer loops
should minimize padding.

When I extract inner loops into separate functions wrapped with
`#pragma GCC optimize ("-falign-loops=64")`, I get the desired loop alignment,
but the loop function is no longer inlined. Forcing inlining of the loop
function with `inline __attribute__((always_inline))` removes the effect of
`#pragma GCC optimize ("-falign-loops=64")` for the loop function.
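A minimal sketch of the workaround described above, assuming GCC and an
optimization level at which loop alignment is applied. The function names
`sum_hot` and `sum_rows` are hypothetical and only illustrate the pattern: the
hot inner loop lives in its own function under the pragma, and the outer loop
is compiled with the default settings.

```c
#include <stddef.h>

/* Hypothetical hot inner loop, kept in its own function so that the
   pragma below applies to this loop only. */
#pragma GCC push_options
#pragma GCC optimize ("-falign-loops=64")
long sum_hot(const long *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}
#pragma GCC pop_options

/* Outer loop, compiled with the command-line alignment settings. */
long sum_rows(const long *data, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; ++r)
        total += sum_hot(data + r * cols, cols);
    return total;
}
```

Because `sum_hot` carries different optimization options than its caller, GCC
keeps it as a separate call here, which is the inlining cost mentioned above.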

AMD CPU manuals recommend aligning the last byte of the loop's machine code to
the last byte of a 64-byte L1i cache line, rather than aligning the first byte
of the loop to the first byte of the cache line. That makes perfect sense and
produces the least amount of padding. If my memory serves me right, gcc-11 or
gcc-12 did exactly that, which prompted me to examine the AMD CPU manuals for
possible clues in the first place, and that is where I found this
align-the-last-loop-byte-to-the-end-of-L1i-cache-line advice.
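The padding arithmetic behind the two strategies can be sketched as follows.
This is only an illustration under stated assumptions: `pad_align_start` and
`pad_align_end` are hypothetical helpers, offsets are byte offsets within the
code section, and the cache line size is fixed at 64 bytes.

```c
#include <stddef.h>

enum { LINE = 64 };  /* assumed L1i cache line size in bytes */

/* Padding needed so that the loop begins on a cache line boundary. */
size_t pad_align_start(size_t offset)
{
    return (LINE - offset % LINE) % LINE;
}

/* Padding needed so that the loop's last byte is the last byte of a
   cache line, i.e. offset + padding + size is a multiple of LINE. */
size_t pad_align_end(size_t offset, size_t size)
{
    return (LINE - (offset + size) % LINE) % LINE;
}
```

For a hypothetical 20-byte loop that would otherwise begin at offset 10,
aligning the start needs 54 bytes of padding, while aligning the end needs 34.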