https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91796
--- Comment #3 from Maxim Egorushkin <maxim.yegorushkin at gmail dot com> ---
It seems to me that register allocation has been a weak spot in gcc for years. gcc often allocates registers in such a way that extra register moves are necessary compared to the competition, as in this particular case. The extra register moves could be zero-cost to execute thanks to hardware register renaming, but they still waste CPU instruction decoder and cache resources. I haven't seen such cases with clang at all, though I don't use it as much as gcc. I wonder why gcc register allocation cannot be at least as good as clang's.