https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79830
--- Comment #2 from Petr <kobalicek.petr at gmail dot com> ---
I'm not sure I follow the point about the exit test. The code should be correct: each point has x|y coordinates, which is two doubles, so a length of 8 means 16 doubles (I converted my production code into a simpler form that uses only native types).

However, I think the problem is also that handwritten code would need only 3 registers (dst, src, and i), whereas GCC uses rax|rcx|rdx|rsi|rdi|r8|r9, which is a lot, and the same code in 32-bit mode contains one short spill of a GP register. Basically, if I needed more GP registers inside the function, the problem would be much bigger (though I have no clue whether GCC would use a different approach in that case).
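
To illustrate the length and register argument, here is a minimal sketch of the loop shape I have in mind. It is not the test case attached to this bug: the name `scale_points` and the `factor` parameter are made up for illustration. The only thing it is meant to show is that `length` counts points while the buffers hold `2 * length` doubles, and that a single counter can index both buffers.

```cpp
#include <cstddef>

// Hypothetical stand-in for the loop under discussion (not the attached
// test case). `length` counts points; each point is an x|y pair, so both
// buffers hold 2 * length doubles.
void scale_points(double* dst, const double* src, std::size_t length, double factor) {
    // One counter indexes both buffers, so dst, src, and i are the only
    // general-purpose values that have to stay live across iterations.
    for (std::size_t i = 0; i < length * 2; i += 2) {
        dst[i]     = src[i]     * factor;  // x coordinate
        dst[i + 1] = src[i + 1] * factor;  // y coordinate
    }
}
```

Called with a length of 8, this reads and writes 16 doubles, which is the relation I described above. Whether a compiler keeps it down to three GP registers will depend on the version and flags.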