https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82862
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
I don't have any good ideas here.  Fortran with allocated arrays tends to use
quite some integer registers for all the IV setup and computation.

One can experiment with less peeling of vector epilogues
(--param max-completely-peel-times=1) as well as maybe adding another code
sinking pass.  In the end it's intelligent remat of expressions (during RA)
that needs to be done as I fully expect not having enough integer registers
to compute and keep live everything.

There seems to be missed invariant motion on the GIMPLE side and also stack
allocation in an inner loop which we might be able to hoist.  Maybe that
(__builtin_stack_save/restore) confuses RA.

Those builtins confuse LIM at least (a present memcpy does as well, and we
expand that to a libcall).  -fno-tree-loop-distribute-patterns helps for that.

But even then we still spill a lot.

Thus, try -fno-tree-loop-distribute-patterns plus

Index: gcc/tree-ssa-loop-im.c
===================================================================
--- gcc/tree-ssa-loop-im.c      (revision 255051)
+++ gcc/tree-ssa-loop-im.c      (working copy)
@@ -1432,7 +1432,10 @@ gather_mem_refs_stmt (struct loop *loop,
   bool is_stored;
   unsigned id;
 
-  if (!gimple_vuse (stmt))
+  if (!gimple_vuse (stmt)
+      || gimple_call_builtin_p (stmt, BUILT_IN_STACK_SAVE)
+      || gimple_call_builtin_p (stmt, BUILT_IN_STACK_RESTORE)
+      || gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN))
     return;
 
   mem = simple_mem_ref_in_stmt (stmt, &is_stored);
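
For readers who want to poke at the "stack allocation in an inner loop" point without the Fortran testcase, here is a rough C sketch of the same shape. It is not taken from the PR: the function names, the array layout and the scaling kernel are all made up for illustration, and it assumes cols > 0. A variable-length array declared inside a loop body is the usual way to get GCC to emit __builtin_stack_save/__builtin_stack_restore around each iteration; the second function hoists the buffer by hand, which is roughly what one would hope the compiler could do itself.

```c
#include <stddef.h>

/* Hypothetical kernel, not from the PR.  The scratch array is a VLA scoped
   to the outer loop body, so its stack space is set up and released on
   every iteration of the i loop.  Assumes cols > 0. */
double sum_scaled_vla(const double *a, size_t rows, size_t cols, double scale)
{
    double total = 0.0;
    for (size_t i = 0; i < rows; ++i) {
        double tmp[cols];                    /* per-iteration stack allocation */
        for (size_t j = 0; j < cols; ++j)
            tmp[j] = a[i * cols + j] * scale;
        for (size_t j = 0; j < cols; ++j)
            total += tmp[j];
    }
    return total;
}

/* Same computation with the scratch buffer hoisted out of the loop by hand,
   so there is a single allocation for the whole function.  Assumes cols > 0. */
double sum_scaled_hoisted(const double *a, size_t rows, size_t cols, double scale)
{
    double total = 0.0;
    double tmp[cols];                        /* allocated once, before the loop */
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j)
            tmp[j] = a[i * cols + j] * scale;
        for (size_t j = 0; j < cols; ++j)
            total += tmp[j];
    }
    return total;
}
```

Compiling both variants at -O3, with and without -fno-tree-loop-distribute-patterns, and comparing the -fdump-tree-optimized output and the generated assembly should show whether the stack save/restore pair survives in the loop and how much gets spilled. Whether this small case reproduces the register pressure seen in the PR is untested.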