https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65358
--- Comment #15 from ktkachov at gcc dot gnu.org ---

Hmmm, actually it's not that simple, as testing showed. The comment on the code that does the final load into registers says:

  /* If part should go in registers, copy that part
     into the appropriate registers.  Do this now, at the end,
     since mem-to-mem copies above may do function calls.  */

So just moving this code to the beginning is not going to work. A mem-to-mem copy may be expanded as a function call (e.g. to memcpy), and that call would clobber the argument registers we had already loaded.

Another question that comes up is: why didn't the code in calls.c catch that we're reading from a clobbered location and cancel the tail call? It's supposed to do that with check_sibcall_argument_overlap at various points in the expansion, but it doesn't catch this case.