https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78529
--- Comment #17 from Jim Wilson <wilson at gcc dot gnu.org> ---

I still haven't been able to reproduce this, but I do see a problem.

In the original bug report, the only difference is that the code uses x4 in the first part of the diff, and x24 in the second part of the diff, which seems unimportant. However, this value lives across a call to memcpy. x24 is a safe register here because it is callee-saved. x4 is not safe though, as it is an argument passing/return value register, which may be clobbered by a call. Whether it gets clobbered depends on the memcpy implementation that is linked in. If people are linking with different memcpy implementations, that might affect whether the bug is reproducible.

Disassembling my testcase, I don't see the same code sequence though. I see

  401530:  d2800802  mov  x2, #0x40              // #64
  401534:  52800b01  mov  w1, #0x58              // #88
  401538:  aa1303e0  mov  x0, x19
  40153c:  940000d1  bl   401880 <memset>
  401540:  9121c324  add  x4, x25, #0x870
  401544:  91001663  add  x3, x19, #0x5

which is OK, because the "add x3, x19, #0x5" instruction comes after the memset call.

Maybe there is something subtly different about how I'm configuring or building the toolchain that results in the different LTO optimized code.
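
To illustrate the hazard described above, here is a minimal C sketch of a value that is live across a library call. It is not the testcase from this bug: the function name copy_then_use, its parameters, and the reuse of the 0x870 offset are all hypothetical. The pointer p is computed before memcpy and used after it, so a correct AArch64 compiler has to keep it in a callee-saved register (x19-x28), spill it to the stack, or recompute it after the call. Leaving it in x4 across the call would only appear to work when the linked memcpy happens not to touch x4.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example, not taken from the bug report.
 * `p` is computed before the call to memcpy and used after it,
 * so its value is live across the call. The calling convention
 * lets memcpy overwrite any caller-saved register, so the compiler
 * must not assume a value left in such a register survives. */
unsigned char *copy_then_use(unsigned char *dst, const unsigned char *src,
                             size_t n, unsigned char *base)
{
    unsigned char *p = base + 0x870;  /* live across the call below */

    memcpy(dst, src, n);              /* may clobber caller-saved registers */

    p[0] = dst[0];                    /* use after the call */
    return p;
}
```

Compiling something like this with -O2 -S and checking which register holds p around the call is one way to see where the compiler places such a value; the exact register chosen will vary with compiler version and options.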
