On Mon, Jan 08, 2018 at 01:27:24PM +0000, Wilco Dijkstra wrote:
> Segher Boessenkool wrote:
> > On Fri, Jan 05, 2018 at 12:22:44PM +0000, Wilco Dijkstra wrote:
> >> An example epilog in a shrinkwrapped function before:
> >>
> >> ldp    x21, x22, [sp,#16]
> >> ldr    x23, [sp,#32]
> >> ldr    x24, [sp,#40]
> >> ldp    x25, x26, [sp,#48]
> >> ldr    x27, [sp,#64]
> >> ldr    x28, [sp,#72]
> >> ldr    x30, [sp,#80]
> >> ldr    d8, [sp,#88]
> >> ldp    x19, x20, [sp],#96
> >> ret
> >
> > In this example, the compiler already can make a ldp for both x23/x24 and
> > x27/x28 just fine (if not in emit_epilogue_components, then simply in a
> > peephole); why did that not work?  Or is this not the actual generated
> > machine code (and there are labels between the insns, for example)?
>
> This block originally had a label in it, 2 blocks emitted identical restores
> and then branched to the final epilog. The final epilogue was then duplicated
> so we end up with 2 almost identical epilogs of 10 instructions (almost since
> there were 1-2 unrelated instructions in both blocks).
>
> Peepholing is very conservative about instructions using SP and won't touch
> anything frame related. If this was working better then the backend could just
> emit single loads/stores and let peepholing generate LDP/STP.

How unfortunate; that should definitely be improved then.

Always pairing two registers together *also* degrades code quality.

> Another issue is that after pro_and_epilogue pass I see multiple restores
> of the same registers and then a branch to the same block. We should try
> to avoid the unnecessary duplication.

It already does that if *all* predecessors of that block do that.  If you
want to do it in other cases, you end up with more jumps.  That may be
beneficial in some cases, of course, but it is not an obvious win (and in
the general case it is, hrm let's use nice words, "terrible").


Segher
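
P.S.  To make the pairing point concrete, here is a toy model in Python of
the kind of peephole meant above.  It is only a sketch and not GCC code: the
name pair_loads, the regular expression, and the representation of insns as
plain strings are all made up for illustration.  It fuses two adjacent
x-register ldr's from consecutive 8-byte sp slots into one ldp, and leaves
everything alone when anything (a label, say) sits between them.

```python
import re

# Hypothetical pattern: a plain 64-bit load from the stack, e.g. "ldr x23, [sp,#32]".
LDR = re.compile(r"ldr\s+(x\d+),\s*\[sp,#(\d+)\]$")


def pair_loads(insns):
    """Toy peephole: fuse adjacent ldr's from consecutive sp slots into one ldp."""
    out = []
    i = 0
    while i < len(insns):
        a = LDR.match(insns[i])
        b = LDR.match(insns[i + 1]) if i + 1 < len(insns) else None
        if a and b:
            off_a, off_b = int(a.group(2)), int(b.group(2))
            fusable = (
                off_b == off_a + 8              # consecutive 8-byte slots
                and off_a % 8 == 0              # ldp offsets are scaled by 8
                and off_a <= 504                # upper end of the 7-bit scaled range
                and a.group(1) != b.group(1)    # two distinct destination registers
            )
        else:
            fusable = False
        if fusable:
            out.append(f"ldp {a.group(1)}, {b.group(1)}, [sp,#{off_a}]")
            i += 2
        else:
            out.append(insns[i])
            i += 1
    return out


# The epilogue quoted earlier in this thread, one insn per string.
epilogue = [
    "ldp x21, x22, [sp,#16]",
    "ldr x23, [sp,#32]",
    "ldr x24, [sp,#40]",
    "ldp x25, x26, [sp,#48]",
    "ldr x27, [sp,#64]",
    "ldr x28, [sp,#72]",
    "ldr x30, [sp,#80]",
    "ldr d8, [sp,#88]",
    "ldp x19, x20, [sp],#96",
    "ret",
]

for insn in pair_loads(epilogue):
    print(insn)
```

On the quoted epilogue this pairs x23/x24 and x27/x28 and leaves x30 and d8
alone, since the toy only looks at x registers.  Put a label between two of
the ldr's and they stay unpaired, which is the situation Wilco describes.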