https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96017
Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-07-01
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #5 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Yes, or we should duplicate more of the function earlier (before RA), for
example the return tails.  Separate shrink wrapping does not duplicate code
(it can easily explode code size -- it will be interesting to see how much
it can help, if we restrict it somehow).

But it is r31 already before shrink-wrapping -- we need some renaming /
copying of registers (like in Peter's code) to get rid of it.

In an example like this it is quite useful, but in a lot of "real world"
code there are no free volatile registers to scarf up.  Or that was my
impression anyway, when I last looked at this.  Time to revisit it...

(The "ELFv1" code is just

.L.test:
        lwz 9,0(3)
        addis 10,2,.LC0@toc@ha
        ld 10,.LC0@toc@l(10)
        std 31,-8(1)
        stdu 1,-128(1)
        lwa 31,0(10)
        cmpwi 0,9,0
        bne 0,.L8
        addi 1,1,128
        mr 3,31
        ld 31,-8(1)
        blr
        .p2align 4,,15
.L8:
        mflr 0
        std 0,144(1)
        bl slowpath
        nop
        ld 0,144(1)
        addi 1,1,128
        mr 3,31
        ld 31,-8(1)
        mtlr 0
        blr

which feels simpler... but it is kind of the same?
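
For readers without the testcase at hand, the assembly above looks consistent with C source roughly like the sketch below. This is a reconstruction from the instructions, not the testcase attached to the bug. Only the names `test` and `slowpath` come from the listing; `global_value`, `slow_calls`, and the signature and body of `slowpath` are assumptions made so that the sketch compiles on its own.

```c
/* Hypothetical reconstruction; not the testcase from the bug report. */

/* Assumed global that the function reads up front (the lwa into r31). */
int global_value = 42;

/* Assumed counter, present only so the slow path has an observable effect. */
int slow_calls = 0;

/* Assumed signature: the listing leaves r3 (the pointer) untouched
   before "bl slowpath", so passing the pointer through is a guess. */
void slowpath(int *p)
{
    (void)p;
    slow_calls++;
}

int test(int *p)
{
    /* The value is loaded before the branch and must survive the call,
       which is why it ends up in the non-volatile register r31. */
    int v = global_value;

    /* Rare path: only here is a call made, so only here does LR need
       to be saved and restored. */
    if (*p != 0)
        slowpath(p);

    return v;
}
```

The point the sketch illustrates is that the fast path makes no call, yet it still pays for saving and restoring r31 and for setting up the frame, because the value is assigned to r31 before shrink-wrapping runs.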