https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96787
--- Comment #5 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
The divergence between the two versions occurs after .L75. In the P10 version, the second bctrl has been converted into a bctr. This looks like a tail call optimization, but we aren't at the end of the function. The same thing happens again later for the second bctrl after .L78. Why would we think a tail call optimization can happen in the middle of a block...
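
For reference, here is a minimal sketch of the distinction in question. It is not the code from this PR; the names (op_fn, call_last, call_then_continue, add_one, twice) are hypothetical and chosen only for illustration. On PowerPC, bctrl branches through CTR and sets the link register, so control returns to the caller, while bctr branches through CTR without linking, which is only correct when nothing remains to be done after the call.

```c
/* Hypothetical illustration only: none of these names come from the PR. */
typedef int (*op_fn)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x) { return x * 2; }

/* The indirect call is the last action and its result is returned
   unchanged, so turning the call into a plain jump (bctr) is a
   legitimate sibling-call optimization here. */
int call_last(op_fn f, int x)
{
    return f(x);
}

/* Neither indirect call is in tail position: work remains after each
   one, so both need a call that links (bctrl).  Emitting bctr for the
   second call would skip the addition and return g's result directly
   to our caller. */
int call_then_continue(op_fn f, op_fn g, int x)
{
    int a = f(x);
    int b = g(a);
    return a + b;
}
```

Compiling something like this with gcc -O2 -S for a PowerPC target and comparing the branch instructions emitted for the two functions is one way to see the expected behaviour; whether it reproduces the problem reported here is not something this sketch establishes.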