https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82150
--- Comment #9 from david.welch at netronome dot com ---
Basically, gcc is generating a sequence in which data starts to execute in the pipeline. I can't imagine it is a good idea to let the processor execute data when you can avoid it. Instead of

    pop {... pc} ; some data

a

    pop {... lr} ; bx lr

creates a data hazard: the bx doesn't execute until the register change has resolved. Other cores might not execute the words after a pop in the pipeline if pc is one of the popped values, but this core does. Patching this instruction sequence after execution has started is just a kludge.
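
For concreteness, the two return sequences being compared can be sketched in GNU assembler syntax for ARM state. This is only an illustration: the function labels, the r4 register list, and the literal value are assumptions made up for the example, not the compiler output from this bug.

```
@ Sequence A: return by popping straight into pc.
@ The literal word sits directly after the return, so the words behind
@ the pop are data that this core may fetch into the pipeline.
func_a:
    push    {r4, lr}
    ldr     r0, .Lconst_a       @ hypothetical pc-relative literal load
    pop     {r4, pc}            @ return: pc is loaded from the stack
.Lconst_a:
    .word   0x12345678          @ data, not an instruction

@ Sequence B: restore lr first, then return with bx.
@ Per the comment, bx depends on the lr value written by the pop,
@ so it does not execute until that register change has resolved.
func_b:
    push    {r4, lr}
    ldr     r0, .Lconst_b       @ hypothetical pc-relative literal load
    pop     {r4, lr}            @ restore the return address into lr
    bx      lr                  @ return through lr
.Lconst_b:
    .word   0x12345678          @ data, not an instruction
```

Both functions return the same value in r0; the only difference is how the return address reaches pc.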