https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116445
ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Perhaps the better comparison here is against -mcpu=cortex-m55 -Os (rather than -O):

foo:
        movs    r3, #8
        push    {lr}
        dls     lr, r3
.L2:
        and     r0, r1, r0, lsr #1
        le      lr, .L2
        ldr     pc, [sp], #4

It manages to avoid decrementing r3 in the loop altogether, and it should be better for both code size and speed.
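The comment does not show the C source of the test case, so the following is only a hypothetical reconstruction of what the -Os listing computes. It assumes the two arguments arrive in r0 and r1 per the AAPCS, and that the fixed count of 8 loaded by DLS into LR is the trip count that LE counts down. The function name foo is taken from the listing; the parameter names a and b are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical reconstruction from the assembly; not the bug's original
   test case. a is passed in r0, b in r1 (AAPCS argument registers). */
uint32_t foo(uint32_t a, uint32_t b)
{
    /* Trip count of 8: at -Os the compiler puts 8 in LR with DLS and lets
       LE decrement it and branch back, so the loop body contains no
       separate counter update. */
    for (int i = 0; i < 8; i++)
        a = b & (a >> 1);   /* and r0, r1, r0, lsr #1 (logical shift) */
    return a;               /* result returned in r0 */
}
```

One way to compare the two sequences is to compile a file like this with arm-none-eabi-gcc -mcpu=cortex-m55 -S at -O and at -Os. The code GCC actually emits for this sketch may differ from the listing above.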