https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116445
--- Comment #4 from avieira at gcc dot gnu.org --- Good point Kyrill! I was merely comparing m3 to m55, but yes, you are right: with low-overhead loops you don't need the r3 count. But -O2 also removes the r3; it seems to do so at ud_dce, which doesn't get run for -O1, and it looks like rtl_dce doesn't eliminate the subtract, probably because it is not able to handle cyclic dead code.
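
To illustrate what "cyclic dead code" means here, below is a minimal source-level sketch. It is a hypothetical analogy, not the test case from this bug: the function name and the `count` variable are made up for illustration, with `count` standing in for the r3 counter. The only use of `count` is the subtract that updates it, so a purely liveness-based pass would presumably see each definition as "used" by the next iteration, whereas a pass that starts from the needed results and follows use-def chains never reaches it.

```c
/* Hypothetical illustration only; not the test case from the bug report. */
int sum_with_dead_counter(const int *a, int n)
{
    int total = 0;
    int count = n;              /* stands in for the r3 loop count */

    for (int i = 0; i < n; i++) {
        total += a[i];
        /* The only use of count is this subtract itself, so the value
           is dead, but each definition feeds the next iteration's
           subtract, forming a cycle. */
        count = count - 1;
    }

    return total;               /* count never reaches the result */
}
```

At the source level a compiler would normally clean this up long before RTL; the sketch only shows the shape of the dependency, which in the case discussed here appears at the RTL level once the low-overhead loop no longer needs the count.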