https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65741
Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-04-12
                 CC|                            |segher at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Confirmed.  If not using an asm but, say, a simple assignment, cddce1
gets rid of the loop.  Moving the asm outside of the loop isn't done
at tree level, but only by the RTL opts.  Most RTL opts can only deal
with single sets, which explains why your multiple-output asm isn't
optimised as well as you'd like.
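
For readers without the original report at hand, here is a minimal sketch of the kind of comparison the comment describes. It is not the reporter's test case: the function names `two_outputs` and `plain_assignment` are hypothetical, and the empty asm template with matching constraints is only an assumed, target-independent stand-in for a real multiple-output instruction.

```c
/* Hypothetical illustration only; not the test case from the bug report. */

/* A loop-invariant, non-volatile asm with two outputs.  The template is
   empty and each output is tied to an input by a matching constraint,
   so the asm simply yields x in p and y in q. */
void two_outputs(unsigned n, int x, int y, int *a, int *b)
{
    int p = 0, q = 0;
    for (unsigned i = 0; i < n; i++)
        __asm__ ("" : "=r" (p), "=r" (q) : "0" (x), "1" (y));
    *a = p;
    *b = q;
}

/* The same loop with plain assignments instead of the asm. */
void plain_assignment(unsigned n, int x, int y, int *a, int *b)
{
    int p = 0, q = 0;
    for (unsigned i = 0; i < n; i++) {
        p = x;
        q = y;
    }
    *a = p;
    *b = q;
}
```

Both functions compute the same values, so any difference shows up only in the generated code. Compiling with something like `gcc -O2 -S` and comparing the two loops is one way to see whether the asm version is handled as well as the assignment version on a given GCC release and target.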