https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #22 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #21)
> (In reply to wilco from comment #20)
> > > Wilco, where have you seen the additional registers used with my
> > > previous patch, maybe we can try to fix that somehow?
> >
> > What happens is that the move of zero causes us to use extra registers in
> > shifts as both source and destination are now always live at the same time.
> > We generate worse code for simple examples like x | (y << 3):
> >
> > -mfpu=vfp:
> >         push    {r4, r5}
> >         lsls    r5, r1, #3
> >         orr     r5, r5, r0, lsr #29
> >         lsls    r4, r0, #3
> >         orr     r0, r4, r2
> >         orr     r1, r5, r3
> >         pop     {r4, r5}
> >         bx      lr
> > -mfpu=neon:
> >         lsls    r1, r1, #3
> >         orr     r1, r1, r0, lsr #29
> >         lsls    r0, r0, #3
> >         orrs    r0, r0, r2
> >         orrs    r1, r1, r3
> >         bx      lr
> >
>
> hmm. I think with my patch reverted the code is the same.
>
> I tried -O2 -marm -mfpu=vfp -mhard-float get the first variant
> with and without patch.

Yes that's what I get.

> For -O2 -marm -mfpu=vfp -msoft-float I get the second variant
> with and witout patch.

This still gives the first variant for me.

> For -O2 -marm -mfpu=neon -mhard-float I get the second variant

Right.

> With -O2 -marm -mfpu=neon -msoft-float I get a third variant
> again with and without patch:
>
>         lsl     r1, r1, #3
>         mov     ip, r0
>         orr     r0, r2, r0, lsl #3
>         orr     r1, r1, ip, lsr #29
>         orr     r1, r1, r3
>         bx      lr

I don't see this...

> Am I missing something?

What I meant is that your patch still makes a large difference on the original
test case despite making no difference in simple cases like the above.

Anyway, there is another bug: on AArch64 we correctly recognize there are 8
1-byte loads, shifts and orrs which can be replaced by a single 8-byte load and
a byte reverse. Although it is recognized on ARM and works correctly if it is a
little endian load, it doesn't perform the optimization if a byte reverse is
needed. As a result there are lots of 64-bit shifts and orrs which create huge
register pressure if not expanded early.

This testcase is turning out to be a goldmine of bugs...
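
For reference, here is a minimal C sketch of the two patterns discussed in this comment. It is not the testcase attached to the PR. The function names (or_shift3, or_shift3_split, load_be64) are hypothetical, and the argument order of or_shift3 is only inferred from the quoted assembly, where the shifted operand arrives in r0/r1 and the ORed operand in r2/r3. The split version spells out by hand roughly what the 32-bit expansion of the 64-bit shift has to compute, and load_be64 is the kind of 8-load/shift/orr sequence that could become one 8-byte load plus a byte reverse.

```c
#include <stdint.h>

/* 64-bit x | (y << 3), the shape of the simple example quoted above.
   Assumption: y is the first argument, x the second. */
uint64_t or_shift3(uint64_t y, uint64_t x)
{
    return x | (y << 3);
}

/* The same computation written on 32-bit halves, mirroring the
   lsl / orr ..., lsr #29 / orr sequences in the quoted assembly. */
uint64_t or_shift3_split(uint32_t y_lo, uint32_t y_hi,
                         uint32_t x_lo, uint32_t x_hi)
{
    uint32_t hi = (y_hi << 3) | (y_lo >> 29); /* carry top 3 bits of y_lo */
    uint32_t lo = y_lo << 3;
    return ((uint64_t)(hi | x_hi) << 32) | (uint64_t)(lo | x_lo);
}

/* Big-endian 64-bit load built from 8 byte loads, shifts and ORs.
   On a little-endian target this is equivalent to a single 8-byte
   load followed by a byte reverse. */
uint64_t load_be64(const unsigned char *p)
{
    return ((uint64_t)p[0] << 56) | ((uint64_t)p[1] << 48)
         | ((uint64_t)p[2] << 40) | ((uint64_t)p[3] << 32)
         | ((uint64_t)p[4] << 24) | ((uint64_t)p[5] << 16)
         | ((uint64_t)p[6] << 8)  |  (uint64_t)p[7];
}
```

Comparing the -O2 output for or_shift3 across the -mfpu settings mentioned above should show whether the extra push/pop appears. The exact code generated will depend on the compiler revision and options.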