https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
wilco at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #17 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #16)
> Wow.
>
> Maybe I am dreaming, or something is completely wrong now...

Well I can reproduce it so it is real - thanks for the insight! Basically
you're forcing Thumb-2 to split early like Thumb-1. This allows the top and
bottom halves to be independently optimized (unused halves or zeroes are very
common in DI mode operations), resulting in much lower register pressure.

Interestingly the subreg removal phase works fine if you split *all* DI mode
operations early. This explains the bad code for shifts as they are split early
but can't have their subregs removed due to other DI mode operations being
split after reload.

This means the correct approach is to split all operations before register
allocation. Looking at the phase list, Expand appears best as there isn't any
constant propagation or dead code elimination done after Split1. Interestingly
it will be a major simplification as we can get rid of a huge number of complex
patterns and do exactly the same for ARM, Thumb-1, Thumb-2, VFP and NEON.
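
For readers who want the "independent halves" point spelled out, here is a minimal C sketch. It is not GCC code, and every name in it (`di_pair`, `di_split`, `di_join`, `di_add`, `di_or_zext`) is hypothetical, invented only for illustration. It models a 64-bit (DI mode) value as two 32-bit words, which is roughly the view an early split gives the optimizers, and shows one case where the high half becomes trivial once the halves are visible separately.

```c
#include <stdint.h>

/* Hypothetical model of a DI mode value as two SI mode (32-bit) halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} di_pair;

/* Split a 64-bit value into its low and high words. */
static di_pair di_split(uint64_t x)
{
    return (di_pair){ (uint32_t)x, (uint32_t)(x >> 32) };
}

/* Recombine two words into a 64-bit value. */
static uint64_t di_join(di_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit add written on halves: low add, then high add plus carry
   (the shape of an adds/adc pair). */
static di_pair di_add(di_pair a, di_pair b)
{
    di_pair r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo); /* carry out of the low word */
    return r;
}

/* 64-bit OR with a zero-extended 32-bit operand. The operand's high
   half is known to be zero, so "a.hi | 0" folds to a plain copy and
   only the low word needs an instruction. */
static di_pair di_or_zext(di_pair a, uint32_t b)
{
    return (di_pair){ a.lo | b, a.hi };
}
```

In `di_or_zext` the high word needs no operation at all. That simplification is only available to constant propagation and dead code elimination if the two halves are separate values at the time those passes run, which is the argument for splitting before register allocation rather than after reload.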