https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609
--- Comment #8 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #7)
> (In reply to Richard Biener from comment #4)
> > (In reply to Uroš Bizjak from comment #3)
> > > The doc says:
> > >
> > >      When used as an lvalue, 'subreg' is a word-based accessor.
> > >      Storing to a 'subreg' modifies all the words of REG that
> > >      overlap the 'subreg', but it leaves the other words of REG
> > >      alone.
> >
> > But UNITS_PER_WORD is 8 so (subreg:DF (TI)) should leave the upper half
> > of the TImode register unchanged.
>
> Indeed, and -m32 creates correct code. So, it is register allocator that
> fails.
>
> Reconfirmed as rtl-optimization problem.

This is quite an interesting PR: it reveals a long-standing latent bug in GCC.

Basically, before LRA we have:

    2: r90:DF=xmm0:DF
       REG_DEAD xmm0:DF
    3: NOTE_INSN_FUNCTION_BEG
    6: r89:TI=[`reg']
    7: r89:TI#0=r90:DF
       REG_DEAD r90:DF
    8: [`reg']=r89:TI#0

LRA and the reload pass produce:

    6: xmm1:TI=[`reg']
    7: xmm1:DF=xmm0:DF
    8: [`reg']=xmm1:V2DF

They do not do any transformation except rewriting the subreg of the hard
register in insn #7. Once the subreg is gone, insn #7 looks like a definition
of the whole of xmm1, so insn #6 is removed as dead by subsequent
optimizations.

To avoid removing insn #6 we need to keep the subreg until the final pass:

    7: xmm1:TI#0=xmm0:DF

Why do LRA and reload remove subregs of hard registers? Because some
subsequent optimizations cannot handle them.

For the last two days I have been struggling to find a solution that involves
only LRA (partial removal of subregs of hard regs), but so far without
success. In any case, even if I find such a solution in LRA, it will need
extensive testing on other targets, so it will probably be ready next week at
best.
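
To make the difference between the two forms of insn #7 concrete, here is a minimal C sketch. It is only a model, not GCC code: the type `reg128` and the helpers `store_low_word` and `define_whole_reg` are hypothetical names invented for this illustration, and the sketch assumes an 8-byte IEEE `double` and UNITS_PER_WORD == 8. The first helper follows the documented lvalue-subreg rule (only the overlapped word changes); the second models what later passes assume once the subreg has been dropped (the whole register is redefined, so the earlier load looks dead).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit register as two 64-bit words
   (assumes UNITS_PER_WORD == 8). Not a GCC data structure. */
typedef struct { uint64_t word[2]; } reg128;

/* Models "xmm1:TI#0 = x": a store through an lvalue subreg changes
   only the word it overlaps (word 0) and leaves word 1 alone. */
static void store_low_word(reg128 *r, double x)
{
    memcpy(&r->word[0], &x, sizeof x);
}

/* Models how "xmm1:DF = x" is treated once the subreg is gone: the
   insn defines the whole register, so whatever insn #6 loaded is
   considered dead. Zero in word 1 stands for "previous value lost";
   that choice is an assumption of this sketch. */
static reg128 define_whole_reg(double x)
{
    reg128 r = { { 0, 0 } };
    memcpy(&r.word[0], &x, sizeof x);
    return r;
}
```

With the first helper, the upper word loaded by insn #6 survives the store in insn #7, so the load must be kept. With the second, nothing from insn #6 is visible afterwards, which is why a dead-code pass feels free to delete it.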