https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94298
--- Comment #2 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 24 Mar 2020, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94298
>
> --- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
> The situation is a bit more complicated. IRA DTRT:
>
>     8: r85:V2DF=[r86:DI+`y']
>       REG_EQUIV [r86:DI+`y']
>    11: r89:V2DF=vec_select(vec_concat(r85:V2DF,r85:V2DF),parallel)
>    12: r90:V2DF=vec_select(vec_concat(r85:V2DF,r85:V2DF),parallel)
>       REG_DEAD r85:V2DF
>
> Later, LRA propagates memory operand into the insn. Since the insn clobbers
> its input, multiple loads are emitted:
>
>    26: xmm1:V2DF=[ax:DI+`y']
>    11: xmm1:V2DF=vec_select(vec_concat(xmm1:V2DF,[ax:DI+`y']),parallel)
>    28: xmm0:V2DF=[ax:DI+`y']
>    12: xmm0:V2DF=vec_select(vec_concat([ax:DI+`y'],xmm0:V2DF),parallel)
>
> which is further "optimized" in postreload pass:
>
>    26: xmm1:V2DF=[ax:DI+`y']
>    11: xmm1:V2DF=vec_select(vec_concat(xmm1:V2DF,xmm1:V2DF),parallel)
>    28: xmm0:V2DF=[ax:DI+`y']
>    12: xmm0:V2DF=vec_select(vec_concat(xmm0:V2DF,xmm0:V2DF),parallel)
>
> It looks to me that a heuristics is missing in LRA, where memory operand
> shouldn't propagate into insn, if there are multiple uses of a register.

Yeah, but the odd thing is the memory doesn't actually end up in the insn
but is reloaded!  (I've filed a related PR recently where it actually
ends up in the insns but duplicated and thus code size grows but
register pressure decreases)

So I wonder whether the bug is that there is a memory alternative in
the first place?