https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94298

--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
The situation is a bit more complicated. IRA does the right thing:

    8: r85:V2DF=[r86:DI+`y']
      REG_EQUIV [r86:DI+`y']
   11: r89:V2DF=vec_select(vec_concat(r85:V2DF,r85:V2DF),parallel)
   12: r90:V2DF=vec_select(vec_concat(r85:V2DF,r85:V2DF),parallel)
      REG_DEAD r85:V2DF

Later, LRA propagates the memory operand into the insn. Since the insn clobbers
its input, multiple loads are emitted:

   26: xmm1:V2DF=[ax:DI+`y']
   11: xmm1:V2DF=vec_select(vec_concat(xmm1:V2DF,[ax:DI+`y']),parallel)
   28: xmm0:V2DF=[ax:DI+`y']
   12: xmm0:V2DF=vec_select(vec_concat([ax:DI+`y'],xmm0:V2DF),parallel)

which is further "optimized" in the postreload pass:

   26: xmm1:V2DF=[ax:DI+`y']
   11: xmm1:V2DF=vec_select(vec_concat(xmm1:V2DF,xmm1:V2DF),parallel)
   28: xmm0:V2DF=[ax:DI+`y']
   12: xmm0:V2DF=vec_select(vec_concat(xmm0:V2DF,xmm0:V2DF),parallel)

It looks to me that a heuristic is missing in LRA: a memory operand
shouldn't be propagated into an insn if the register has multiple uses.
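
To make the suggested heuristic concrete, here is a minimal sketch in Python
over a toy model of the insns above. It is not GCC code: the tuple layout and
the names INSNS, count_using_insns and may_substitute_mem are hypothetical,
and the real decision would have to be made inside LRA on RTL.

```python
from collections import Counter

# Toy model, not GCC's RTL: each insn is (uid, dest, sources).
# "mem:y" stands for the memory operand [r86:DI+`y'].
INSNS = [
    (8, "r85", ["mem:y"]),
    (11, "r89", ["r85", "r85"]),
    (12, "r90", ["r85", "r85"]),
]


def count_using_insns(insns):
    """Count, per pseudo register, how many insns read it.

    An insn that mentions the same register twice counts once.
    """
    uses = Counter()
    for _uid, _dest, sources in insns:
        for reg in set(sources):
            if reg.startswith("r"):  # skip memory operands
                uses[reg] += 1
    return uses


def may_substitute_mem(reg, insns):
    """Hypothetical heuristic: allow the equivalent memory operand to
    replace REG only if at most one insn uses REG."""
    return count_using_insns(insns)[reg] <= 1


# r85 is read by insns 11 and 12, so substitution is refused here.
print(may_substitute_mem("r85", INSNS))
```

With such a check, r85 would stay in a register in this example and the
value would be loaded once, instead of once per using insn.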
