https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94298

--- Comment #2 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 24 Mar 2020, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94298
> 
> --- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
> The situation is a bit more complicated. IRA DTRT:
> 
>     8: r85:V2DF=[r86:DI+`y']
>       REG_EQUIV [r86:DI+`y']
>    11: r89:V2DF=vec_select(vec_concat(r85:V2DF,r85:V2DF),parallel)
>    12: r90:V2DF=vec_select(vec_concat(r85:V2DF,r85:V2DF),parallel)
>       REG_DEAD r85:V2DF
> 
> Later, LRA propagates memory operand into the insn. Since the insn clobbers
> its input, multiple loads are emitted:
> 
>    26: xmm1:V2DF=[ax:DI+`y']
>    11: xmm1:V2DF=vec_select(vec_concat(xmm1:V2DF,[ax:DI+`y']),parallel)
>    28: xmm0:V2DF=[ax:DI+`y']
>    12: xmm0:V2DF=vec_select(vec_concat([ax:DI+`y'],xmm0:V2DF),parallel)
> 
> which is further "optimized" in postreload pass:
> 
>    26: xmm1:V2DF=[ax:DI+`y']
>    11: xmm1:V2DF=vec_select(vec_concat(xmm1:V2DF,xmm1:V2DF),parallel)
>    28: xmm0:V2DF=[ax:DI+`y']
>    12: xmm0:V2DF=vec_select(vec_concat(xmm0:V2DF,xmm0:V2DF),parallel)
> 
> It looks to me that a heuristic is missing in LRA: a memory operand
> shouldn't be propagated into an insn if the register has multiple uses.

Yeah, but the odd thing is the memory doesn't actually end up in the
insn but is reloaded!  (I've filed a related PR recently where it actually
ends up in the insns but duplicated and thus code size grows but register
pressure decreases)

So I wonder whether the bug is that there is a memory alternative
in the first place?
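
For illustration only, the heuristic suggested in comment #1 could be modelled roughly as below. This is a toy sketch, not GCC code: the `Pseudo` class, its fields and `should_propagate_mem` are hypothetical names, and real LRA decisions depend on insn alternatives and costs that are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Pseudo:
    """Toy stand-in for a pseudo register (hypothetical, not a GCC structure)."""
    regno: int           # pseudo register number, e.g. 85 for r85
    has_mem_equiv: bool  # True if a REG_EQUIV memory location is known
    n_uses: int          # number of insns that read the pseudo


def should_propagate_mem(p: Pseudo) -> bool:
    """Assumed heuristic: substitute the equivalent memory into the using
    insn only when the pseudo is read by at most one insn."""
    return p.has_mem_equiv and p.n_uses <= 1


# r85 from the dump above: it has a memory equivalence and is read by
# insns 11 and 12, so the sketch would keep it in a register.
r85 = Pseudo(regno=85, has_mem_equiv=True, n_uses=2)
print(should_propagate_mem(r85))
```

With such a check, r85 would be loaded once and reused by both insns instead of being reloaded for each one.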
