https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97147

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #2)
> Disable (define_insn "*sse3_haddv2df3_low" and (define_insn
> "*sse3_hsubv2df3_low" seems to be ok.
> But for foo1.
> 
> v2df foo1 (v2df x, v2df y)
> {
>   v2df a;
>   a[0] = x[0] + x[1];
>   a[1] = y[0] + y[1];
>   return a;
> }
> 
> it's 
> 
>   vhaddpd %xmm1, %xmm0, %xmm0
>   ret
> 
> vs 
> 
>         movapd  xmm2, xmm0
>         unpckhpd        xmm2, xmm2
>         addsd   xmm0, xmm2
>         movapd  xmm2, xmm1
>         unpckhpd        xmm1, xmm1
>         addsd   xmm1, xmm2
>         unpcklpd        xmm0, xmm1
>         ret
> 
> and note w/o vhaddpd, codegen can be optimized to 
> 
>         movapd  xmm2, xmm0
>         unpcklpd        xmm2, xmm1
>         unpckhpd        xmm0, xmm1
>         addpd   xmm0, xmm2
>         ret
> 
> Guess maybe it's better done in gimple level?

On GIMPLE we see the testcase basically unchanged from what the source does:

  _1 = BIT_FIELD_REF <x_7(D), 64, 0>;
  _2 = BIT_FIELD_REF <x_7(D), 64, 64>;
  _3 = _1 + _2;
  a_9 = BIT_INSERT_EXPR <a_8(D), _3, 0>;
  _4 = BIT_FIELD_REF <y_10(D), 64, 0>;
  _5 = BIT_FIELD_REF <y_10(D), 64, 64>;
  _6 = _4 + _5;
  a_11 = BIT_INSERT_EXPR <a_9, _6, 64>;
  return a_11;

Vectorization fails in SLP discovery because we essentially see two lanes
operating on different vectors and we don't implement a way to shuffle
them together.
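
To make the missing shuffle concrete, here is a hand-written source-level
sketch of foo1 with the lanes gathered first and a single vector add
afterwards. This is only an illustration of the shape SLP would need to
discover, not something the vectorizer produces; the name foo1_shuffled is
made up for this example.

```c
typedef double v2df __attribute__ ((vector_size (16)));

/* Hypothetical rewrite of foo1: gather the low lanes of x and y into one
   vector and the high lanes into another, then do one vector add.  */
v2df
foo1_shuffled (v2df x, v2df y)
{
  v2df lo = { x[0], y[0] };   /* low lanes of both inputs */
  v2df hi = { x[1], y[1] };   /* high lanes of both inputs */
  return lo + hi;             /* { x[0] + x[1], y[0] + y[1] } */
}
```

This computes the same result as foo1 and corresponds to the
unpcklpd/unpckhpd/addpd sequence quoted above; what code is actually
generated for it will depend on the target and options.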

I think the full hadd define_insns are OK to keep, they really have special
arrangements (esp. the SFmode variants).  But the reductions to scalar
(*_low) seem unnecessary and penalizing (maybe we can guard use of those
with a -mtune-ctrl?).
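
For reference, the kind of source the *_low patterns would apply to is
presumably a plain two-lane reduction to scalar. A minimal example, assuming
that reading of the patterns (the function name hsum_low is made up):

```c
typedef double v2df __attribute__ ((vector_size (16)));

/* Hypothetical example: reduce both lanes of a v2df to one scalar.  */
double
hsum_low (v2df x)
{
  return x[0] + x[1];
}
```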

I also see we're missing patterns for h{add,sub}ps (not sure if we can manage
to get combine to synthesize it).
