https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167

--- Comment #14 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #13)
> fold shufps to VEC_PERM_EXPR, but still 2 shufps are generated.
> 
> __m128 f (__m128 a, __m128 b)
> {
>   vector(4) float _3;
>   vector(4) float _5;
>   vector(4) float _6;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _3 = VEC_PERM_EXPR <b_2(D), b_2(D), { 0, 0, 0, 0 }>;
>   _5 = VEC_PERM_EXPR <a_4(D), a_4(D), { 0, 0, 0, 0 }>;
>   _6 = _3 * _5;
>   return _6;
> ;;    succ:       EXIT
> 
> }

So this is a bit more complex, as not all targets have good extract/dup
functionality for scalars. So maybe this should be done as a define_insn for x86.
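
For illustration, here is a minimal sketch of the shapes being discussed. It
uses the GCC/Clang generic vector extensions rather than the x86 intrinsics,
and the names (v4sf, splat0, mul_of_splats, splat_of_mul, dup_of_scalar_mul,
same_lanes) are made up for this example; none of them come from the testcase
or from GCC. It only assumes that both operands are permuted with the same
{ 0, 0, 0, 0 } selector, as in the dump above.

```c
/* Sketch only: generic vector extensions, not the x86 intrinsics.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Broadcast lane 0, i.e. the { 0, 0, 0, 0 } selector from the dump.  */
v4sf
splat0 (v4sf v)
{
  return (v4sf) { v[0], v[0], v[0], v[0] };
}

/* Shape of the dump: permute both operands, then multiply.  */
v4sf
mul_of_splats (v4sf a, v4sf b)
{
  return splat0 (b) * splat0 (a);
}

/* Multiply first, then permute the product once.  */
v4sf
splat_of_mul (v4sf a, v4sf b)
{
  return splat0 (a * b);
}

/* Scalar extract, scalar multiply, then dup of the result.  */
v4sf
dup_of_scalar_mul (v4sf a, v4sf b)
{
  float s = a[0] * b[0];
  return (v4sf) { s, s, s, s };
}

/* Hypothetical helper: returns 1 if all four lanes compare equal.  */
int
same_lanes (v4sf x, v4sf y)
{
  for (int i = 0; i < 4; i++)
    if (x[i] != y[i])
      return 0;
  return 1;
}
```

All three functions should produce the same lanes for ordinary inputs. The
second needs one permute instead of two, and the third depends on the target
having a cheap way to extract and duplicate a scalar, which is presumably the
part that does not hold everywhere.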
