https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #13 from Hongtao.liu <crazylht at gmail dot com> ---
With shufps folded to VEC_PERM_EXPR, 2 shufps are still generated.

__m128 f (__m128 a, __m128 b)
{
  vector(4) float _3;
  vector(4) float _5;
  vector(4) float _6;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _3 = VEC_PERM_EXPR <b_2(D), b_2(D), { 0, 0, 0, 0 }>;
  _5 = VEC_PERM_EXPR <a_4(D), a_4(D), { 0, 0, 0, 0 }>;
  _6 = _3 * _5;
  return _6;
;;    succ:       EXIT

}
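
For readers less familiar with the dump, here is a rough scalar model in C of what the GIMPLE above computes. It is only a sketch: the names vec_perm_model and f_model are made up for illustration and are not GCC internals. It assumes the usual VEC_PERM_EXPR semantics, where each selector index is taken modulo twice the vector length and picks from the concatenation of the two inputs.

```c
#define N 4  /* number of lanes in a vector(4) float */

/* Hypothetical scalar model of VEC_PERM_EXPR <v0, v1, sel>:
   each selector index is reduced modulo 2*N and selects from the
   concatenation of v0 (indices 0..N-1) and v1 (indices N..2N-1).  */
void vec_perm_model(const float v0[N], const float v1[N],
                    const unsigned sel[N], float out[N])
{
    for (int i = 0; i < N; i++) {
        unsigned idx = sel[i] % (2 * N);
        out[i] = idx < N ? v0[idx] : v1[idx - N];
    }
}

/* Hypothetical model of the dumped function f: broadcast lane 0 of b,
   broadcast lane 0 of a, then multiply lane by lane.  */
void f_model(const float a[N], const float b[N], float out[N])
{
    static const unsigned splat0[N] = { 0, 0, 0, 0 };
    float t3[N], t5[N];

    vec_perm_model(b, b, splat0, t3);   /* _3 */
    vec_perm_model(a, a, splat0, t5);   /* _5 */
    for (int i = 0; i < N; i++)
        out[i] = t3[i] * t5[i];         /* _6 = _3 * _5 */
}
```

In this model every output lane equals a[0] * b[0], so multiplying first and broadcasting the product once would give the same result with a single permutation.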