https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116979
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---

  vect___r$_M_value$real_8.21_4 = MEM <vector(2) float> [(float *)a_2(D)];
  vect___r$_M_value$real_8.22_24 = VEC_PERM_EXPR <vect___r$_M_value$real_8.21_4, vect___r$_M_value$real_8.21_4, { 0, 0 }>;
  vect___r$_M_value$real_8.30_35 = VEC_PERM_EXPR <vect___r$_M_value$real_8.21_4, vect___r$_M_value$real_8.21_4, { 1, 1 }>;
  vect__10.25_28 = MEM <vector(2) float> [(float *)b_3(D)];
  vect__12.26_31 = vect___r$_M_value$real_8.22_24 * vect__10.25_28;
  vect__10.34_40 = VEC_PERM_EXPR <vect__10.25_28, vect__10.25_28, { 1, 0 }>;
  vect__13.35_43 = vect___r$_M_value$real_8.30_35 * vect__10.34_40;
  vect__6.36_44 = .VEC_ADDSUB (vect__12.26_31, vect__13.35_43);
  _46 = BIT_FIELD_REF <vect__6.36_44, 32, 32>;
  _45 = BIT_FIELD_REF <vect__6.36_44, 32, 0>;

Yes looks like a cost issue.
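
For readers following the dump, here is a lane-by-lane model of what the statements above appear to compute. It is only a sketch: the original test case is not part of this comment, so reading the sequence as a single-precision complex multiply is an inference from the `_M_value$real` names and the permute/multiply/addsub pattern. The type `V2` and the helpers `perm`, `mul`, `addsub` and `complex_mul_lanes` are hypothetical names invented for illustration and are not GCC APIs. `.VEC_ADDSUB` is modelled as subtracting in even lanes and adding in odd lanes.

```cpp
#include <array>

// Stand-in for GIMPLE's "vector(2) float" (illustrative only).
using V2 = std::array<float, 2>;

// VEC_PERM_EXPR <v, v, { i, j }> with both inputs equal: select lanes i and j of v.
V2 perm(const V2& v, int i, int j) { return {v[i], v[j]}; }

// Lane-wise multiply.
V2 mul(const V2& x, const V2& y) { return {x[0] * y[0], x[1] * y[1]}; }

// Model of .VEC_ADDSUB: even lane subtracts, odd lane adds.
V2 addsub(const V2& x, const V2& y) { return {x[0] - y[0], x[1] + y[1]}; }

// Mirrors the dump statement by statement; a and b each point at {real, imag}.
V2 complex_mul_lanes(const float* a, const float* b) {
    V2 va   = {a[0], a[1]};      // MEM <vector(2) float> [a]
    V2 a_re = perm(va, 0, 0);    // { a.re, a.re }
    V2 a_im = perm(va, 1, 1);    // { a.im, a.im }
    V2 vb   = {b[0], b[1]};      // MEM <vector(2) float> [b]
    V2 t1   = mul(a_re, vb);     // { a.re*b.re, a.re*b.im }
    V2 b_sw = perm(vb, 1, 0);    // { b.im, b.re }
    V2 t2   = mul(a_im, b_sw);   // { a.im*b.im, a.im*b.re }
    return addsub(t1, t2);       // { re*re - im*im, re*im + im*re }
}
```

Counted this way, the vector form spends three permutes, two multiplies and one addsub and then extracts both lanes again with BIT_FIELD_REF, whereas a scalar complex multiply needs four multiplies, one subtract and one add. That comparison is presumably what the cost question comes down to.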