https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85324
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-04-11
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---

  _2 = __builtin_ia32_cvttps2dq ({ 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0 });
  _3 = _2 + { 1, 1, 1, 1 };
...
  _2 = __builtin_ia32_cvttps2qq128_mask ({ 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0 }, { 0, 0 }, 255);
  _3 = _2 + { 1, 1 };
...
  _2 = __builtin_ia32_cvttpd2dq ({ 1.0e+0, 1.0e+0 });
  _3 = _2 + { 1, 1, 1, 1 };
...
  _2 = __builtin_ia32_cvttpd2qq128_mask ({ 1.0e+0, 1.0e+0 }, { 0, 0 }, 255);
  _3 = _2 + { 1, 1 };
...
  _2 = __builtin_ia32_pmovdw128_mask ({ 1, 1, 1, 1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }, 255);
  _3 = _2 + { 1, 1, 1, 1, 1, 1, 1, 1 };

The middle-end has representations for all of these and can constant-fold them.
I suggest folding the builtins to middle-end codes in the target's
gimple_fold_builtin hook.

For the masked cases where the mask is not all-ones, the story may be
different (exposing this to the middle-end requires a two-vector
"permutation" which might not combine back to the desired ops), but maybe
even then constant folding is beneficial in some cases (and then good enough
with the middle-end exposure?).
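As a minimal sketch of the suggested direction, the unmasked cvttps2dq case
could be lowered in the i386 TARGET_GIMPLE_FOLD_BUILTIN hook to a plain
FIX_TRUNC_EXPR, which the generic folders already know how to evaluate on
constant vectors.  The surrounding dispatch and the exact enumerator name are
illustrative, not taken from the current sources:

  /* Hypothetical fragment inside ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi):
     replace the opaque target builtin with the middle-end tree code so that
     later constant folding sees an ordinary V4SF -> V4SI truncation.  */
  case IX86_BUILTIN_CVTTPS2DQ:
    {
      gimple *stmt = gsi_stmt (*gsi);
      tree lhs = gimple_call_lhs (stmt);
      if (lhs)
        {
          tree arg = gimple_call_arg (stmt, 0);
          /* Element-wise float->int truncation is FIX_TRUNC_EXPR on vectors.  */
          gimple *g = gimple_build_assign (lhs, FIX_TRUNC_EXPR, arg);
          gimple_set_location (g, gimple_location (stmt));
          gsi_replace (gsi, g, false);
          return true;
        }
      break;
    }

The masked variants would presumably need an additional VEC_COND_EXPR (or
similar blend) between the converted value and the pass-through operand,
which is the two-vector selection mentioned above that might not combine
back to the masked instruction.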