https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
On the GIMPLE side I think we should canonicalize here, though at which point inserts into a splatted vector become more profitable is an open question.

  _4 = VEC_PERM_EXPR <a_2(D), b_3(D), { 0, 8, 1, 9, 2, 10, 3, 11 }>;
  _5 = VEC_PERM_EXPR <a_2(D), b_3(D), { 4, 12, 5, 13, 6, 14, 7, 15 }>;
  _6 = {_4, _5};

For this one we have simplify_vector_constructor in tree-ssa-forwprop.cc. For the other case, the BIT_INSERT_EXPR one, I'd go to match.pd, but adding a function to forwprop is also possible.

If we want to expand { 4, 4, _1, 4, 4, ... } with a splat plus an insert, we should IMHO do that at RTL expansion time, where (I think) we already try a splat. I'm not sure how to apply costing there, though. There's also the possibility of expanding { a, a, b, b, a, b, a, ... } with two splats and a blend.

For vec_init RTL expansion the target has full control, so it can decide for itself (if we do not want to do anything in generic code).
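As a rough illustration of the shapes discussed in this comment, here is a scalar Python model in which lists stand in for vector registers. It models lane semantics only and is not GCC code: the helper names (vec_perm, splat, insert, blend), the 8-lane width, the concrete values, and the lanes filled in for the elided "..." parts are all hypothetical. The single 16-lane permute is just one possible canonical form for _6, not necessarily what simplify_vector_constructor produces, and the model says nothing about which expansion is cheaper.

```python
# Scalar model only: Python lists stand in for vector registers (not GCC code).

def vec_perm(a, b, sel):
    """Model of VEC_PERM_EXPR <a, b, sel>: selector value i < len(a) picks a[i],
    otherwise b[i - len(a)], i.e. it indexes the concatenation a ++ b."""
    n = len(a)
    return [a[i] if i < n else b[i - n] for i in sel]

def splat(x, n):
    """Broadcast scalar x into all n lanes."""
    return [x] * n

def insert(v, lane, x):
    """Return a copy of v with scalar x written into one lane."""
    out = list(v)
    out[lane] = x
    return out

def blend(mask, t, f):
    """Lane i comes from t where mask[i] is true, else from f."""
    return [ti if m else fi for m, ti, fi in zip(mask, t, f)]

a = list(range(10, 18))  # hypothetical contents of a_2(D), 8 lanes
b = list(range(20, 28))  # hypothetical contents of b_3(D), 8 lanes

# _4, _5 and the constructor _6 = {_4, _5} from the GIMPLE above.
v4 = vec_perm(a, b, [0, 8, 1, 9, 2, 10, 3, 11])
v5 = vec_perm(a, b, [4, 12, 5, 13, 6, 14, 7, 15])
v6 = v4 + v5

# One candidate canonical form: a single 16-lane permute of the concatenation {a, b}.
ab = a + b
v6_single = vec_perm(ab, ab, [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15])

# { 4, 4, _1, 4, 4, ... } as splat + insert (assumes the elided lanes are all 4).
x1 = 99  # hypothetical runtime value of _1
splat_insert = insert(splat(4, 8), 2, x1)

# { a, a, b, b, a, b, a, ... } as two splats + blend; scalars renamed x and y to
# avoid clashing with the vectors above, and the last lane is hypothetical.
x, y = 5, 7
mask = [False, False, True, True, False, True, False, True]  # True = take y
splat_blend = blend(mask, splat(y, 8), splat(x, 8))

print(v6 == v6_single)                             # True
print(splat_insert == [4, 4, x1, 4, 4, 4, 4, 4])   # True
print(splat_blend == [x, x, y, y, x, y, x, y])     # True
```

The model makes the equivalences easy to check lane by lane; whether a target prefers the splat + insert or splat + blend sequence over a generic vec_init expansion is exactly the costing decision the comment leaves to RTL expansion and the target.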