https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123175
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 63077
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=63077&action=edit
Attempt at squashing the bug

There's a _lot_ of existing, technically wrong vec_perm_indices building in
match.pd and possibly elsewhere.  It feels like we're missing a
vec_perm_indices CTOR from an existing GENERIC VEC_PERM_EXPR / GIMPLE gassign
with a VEC_PERM_EXPR here (a sketch of what such a helper could look like
follows at the end of this comment).

I also wonder how "ready" we really are to deal with permute input vs. output
nelts mismatches, given that even expand_vec_perm_const itself does

rtx
expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
                       const vec_perm_builder &sel, machine_mode sel_mode,
                       rtx target)
{
...
  /* Always specify two input vectors here and leave the target to handle
     cases in which the inputs are equal.  Not all backends can cope with
     the single-input representation when testing for a double-input
     target instruction.  */
  vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode));

and while it documents 'mode' as the mode of the vectors being permuted,
expand_expr_real_2 passes it the mode of the result.  IIRC all this was
relaxed for VLA vs. fixed-size "permutes" for SVE/AdvSIMD?

And most targets will simply fail can_vec_perm_const_p, like x86, which does

bool
ix86_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
                               rtx target, rtx op0, rtx op1,
                               const vec_perm_indices &sel)
{
  if (vmode != op_mode)
    return false;

and the middle-end does nothing to "try harder", for example by padding out
the inputs, even if just by using paradoxical subregs (and then adjusting the
permute mask accordingly; see the second sketch below).  Which means my
attempt at "optimizing" __builtin_shufflevector would only pessimize things
at this point.

Still, I can imagine some of the foldings going wrong with SVE/AdvSIMD
exploiting the weaker constraints?  How is that actually exercised?  Richard?
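For reference, a minimal sketch of the missing CTOR-like helper mentioned
above, assuming the mask is a VECTOR_CST with a constant number of elements;
the name indices_from_vec_perm_mask and the exact signature are made up:

/* Hypothetical helper: build vec_perm_indices from the VECTOR_CST mask
   of an existing VEC_PERM_EXPR.  Assumes a constant number of elements;
   a real version would have to preserve VLA encodings.  */
static vec_perm_indices
indices_from_vec_perm_mask (tree mask, unsigned int ninputs,
                            poly_uint64 nelts_per_input)
{
  unsigned HOST_WIDE_INT nelts = VECTOR_CST_NELTS (mask).to_constant ();
  vec_perm_builder sel (nelts, nelts, 1);
  for (unsigned HOST_WIDE_INT i = 0; i < nelts; ++i)
    sel.quick_push (tree_to_uhwi (VECTOR_CST_ELT (mask, i)));
  return vec_perm_indices (sel, ninputs, nelts_per_input);
}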

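And a sketch of the mask adjustment that padding the inputs would require;
the helper name is made up and it assumes the builder holds every index
explicitly (no VLA encoding):

/* Hypothetical sketch: remap a permute mask after each input of
   OP_NELTS elements has been padded out to VMODE_NELTS elements
   (e.g. via a paradoxical subreg).  Element E of input K moves from
   index K * OP_NELTS + E to K * VMODE_NELTS + E.  */
static void
adjust_mask_for_padded_inputs (vec_perm_builder &sel,
                               unsigned int op_nelts,
                               unsigned int vmode_nelts)
{
  for (unsigned int i = 0; i < sel.length (); ++i)
    {
      unsigned HOST_WIDE_INT idx = sel[i].to_constant ();
      sel[i] = (idx / op_nelts) * vmode_nelts + (idx % op_nelts);
    }
}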