https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #4)
> Guess we need to extend backend hook to handle different input and output
> modes.

Yes, alternatively as said, some special cases could be directly handled.
For example v16si -> v8si could be handled by
VEC_PERM <lowpart, highpart, {..}> without any extra magic (but IIRC we
don't have a way to query target support for specific BIT_FIELD_REFs,
which we'd use for getting at the lowpart or highpart, and if not available
those would fall back to memory).  And contiguous permutes could be
directly emitted as BIT_FIELD_REFs (in some cases).

I have a half-way patch that does the preparatory work but leaves
vectorizable_slp_permutation unchanged, so we immediately fail there due to

  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
    {
      if (!vect_maybe_update_slp_op_vectype (child, vectype)
          || !types_compatible_p (SLP_TREE_VECTYPE (child), vectype))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "Unsupported lane permutation\n");
          return false;

the comment above that says

  /* ??? We currently only support all same vector input and output
     types while the SLP IL should really do a concat + select and thus
     accept arbitrary mismatches.  */

so it was designed to handle more, it just wasn't necessary to implement
it ...
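
To make the v16si -> v8si special case concrete, here is a minimal scalar sketch in plain C of the lane semantics described above. It is not GCC vectorizer code: the helper names (vec_perm_v8si, permute_v16si_to_v8si, sel_is_contiguous) and the array-based model are assumptions made for illustration only. Taking the two halves by pointer stands in for the BIT_FIELD_REFs that would extract the lowpart and highpart, and the two-input permute stands in for VEC_PERM <lowpart, highpart, {..}>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IN_LANES  16u   /* lanes in the v16si input  */
#define OUT_LANES 8u    /* lanes in the v8si output  */

/* Hypothetical model of a two-input permute on 8-lane vectors:
   selector values 0..7 pick from LO, values 8..15 pick from HI.  */
static void
vec_perm_v8si (const int32_t *lo, const int32_t *hi,
               const unsigned sel[OUT_LANES], int32_t out[OUT_LANES])
{
  for (size_t i = 0; i < OUT_LANES; i++)
    {
      assert (sel[i] < 2 * OUT_LANES);
      out[i] = sel[i] < OUT_LANES ? lo[sel[i]] : hi[sel[i] - OUT_LANES];
    }
}

/* Hypothetical v16si -> v8si lane permutation: split the wide input
   into its low and high halves, then run one two-input permute.  */
static void
permute_v16si_to_v8si (const int32_t in[IN_LANES],
                       const unsigned sel[OUT_LANES],
                       int32_t out[OUT_LANES])
{
  const int32_t *lowpart = in;               /* lanes 0..7  */
  const int32_t *highpart = in + OUT_LANES;  /* lanes 8..15 */
  vec_perm_v8si (lowpart, highpart, sel, out);
}

/* Returns 1 when SEL picks OUT_LANES consecutive input lanes, i.e. the
   permute degenerates to extracting one contiguous slice.  */
static int
sel_is_contiguous (const unsigned sel[OUT_LANES])
{
  for (size_t i = 1; i < OUT_LANES; i++)
    if (sel[i] != sel[0] + i)
      return 0;
  return 1;
}
```

With the selector {0, 2, 4, 6, 8, 10, 12, 14} the output holds the even lanes of the input, while a selector such as {4, 5, ..., 11} is contiguous, which is the kind of case the comment suggests could be emitted as a single extraction instead of a permute.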