https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #4)
> Guess we need to extend backend hook to handle different input and output
> modes.

Yes.  Alternatively, as said, some special cases could be handled directly.
For example, v16si -> v8si could be handled by
VEC_PERM <lowpart, highpart, {..}> without any extra magic.  But IIRC we
don't have a way to query target support for the specific BIT_FIELD_REFs
we'd use to get at the lowpart or highpart, and where those are not
supported they would fall back to memory.  Contiguous permutes could also
be emitted directly as BIT_FIELD_REFs (in some cases).
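
To make the v16si -> v8si case concrete, here is a rough scalar model in
plain C.  It is only a sketch: perm_low_high and the array representation
are made up for illustration and are not GCC internals.  It assumes the
usual two-input permute convention, where selector values 0..7 index the
first operand (the lowpart) and 8..15 index the second (the highpart).

```c
#include <assert.h>
#include <stddef.h>

enum { WIDE_LANES = 16, NARROW_LANES = 8 };

/* Hypothetical model of VEC_PERM <lowpart, highpart, sel> on plain arrays.
   WIDE stands in for a v16si value, OUT for the v8si result.  */
static void
perm_low_high (const int wide[WIDE_LANES], const unsigned sel[NARROW_LANES],
               int out[NARROW_LANES])
{
  /* These two pointers play the role of the lowpart/highpart
     BIT_FIELD_REFs: each one views eight contiguous lanes.  */
  const int *lowpart = wide;
  const int *highpart = wide + NARROW_LANES;

  for (size_t i = 0; i < NARROW_LANES; ++i)
    {
      /* A two-input permute can address 2 * NARROW_LANES lanes.  */
      assert (sel[i] < 2 * NARROW_LANES);
      out[i] = (sel[i] < NARROW_LANES
                ? lowpart[sel[i]]
                : highpart[sel[i] - NARROW_LANES]);
    }
}
```

With a selector of {0, 2, 4, 6, 8, 10, 12, 14} this picks every other lane
of the wide input, and both permute operands already have the same type as
the result.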

I have a half-way patch that does the preparatory work but leaves
vectorizable_slp_permutation unchanged so we immediately fail there
due to

  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
    {
      if (!vect_maybe_update_slp_op_vectype (child, vectype)
          || !types_compatible_p (SLP_TREE_VECTYPE (child), vectype))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "Unsupported lane permutation\n");
          return false;

the comment above that says

  /* ???  We currently only support all same vector input and output types
     while the SLP IL should really do a concat + select and thus accept
     arbitrary mismatches.  */

so it was designed to handle more; it just wasn't necessary to implement it ...
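
For readers unfamiliar with the "concat + select" wording in that comment,
the following is a minimal scalar sketch of the idea in plain C.  All names
here (child_vec, lane_ref, concat_select) are hypothetical and do not exist
in GCC.  The only assumption is that each output lane names a (child, lane)
pair, so the children may have different lane counts from each other and
from the output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one SLP child: a run of scalar lanes.  */
struct child_vec
{
  const int *lanes;
  size_t nlanes;
};

/* Hypothetical permutation entry: take lane LANE of child CHILD.  */
struct lane_ref
{
  size_t child;
  size_t lane;
};

/* Select NOUT lanes from the conceptual concatenation of all children.
   Nothing here requires the children to agree on their lane count.  */
static void
concat_select (const struct child_vec *children, size_t nchildren,
               const struct lane_ref *perm, size_t nout, int *out)
{
  for (size_t i = 0; i < nout; ++i)
    {
      assert (perm[i].child < nchildren);
      assert (perm[i].lane < children[perm[i].child].nlanes);
      out[i] = children[perm[i].child].lanes[perm[i].lane];
    }
}
```

For instance, a four-lane child and a two-lane child can feed a three-lane
result, which is the kind of mismatch the same-type check above rejects.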
