https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117542
--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 21 Nov 2024, liuhongt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117542
>
> --- Comment #3 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> (In reply to Hongtao Liu from comment #2)
> > (In reply to Richard Biener from comment #1)
> > > It doesn't even unambiguously specify whether the mode is that of the
> > > source or the destination. The original idea was of course that the
> > > size unambiguously specifies the destination mode and thus specifying
> > > it would be redundant. Making all those optabs conversion optabs has
> > > some overhead and is useless in 99% of the cases.
> > >
> > > Can you combine both destination mode variants in vec_pack_trunc_VnSF
> > > and use predicates to select?
> > Then the mode of operand[0] will be hided in the predicates, I doubt it
> > would fail below check in supportable_narrowing_operation
> > 4739      if (insn_data[icode1].operand[0].mode == TYPE_MODE (narrow_vectype))
> >

Yes, something like this should work. I suggest to polish up a patch
with this also containing the backend pattern adjustments and post it
for review. The alternative is a convert optab for vec_pack_trunc,
or alternatively a vec_pack_trunc2.

> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index aa6ee85e719..f935d0e7767 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -10900,6 +10900,30 @@ expand_expr_real_2 (const_sepops ops, rtx target,
> machine_mode tmode,
>           expand_insn (icode, 4, eops);
>           return eops[0].value;
>         }
> +      /* There're 2 kinds of half precison floating point, and
> vec_pack_trunc_m
> +        can't be overloaded. Making all those optabs conversion optabs has
> +        some overhead and is useless in 99% of the cases. So the mode could
> +        be hided in predicate and mode of type is real tmode. */
> +      if (VECTOR_FLOAT_TYPE_P (type)
> +         && VECTOR_FLOAT_TYPE_P (TREE_TYPE (treeop0))
> +         && GET_MODE_SIZE (GET_MODE_INNER (TYPE_MODE (type))) == 2
> +         && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (treeop0)) * 2,
> +                      TYPE_VECTOR_SUBPARTS (type))
> +         && tmode == E_VOIDmode)
> +       {
> +         mode = TYPE_MODE (TREE_TYPE (treeop0));
> +         tmode = TYPE_MODE (type);
> +         class expand_operand eops[3];
> +         expand_operands (treeop0, treeop1,
> +                          subtarget, &op0, &op1, EXPAND_NORMAL);
> +         this_optab = vec_pack_trunc_optab;
> +         enum insn_code icode = optab_handler (this_optab, mode);
> +         create_output_operand (&eops[0], target, tmode);
> +         create_input_operand (&eops[1], op0, mode);
> +         create_input_operand (&eops[2], op1, mode);
> +         expand_insn (icode, 3, eops);
> +         return eops[0].value;
> +       }
>        mode = TYPE_MODE (TREE_TYPE (treeop0));
>        subtarget = NULL_RTX;
>        goto binop;
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 7a92da00f7d..d4744063045 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -15010,6 +15010,24 @@ supportable_narrowing_operation (code_helper code,
>      return true;
>    }
>
> +  /* There're 2 kinds of half precison floating point, and vec_pack_trunc_m
> +     can't be overloaded. Making all those optabs conversion optabs has
> +     some overhead and is useless in 99% of the cases.
> +     So check predicate here. */
> +  if (c1 == VEC_PACK_TRUNC_EXPR
> +      && VECTOR_FLOAT_TYPE_P (narrow_vectype)
> +      && VECTOR_FLOAT_TYPE_P (vectype)
> +      && GET_MODE_SIZE (GET_MODE_INNER (TYPE_MODE (narrow_vectype))) == 2
> +      && known_eq (TYPE_VECTOR_SUBPARTS (vectype) * 2,
> +                   TYPE_VECTOR_SUBPARTS (narrow_vectype))
> +      && insn_data[icode1].operand[0].predicate)
> +    {
> +      machine_mode dpmode = insn_data[icode1].operand[0].mode;
> +      machine_mode dmode = TYPE_MODE (narrow_vectype);
> +      if (insn_data[icode1].operand[0].predicate (gen_reg_rtx (dmode),
> dpmode))
> +       return true;
> +    }
> +
>
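For readers following the thread, the predicate fallback that the quoted
tree-vect-stmts.cc hunk adds can be modelled outside GCC roughly as below.
This is only a toy sketch under stated assumptions: Mode, OperandData,
half_float_vector_operand, narrowing_supported and pack_trunc_dest are
hypothetical stand-ins invented for illustration, not GCC's insn_data,
machine_mode or predicate interfaces, and the predicate here takes a mode
where the real one takes an rtx. It shows the strict operand[0] mode
comparison being tried first and the operand's predicate being consulted
only when that comparison fails.

```cpp
// Toy model only: none of these names exist in GCC.
enum class Mode { V16HF, V16BF, V16QI };

// Hypothetical predicate shape: (mode being asked for, mode the pattern declares).
using Predicate = bool (*)(Mode actual, Mode declared);

struct OperandData {
  Mode mode;            // mode declared for the operand in the pattern
  Predicate predicate;  // may be null, as in the guarded check in the patch
};

// Accepts either 16-bit floating-point vector mode, whatever mode is declared.
constexpr bool half_float_vector_operand(Mode actual, Mode) {
  return actual == Mode::V16HF || actual == Mode::V16BF;
}

// Strict mode comparison first, then the predicate fallback.
constexpr bool narrowing_supported(const OperandData &dest, Mode narrow_mode) {
  return dest.mode == narrow_mode
         || (dest.predicate != nullptr
             && dest.predicate(narrow_mode, dest.mode));
}

// A single pattern whose destination is declared as V16HF but whose
// predicate also admits V16BF.
constexpr OperandData pack_trunc_dest = {Mode::V16HF, half_float_vector_operand};

static_assert(narrowing_supported(pack_trunc_dest, Mode::V16HF),
              "exact mode match");
static_assert(narrowing_supported(pack_trunc_dest, Mode::V16BF),
              "accepted through the predicate");
static_assert(!narrowing_supported(pack_trunc_dest, Mode::V16QI),
              "unrelated mode rejected");
```

In this model a single pattern serves both half-precision destinations,
which is the effect comment #1 asks for. A convert optab would instead
make the destination mode part of the lookup key.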