AVX2: missed simplification — low32×low32→u64 vectorized multiply expands to generic u64-mul sequence instead of single vpmuludq

pinskia at gcc dot gnu.org via Gcc-bugs Fri, 27 Feb 2026 20:31:30 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124271


--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #4)
> (In reply to Hongtao Liu from comment #2)
> > I think middle-end should simplify
> > 
> >   vect__15.12_21 = vect__4.11_22 & { 4294967295, 4294967295, 4294967295,
> > 4294967295 };
> >   vect__12.8_25 = vect__6.7_26 & { 4294967295, 4294967295, 4294967295,
> > 4294967295 };
> >   vect__16.13_18 = vect__15.12_21 * vect__12.8_25;
> > 
> > to
> > 
> >   op1 = VIEW_CONVERT_EXPR <V8USI>  vect__15.12_21;
> >   op2 = VIEW_CONVERT_EXPR <V8USI>  vect__12.8_25;
> >   vect__16.13_18 = VEC_WIDEN_MULT_EVEN(op1, op2);
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7f16fd4e081..dd8ecad51b7 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -436,6 +436,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>                                      { zeros; })
>                            { ones; } { zeros; })))))))))
> 
> +#if GIMPLE
> +/*    (x & lowhalf_mask) * (y & lowhalf_mask)
> +    -> VEC_WIDEN_MULT_EVEN (VIEW_CONVERT (x), VIEW_CONVERT (y)).  */
> +(simplify
> + (mult (bit_and @0 VECTOR_CST@2)
> +      (bit_and @1 @2))
> + (if (uniform_vector_p (@2)
> +      && TYPE_VECTOR_SUBPARTS (type).is_constant ()
> +      && TYPE_VECTOR_SUBPARTS (type).to_constant () > 1)
> +   (with
> +    {
> +      tree elem = uniform_vector_p (@2);

This could just be:
  auto elem = wi::to_wide (uniform_vector_p (@2));

> +      unsigned int outer_prec = element_precision (TREE_TYPE (type));

Since you are already taking the scalar element type here, the above can
simply be:
unsigned int outer_prec = TYPE_PRECISION (TREE_TYPE (type));

> +      unsigned int inner_prec = outer_prec / 2;
> +      poly_uint64 outer_nelts = TYPE_VECTOR_SUBPARTS (type);
> +      tree inner_scalar = build_nonstandard_integer_type (inner_prec, 1);

I think the 1 here should be `TYPE_UNSIGNED (TREE_TYPE (type))`; otherwise you
get a type mismatch.

> +      tree inner_type = build_vector_type (inner_scalar, outer_nelts * 2);
> +    }

I would add `GET_MODE_CLASS (TYPE_MODE (inner_type)) == MODE_VECTOR_INT` before
the rest of the checks here, as there might not be a vector mode for that type.

With the above elem change, we can change:
> +    (if (wi::eq_p (wi::to_wide (elem), wi::mask (inner_prec, false,
> outer_prec))
to just:
   `elem == wi::mask (inner_prec, false, outer_prec)`
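For reference, `wi::mask (inner_prec, false, outer_prec)` yields an
outer_prec-bit value with only the low inner_prec bits set. A minimal scalar
sketch of that value in C, assuming outer_prec is 64 (the helper name
`low_mask` is hypothetical and not part of GCC):

```c
#include <stdint.h>

/* Hypothetical helper, not a GCC API: returns a 64-bit value whose low
   `width` bits are set and whose remaining bits are clear.  `width` is
   assumed to be in the range 1..64.  */
static uint64_t
low_mask (unsigned width)
{
  /* Shifting a 64-bit value by 64 is undefined in C, so handle it apart.  */
  return width >= 64 ? UINT64_MAX : ((UINT64_C (1) << width) - 1);
}
```

With inner_prec == 32 this is 4294967295, the constant that appears in the
vectorized GIMPLE quoted from comment #2.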

> +        && optab_handler (vec_widen_umult_even_optab,
> +                          TYPE_MODE (inner_type)) != CODE_FOR_nothing)
> +       (vec_widen_mult_even (view_convert:inner_type @0)
> +                           (view_convert:inner_type @1))))))
> +#endif
> +
>  (for cmp (gt ge lt le)
>       outp (convert convert negate negate)
>       outn (negate negate convert convert)
> 
> 
> I'm testing this.

Otherwise it looks decent.
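
To illustrate why the rewrite should be value-preserving, here is a scalar
sketch in C. It assumes a little-endian target (so the even 32-bit lanes are
the low halves of each 64-bit element) and 4 x 64-bit elements as in the AVX2
case; the function names and N are made up for the example:

```c
#include <stdint.h>
#include <string.h>

#define N 4  /* Assumed element count, matching a 256-bit vector of u64.  */

/* The source pattern: mask both operands to their low 32 bits, then do a
   full 64-bit multiply.  */
static void
mul_masked (const uint64_t *a, const uint64_t *b, uint64_t *out)
{
  for (int i = 0; i < N; i++)
    out[i] = (a[i] & 0xffffffffu) * (b[i] & 0xffffffffu);
}

/* The proposed form: reinterpret each operand as 2*N 32-bit lanes and do an
   unsigned widening multiply of the even lanes.  Assumes little-endian
   layout, where lane 2*i holds the low half of 64-bit element i.  */
static void
mul_widen_even (const uint64_t *a, const uint64_t *b, uint64_t *out)
{
  uint32_t a32[2 * N], b32[2 * N];
  memcpy (a32, a, sizeof a32);
  memcpy (b32, b, sizeof b32);
  for (int i = 0; i < N; i++)
    out[i] = (uint64_t) a32[2 * i] * (uint64_t) b32[2 * i];
}
```

Both functions should produce the same results on a little-endian host; the
second is the shape that maps to a single vpmuludq per vector.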

[Bug target/124271] x86/AVX2: missed simplification — low32×low32→u64 vectorized multiply expands to generic u64-mul sequence instead of single vpmuludq
