Thanks Richard for comments.
> Thanks and sorry for the delay...
Never mind, I see you were busy for middle-end vectorization changes recently.
> OK, so this is basically widen + mult vs. pre-widen + widen_mult vs.
> widen_mult.
> IMO this would be perfect for a widening_mult helper?
Sure thing, seems also benefits other SAT_MUL pattern(s) introduced in previous
too. So I may try to refactor that first.
> I see. That's odd. In fact I'd say we'd want to transform
> (nop_convert (bit_{ior,and,xor}:c (convert @0) @1))
> to move the conversion inside like
> (for op (bit_ior bit_and bit_xor)
> (simplify
> (nop_convert (op:c (convert @0) @1))
> (op:type (convert @0) (convert @1))))
That make much sense to me, and we have similar scenario for the SAT_MUL
pattern in previous, will refactor this first.
Pan
-----Original Message-----
From: Richard Biener <[email protected]>
Sent: Friday, October 31, 2025 9:37 PM
To: Li, Pan2 <[email protected]>
Cc: [email protected]; [email protected]; [email protected];
[email protected]; [email protected]; Chen, Ken <[email protected]>;
Liu, Hongtao <[email protected]>; [email protected]
Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7
On Tue, Oct 28, 2025 at 3:31 AM Li, Pan2 <[email protected]> wrote:
>
> Thanks Richard for comments.
>
> > (match
> > (widening_mult @0 @1)
> > (widen_mult @0 @1))
> > (match
> > (widening_mult @0 @1)
> > (mult (convert @0) (convert @1))
> > (if (it is widening))
>
> I see, that wrap the convert part into another helper predicate. That make
> sense to me, and will have a try soon.
> I think the difference comes from the the uint128_t/uint64_t has different
> gimple compares to the rest.
>
> 1. WT uint128_t, NT uint64_t, we have gimple as below.
>
> 8 │ uint128_t x;
> 19 │ x_8 = a_6(D) w* b_7(D);
>
> The NT (aka uint64_t) doesn't need to convert when widen_mul to uint128_t.
>
> 2. WT uint128_t, NT uint32_t, as well as the rest combination if WT > NT.
>
> 8 │ uint128_t x;
> 21 │ _12 = (unsigned long) a_6(D);
> 22 │ _13 = (unsigned long) b_7(D);
> 23 │ x_8 = _12 w* _13;
>
> The NT (aka uint32_t) need to convert to uint64_t before widen_mul to
> uint128_t.
>
> In previous, I found it is not easy to merge the 2 case into one expression.
> Thus, add one pattern for case 1 and the other for case 2.
OK, so this is basically widen + mult vs. pre-widen + widen_mult vs. widen_mult.
IMO this would be perfect for a widening_mult helper?
> > Now for the question - part of the pattern is
> > (convert? (bit_ior (negate (convert from-bool)) (convert (widening_mult
> > ...)))
> > I wonder whether you've seen the outer conversion being moved inside the
> > IOR?
> > At least I wonder why there's the (convert ...) around the widening_mult and
> > why that's not conditional? The textual description of the pattern mentions
> > (NT)x | (NT)overflow_p, so the result should already be in NT, no
> > outer conversion
> > required? That said, (NT)((WT)x | (WT)overflow_p) would be equally valid,
> > but we fold that inside the IOR? But not always?
>
> Some of the cases introduces signed type for some intermediate result, like
> WT is uint16_t, NT is uint8_t.
>
> 11 │ signed char _3; // we have signed char from _3 to _6.
> 12 │ signed char _4;
> 13 │ signed char _5;
> 14 │ signed char _6;
> 15 │ uint8_t _11;
> 16 │
> 17 │ <bb 2> [local count: 1073741824]:
> 18 │ _1 = (short unsigned int) a_7(D);
> 19 │ _2 = (short unsigned int) b_8(D);
> 20 │ x_9 = _1 * _2;
> 21 │ overflow_p_10 = x_9 > 255;
> 22 │ _3 = (signed char) overflow_p_10;
> 23 │ _4 = -_3;
> 24 │ _5 = (signed char) x_9;
> 25 │ _6 = _4 | _5;
> 26 │ _11 = (uint8_t) _6;
I see. That's odd. In fact I'd say we'd want to transform
(nop_convert (bit_{ior,and,xor}:c (convert @0) @1))
to move the conversion inside like
(for op (bit_ior bit_and bit_xor)
(simplify
(nop_convert (op:c (convert @0) @1))
(op:type (convert @0) (convert @1))))
> 27 │ return _11;
>
> When we take a look for WT is uint64_t, NT is uint32_t, we will have unsigned.
>
> 11 │ unsigned int _3;
> 12 │ unsigned int _4;
> 13 │ unsigned int _5;
> ...
> 26 │ _6 = _4 | _5;
>
> The outer convert with mark '?' is introduced to match both of the above 2
> cases.
Thanks and sorry for the delay...
Richard.
> Pan
>
>
> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: Monday, October 27, 2025 9:14 PM
> To: Li, Pan2 <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Chen, Ken <[email protected]>;
> Liu, Hongtao <[email protected]>; [email protected]
> Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7
>
> On Sun, Oct 26, 2025 at 11:58 AM Li, Pan2 <[email protected]> wrote:
> >
> > Hi Richard,
> >
> > I had a try for 2 parts those days.
> >
> > 1. Remove outer convert?, it will make the pattern failed to match as the
> > type is signed
> > without that convert, and the types_match (type, @0, @1) to be false.
> > 2. Merge 2 into one pattern with (convert?@4 @0) but it will also result in
> > pattern match fail(when NT is uint32_t and WT is uint64_t).
> > Because the types_match (type, capture[0], capture[1]) will be false,
> > as type is SI and capture is DI.
> >
> > So I think it is possible to separate the common part into a helper pattern
> > like unsigned_sat_mul_helper.
> > And make the real unsigned_integer_sat_mul(@0, @1) for the first one, and
> > unsigned_integer_sat_mul((convert @0), (convert @1)
> > for the second one.
> > Then we don't need to duplicate most of the patterns up to a point. How do
> > you think of it? Thanks a lot.
>
> Maybe we can first get one of the patterns to a form I understand (the
> one with the many conversions)?
> Splitting out parts might only obfuscate things. It would be maybe
> nice to be able to merge different
> alternatives within the same (match ..) sharing both the name and the
> conditions, like
>
> (match (foo @0 @1)
> (convert (mult @0 @1))
> (widen_mult @0 @1)
> (if (...)))
>
> If you can structure the two patterns in a way that the (if ...) part
> is identical that would help.
>
> Some question:
>
> + (convert?
> + (bit_ior (negate (convert (gt @3 INTEGER_CST@2)))
> + (convert (mult_op:c@3 (convert@4 @0) (convert@5 @1)))))
>
> So I get that we want a widening multiplication but that's either a widen_mult
> already or not. We might want to have a
>
> (match
> (widening_mult @0 @1)
> (widen_mult @0 @1))
> (match
> (widening_mult @0 @1)
> (mult (convert @0) (convert @1))
> (if (it is widening))
>
> to help merging. Now for the question - part of the pattern is
>
> (convert? (bit_ior (negate (convert from-bool)) (convert (widening_mult
> ...)))
>
> I wonder whether you've seen the outer conversion being moved inside the IOR?
> At least I wonder why there's the (convert ...) around the widening_mult and
> why that's not conditional? The textual description of the pattern mentions
> (NT)x | (NT)overflow_p, so the result should already be in NT, no
> outer conversion
> required? That said, (NT)((WT)x | (WT)overflow_p) would be equally valid,
> but we fold that inside the IOR? But not always?
>
> Thanks,
> Richard.
>
> > Pan
> >
> > -----Original Message-----
> > From: Li, Pan2
> > Sent: Friday, October 24, 2025 9:52 AM
> > To: 'Richard Biener' <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; Chen, Ken <[email protected]>;
> > Liu, Hongtao <[email protected]>; [email protected]
> > Subject: RE: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7
> >
> > Thanks Richard for comments.
> >
> > Assume we talk about the form 7 as below:
> >
> > #define DEF_SAT_U_MUL_FMT_7(NT, WT) \
> > NT __attribute__((noinline)) \
> > sat_u_mul_##NT##_from_##WT##_fmt_7 (NT a, NT b) \
> > { \
> > WT x = (WT)a * (WT)b; \
> > NT max = -1; \
> > bool overflow_p = x > (WT)(max); \
> > return -(NT)(overflow_p) | (NT)x; \
> > }
> >
> > > So why is matching the conversion necessary at all, that is, why is (mult
> > > @0 @1)
> > > not sufficient here?
> >
> > Because there are many different types combinations, unsigned SAT_MUL need
> > the help of a wider type.
> > For example:
> >
> > If NT is uint32_t, WT is uint128_t, we need the convert before widen_mul:
> > 21 │ _12 = (unsigned long) a_6(D);
> > 22 │ _13 = (unsigned long) b_7(D);
> > 23 │ x_8 = _12 w* _13;
> >
> > If NT is uint64_t, WT is uint128_t, we don't have the convert.
> > 19 │ x_8 = a_6(D) w* b_7(D);
> >
> > > You are not looking at the types of @0 or @1 at
> > > all, AFAICS
> > > one could be 'char' and one could be 'short'?
> >
> > The type is restricted by (if (types_match (type, @0, @1)). Aka the result
> > must have the same type as @0 and @1.
> >
> > > Captures on optional nodes are "merged" to the operand, so for
> > > (convert?@4 @0)
> > > when there is no conversion @4 == @0. That is it behaves as if there
> > > were two
> > > captures at the place of @0.
> >
> > I see, sounds like alias here, thanks for the explanation.
> >
> > > But isn't _4 | _5 still a sat_u_mul-8-u8 but with a conversion to the
> > > desired type?
> > > That is, I wonder if even when the conversion is missing, the
> > > saturated operation
> > > can be matched by using (original-type)SAT_UXYZ (...), implying the
> > > SAT_UXYZ
> > > produces a uint8_t (in this case)? Not sure I phrased this in an
> > > understandable way ;)
> >
> > > I'll note that the outer conversion is poorly constrained given the
> > > conversion
> > > around the multiplication/inside the negate is what determines the
> > > semantics of it
> > > and that's not constrained either?
> >
> > For NT is uint8_t and WT is uint128_t, we have the final convert from
> > gimple. The bigger types mentioned in above
> > don't have such convert. The _11 and _6 should be almost the same, maybe
> > nop_convert here is good enough if it failed
> > to match that. I will have a try and keep you posted.
> >
> > 14 │ signed char _6;
> > 15 │ uint8_t _11;
> > ...
> > 26 │ _3 = (signed char) overflow_p_10;
> > 27 │ _4 = -_3;
> > 28 │ _5 = (signed char) x_9;
> > 29 │ _6 = _4 | _5;
> > 30 │ _11 = (uint8_t) _6;
> > 31 │ return _11;
> >
> > Pan
> >
> >
> > -----Original Message-----
> > From: Richard Biener <[email protected]>
> > Sent: Thursday, October 23, 2025 7:08 PM
> > To: Li, Pan2 <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; Chen, Ken <[email protected]>;
> > Liu, Hongtao <[email protected]>; [email protected]
> > Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7
> >
> > On Wed, Oct 22, 2025 at 2:21 PM Li, Pan2 <[email protected]> wrote:
> > >
> > > Thanks Richard for comments.
> > >
> > > > I'm a bit confused as to this for and the explicit widen_mult case in
> > > > the 2nd
> > > > pattern below. In fact, the patterns look similar enough to me and I
> > > > wonder
> > > > whether the consumer can handle the outer conversion and the conversion
> > > > of the leafs of the multiplication? The conversion of the leafs
> > > > doesn't have
> > > > any constraints, nor is it obvious why there's an outer conversion
> > > > around the
> > > > IOR.
> > >
> > > The two pattern looks similar up to a point. I separate them into 2
> > > mostly comes from
> > > the inner convert about mult, aka (mult (convert@4 @0) (convert@5 @1)).
> > > Some diff
> > > type combination need to convert while others are not.
> > >
> > > I tried to leverage something like (mult (convert?@4 @0) (convert?@5 @1))
> > > to put
> > > them together. But it may has additional sematics because add '?' after
> > > convert will
> > > cover (mult (convert@4 @0) @1) which is not expected.
> >
> > So why is matching the conversion necessary at all, that is, why is (mult
> > @0 @1)
> > not sufficient here? You are not looking at the types of @0 or @1 at
> > all, AFAICS
> > one could be 'char' and one could be 'short'?
> >
> > > Another problem about add '?' after convert is capture @4/@5, if there is
> > > no need
> > > to convert, I am not sure how to take care of the @4 from the “with“”
> > > scope because it
> > > is optional. The code acts on capture @4/@5 is not correct when there is
> > > no convert.
> >
> > Captures on optional nodes are "merged" to the operand, so for (convert?@4
> > @0)
> > when there is no conversion @4 == @0. That is it behaves as if there were
> > two
> > captures at the place of @0.
> >
> > > I may have another try to merge them into one, but is there any
> > > suggestion here?
> > >
> > > > outer conversion around the
> > > > IOR.
> > >
> > > Some type(s) may need to do a convert to the final type, like
> > > sat_u_mul-8-u8-from-u32,
> > > The seq may look like blow.
> > >
> > > 25 │ _6 = _4 | _5;
> > > 26 │ _11 = (uint8_t) _6;
> > > 27 │ return _11;
> >
> > But isn't _4 | _5 still a sat_u_mul-8-u8 but with a conversion to the
> > desired type?
> > That is, I wonder if even when the conversion is missing, the
> > saturated operation
> > can be matched by using (original-type)SAT_UXYZ (...), implying the SAT_UXYZ
> > produces a uint8_t (in this case)? Not sure I phrased this in an
> > understandable way ;)
> >
> > I'll note that the outer conversion is poorly constrained given the
> > conversion
> > around the multiplication/inside the negate is what determines the
> > semantics of it
> > and that's not constrained either?
> >
> > Richard.
> >
> > > Pan
> > >
> > > -----Original Message-----
> > > From: Richard Biener <[email protected]>
> > > Sent: Wednesday, October 22, 2025 5:00 PM
> > > To: Li, Pan2 <[email protected]>
> > > Cc: [email protected]; [email protected]; [email protected];
> > > [email protected]; [email protected]; Chen, Ken
> > > <[email protected]>; Liu, Hongtao <[email protected]>;
> > > [email protected]
> > > Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7
> > >
> > > On Mon, Oct 20, 2025 at 3:10 PM <[email protected]> wrote:
> > > >
> > > > From: Pan Li <[email protected]>
> > > >
> > > > This patch would like to try to match the the unsigned
> > > > SAT_MUL form 7, aka below:
> > > >
> > > > #define DEF_SAT_U_MUL_FMT_7(NT, WT) \
> > > > NT __attribute__((noinline)) \
> > > > sat_u_mul_##NT##_from_##WT##_fmt_7 (NT a, NT b) \
> > > > { \
> > > > WT x = (WT)a * (WT)b; \
> > > > NT max = -1; \
> > > > bool overflow_p = x > (WT)(max); \
> > > > return -(NT)(overflow_p) | (NT)x; \
> > > > }
> > > >
> > > > while WT is uint128_t, uint64_t, uint32_t and uint16_t, and
> > > > NT is uint64_t, uint32_t, uint16_t or uint8_t.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * match.pd: Add pattern for SAT_MUL form 5 include
> > > > mul and widen_mul.
> > > >
> > > > Signed-off-by: Pan Li <[email protected]>
> > > > ---
> > > > gcc/match.pd | 35 +++++++++++++++++++++++++++++++++++
> > > > 1 file changed, 35 insertions(+)
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index bfc51e6579a..0f55a82d989 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -3749,6 +3749,41 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > bool widen_mult_p = prec * 2 == widen_prec;
> > > > }
> > > > (if (c2_is_type_precision_p && widen_mult_p)))))
> > > > + (for mult_op (mult widen_mult)
> > >
> > > I'm a bit confused as to this for and the explicit widen_mult case in the
> > > 2nd
> > > pattern below. In fact, the patterns look similar enough to me and I
> > > wonder
> > > whether the consumer can handle the outer conversion and the conversion
> > > of the leafs of the multiplication? The conversion of the leafs doesn't
> > > have
> > > any constraints, nor is it obvious why there's an outer conversion around
> > > the
> > > IOR.
> > >
> > > > + (match (unsigned_integer_sat_mul @0 @1)
> > > > + (convert?
> > > > + (bit_ior (negate (convert (gt @3 INTEGER_CST@2)))
> > > > + (convert (mult_op:c@3 (convert@4 @0) (convert@5 @1)))))
> > > > + (if (types_match (type, @0, @1))
> > > > + (with
> > > > + {
> > > > + unsigned prec = TYPE_PRECISION (type);
> > > > + unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3));
> > > > + unsigned cvt4_prec = TYPE_PRECISION (TREE_TYPE (@4));
> > > > + unsigned cvt5_prec = TYPE_PRECISION (TREE_TYPE (@5));
> > > > +
> > > > + wide_int max = wi::mask (prec, false, widen_prec);
> > > > + bool c2_is_max_p = wi::eq_p (wi::to_wide (@2), max);
> > > > + bool widen_mult_p = mult_op == WIDEN_MULT_EXPR && cvt4_prec ==
> > > > cvt5_prec
> > > > + && widen_prec == cvt5_prec * 2;
> > > > + bool mult_p = mult_op == MULT_EXPR && cvt4_prec == cvt5_prec
> > > > + && cvt4_prec == widen_prec && widen_prec > prec;
> > > > + }
> > > > + (if (c2_is_max_p && (mult_p || widen_mult_p)))))))
> > > > + (match (unsigned_integer_sat_mul @0 @1)
> > > > + (bit_ior (negate (convert (gt @3 INTEGER_CST@2)))
> > > > + (convert (widen_mult:c@3 @0 @1)))
> > > > + (if (types_match (type, @0, @1))
> > > > + (with
> > > > + {
> > > > + unsigned prec = TYPE_PRECISION (type);
> > > > + unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3));
> > > > +
> > > > + wide_int max = wi::mask (prec, false, widen_prec);
> > > > + bool c2_is_max_p = wi::eq_p (wi::to_wide (@2), max);
> > > > + bool widen_mult_p = prec * 2 == widen_prec;
> > > > + }
> > > > + (if (c2_is_max_p && widen_mult_p)))))
> > > > )
> > > >
> > > > /* The boundary condition for case 10: IMM = 1:
> > > > --
> > > > 2.43.0
> > > >