If we change the code to accumulate the result into the original value of res[1],

  res[1] += mul_hi + (sum_mi_carry << 32) + (sum_mi >> 32) + (sum_32 >> 32);

then the first reassoc pass would break the pattern and rewrite it as:

  tmp = res[1] + mul_hi;
  ...
  res[1] = ...;

________________________________________
From: Feng Xue OS <[email protected]>
Sent: Thursday, December 4, 2025 5:38 PM
To: Richard Biener
Cc: GCC Patches; [email protected]; [email protected]
Subject: Re: [RFC] Pattern matching on plain emulation of 64x64->128 integer multiplication

>>
>> A possible reference approach is the detection of table-based CTZ in the
>> ssa-forward pass, which might be a suitable place for this pattern, since
>> the ssa-forward pass is invoked more than once and some invocations happen
>> before the reassociation pass. But simplification of the 64x64->128 pattern
>> should be classified as a math-related optimization, and is better placed
>> in tree-ssa-math-opts.cc for logical consistency. In that file, there is a
>> pass_optimize_widening_mul that is very close to what we want, but it runs
>> too late to keep the pattern intact against reassociation. So I am
>> considering adding a dedicated pass like pass_cse_reciprocals, placed prior
>> to reassociation, whose main procedure is hand-coded matching, with only
>> some leaf patterns defined via match.pd.

> As you figured we already have highpart multiplication recognition.
> It's the late reassoc pass that is the problem, given it applies
> reassoc_width?  Or is the early reassoc pass also problematic?  We do
> have some special-casing of fma patterns in reassoc, so
> in theory we could handle widening mult candidates similarly, but then
> doing widen-mult detection where we do cse_sincos/reciprocals
> (those back-to-back passes should ideally share their CFG walk) is a
> viable approach.  I would suggest against the ssa-forward pass,
> esp. against trying to introduce highpart multiplication before
> vectorization or other loop optimizations.

Current highpart multiplication and widen-mul recognition only handle cases
where the operations involve truncation from a large integer type to a small
one. An example from the regression testcases:

unsigned long long foo(unsigned long long x, unsigned long long y)
{
  return ((__uint128_t)x * (__uint128_t)y) >> 64;
}

That is, __uint128_t is used explicitly in the source code, so no addition is
involved and reassociation has no impact on it. The pattern in question,
however, is emulated entirely with uint64_t operations and contains no literal
128-bit integer operation. It is synthesized from the small type up to the
large one; that is the difference.
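
For reference, a minimal sketch of the kind of plain emulation meant here,
assuming the usual split into 32-bit halves. The function name mul64x64_128
and the temporaries x_lo, x_hi, y_lo, y_hi, mul_lo, mul_mi_0 and mul_mi_1 are
made up for illustration; only mul_hi, sum_mi, sum_mi_carry, sum_32 and res[1]
come from the statement quoted at the top of this mail. It is not necessarily
the exact source form the matcher would have to handle:

```c
#include <stdint.h>

/* Hypothetical helper: 64x64->128 multiply using only uint64_t operations.
   res[0] receives the low 64 bits, res[1] the high 64 bits.  */
void mul64x64_128(uint64_t x, uint64_t y, uint64_t res[2])
{
  uint64_t x_lo = x & 0xffffffff, x_hi = x >> 32;
  uint64_t y_lo = y & 0xffffffff, y_hi = y >> 32;

  /* Four 32x32->64 partial products.  */
  uint64_t mul_lo   = x_lo * y_lo;
  uint64_t mul_mi_0 = x_hi * y_lo;
  uint64_t mul_mi_1 = x_lo * y_hi;
  uint64_t mul_hi   = x_hi * y_hi;

  /* Sum of the middle products; the add can wrap, so record the carry.  */
  uint64_t sum_mi = mul_mi_0 + mul_mi_1;
  uint64_t sum_mi_carry = sum_mi < mul_mi_0;

  /* Bits 32..63 of the result plus a possible carry into bit 64.  */
  uint64_t sum_32 = (mul_lo >> 32) + (sum_mi & 0xffffffff);

  res[0] = (sum_32 << 32) | (mul_lo & 0xffffffff);
  res[1] = mul_hi + (sum_mi_carry << 32) + (sum_mi >> 32) + (sum_32 >> 32);
}
```

The high part is a chain of four additions with no 128-bit type in sight,
which is exactly the shape that reassociation is free to reorder.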

I'm afraid we cannot special-case the pattern in reassoc the way fma is
handled, because it is much more complex.

Regards,
Feng
