https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96305

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Unnecessary signed x        |not detecting widen
                   |unsigned multiplication     |multiple after a widen
                   |with squares of signed      |multiply with shift
                   |variables                   |
             Target|arm-*-*                     |arm-*-*, aarch64-*-*
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-09-27
          Component|target                      |tree-optimization

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is really a gimple-level issue.

We detect the first widening multiply with shift but not the second one:
  _10 = a_2(D) w* a_2(D);
  _11 = _10 >> 32;
  _3 = (long long int) b_4(D);
  _6 = _3 * _11;
  _7 = _6 >> 32;
  _8 = (int) _7;

The issue is visible on aarch64 too.
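For illustration, here is a rough C sketch of what that gimple computes. The source shape of compute_plain (the same code as the test case in this comment, minus the asm barrier) is an assumption, and the names compute_plain and compute_as_gimple are hypothetical. The point is that the second multiply ends up as an ordinary 64-bit multiply on the un-truncated intermediate, so nothing is left for the widening-multiply detection to match:

```c
/* High 32 bits of the signed 64-bit product of two ints.
   Right-shifting a negative value is implementation-defined in ISO C;
   GCC documents it as an arithmetic shift. */
static inline int hmull(int a, int b) {
    return (int)(((long long)a * b) >> 32);
}

/* Assumed source shape: two chained high multiplies, no asm barrier. */
int compute_plain(int a, int b) {
    return hmull(hmull(a, a), b);
}

/* Hand transcription of the gimple dump, one statement per SSA name. */
int compute_as_gimple(int a, int b) {
    long long t10 = (long long)a * a;  /* _10 = a w* a                     */
    long long t11 = t10 >> 32;         /* _11 = _10 >> 32                  */
    long long t3  = (long long)b;      /* _3  = (long long) b              */
    long long t6  = t3 * t11;          /* _6  = _3 * _11, plain 64-bit mul */
    long long t7  = t6 >> 32;          /* _7  = _6 >> 32                   */
    return (int)t7;                    /* _8  = (int) _7                   */
}
```

Both functions return the same value; the difference is only that the truncation to int between the two multiplies has been folded away in the second form.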


If we do this:
inline int hmull(int a, int b) {
    return ((long long)a * b) >> 32;
}

int compute(int a, int b) {
    int t = hmull(a,a);
    asm("":"+r"(t));
    return hmull(t, b);
}
------- CUT ----
On aarch64 we get:
        smull   x0, w0, w0
        asr     x2, x0, 32
        smull   x0, w1, w2
        lsr     x0, x0, 32
        ret

which is exactly what we want.
And on arm we get:
        smull   r3, r0, r0, r0
        smull   r1, r0, r1, r0
        bx      lr

Gimple level:
  _11 = a_2(D) w* a_2(D);
  _12 = _11 >> 32;
  _13 = (int) _12;
  __asm__("" : "=r" t_4 : "0" _13);
  _7 = b_5(D) w* t_4;
  _8 = _7 >> 32;
  _9 = (int) _8;

Notice that both multiplies are w* there :).

Note that the inline asm helps clang too.
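As a sanity check, the empty asm should only affect what the optimizer can see, not the result. A minimal sketch, assuming a GCC-compatible compiler; compute_barrier, compute_plain and count_mismatches are hypothetical names, and __asm__ is used instead of asm only so the snippet also builds in strict ISO modes:

```c
/* High 32 bits of the signed 64-bit product of two ints
   (relies on GCC's arithmetic right shift of negative values). */
static inline int hmull(int a, int b) {
    return (int)(((long long)a * b) >> 32);
}

/* Variant from the test case: the empty asm makes t opaque to the
   optimizer, so the truncation to int stays between the multiplies. */
int compute_barrier(int a, int b) {
    int t = hmull(a, a);
    __asm__("" : "+r"(t));
    return hmull(t, b);
}

/* Same computation without the barrier (assumed shape). */
int compute_plain(int a, int b) {
    return hmull(hmull(a, a), b);
}

/* Compare both variants over a few sample inputs;
   returns the number of pairs on which they disagree. */
int count_mismatches(void) {
    static const int samples[] = {
        0, 1, -1, 65536, 0x40000000, -0x40000000,
        2147483647, -2147483647 - 1
    };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    int mismatches = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (compute_barrier(samples[i], samples[j]) !=
                compute_plain(samples[i], samples[j]))
                mismatches++;
    return mismatches;
}
```

Calling count_mismatches() from a small main and checking for 0 is enough to confirm the two variants agree on these inputs; the interesting difference is in the generated code, not the values.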
