On Wed, 25 Nov 2020, Jakub Jelinek wrote:

> Hi!
> 
> The following patch adds some improvements for __builtin_mul_overflow
> expansion.
> One optimization is for the u1 * u2 -> sr case, as documented we normally
> do:
>      u1 * u2 -> sr
>         res = (S) (u1 * u2)
>         ovf = res < 0 || main_ovf (true)
> where main_ovf (true) stands for jump on unsigned multiplication overflow.
> If we know that the most significant bits of both operands are clear (such
> as when they are zero extended from something smaller), we can
> emit better code by handling it like s1 * s2 -> sr, i.e. just jump on
> overflow after signed multiplication.
> 
> Another two cases are s1 * s2 -> ur and s1 * u2 -> ur: if we know the minimum
> precision needed to encode all values of both arguments summed together
> is smaller than or equal to the destination precision (such as when the two
> arguments are sign (or zero) extended from half precision types), we know
> the overflows happen iff one argument is negative and the other is positive
> (not zero), because even if both have maximum possible values, the maximum
> is still representable (e.g. for char * char -> unsigned short
> 0x7f * 0x7f = 0x3f01 and for char * unsigned char -> unsigned short
> 0x7f * 0xffU = 0x7e81) and as the result is unsigned, all negative results
> do overflow, but are also representable if we consider the result signed
> - all of them have the MSB set.  So, it is more efficient to just
> do the normal multiplication in that case and compare the result, considered
> as a signed value, against 0: if it is smaller, overflow happened.
> 
> And the get_min_precision change is to improve the char to short handling:
> there we have in the IL
>   _2 = (int) arg_1(D);
> a promotion (from the C integer promotions) of a char or unsigned char arg,
> and the caller adds a NOP_EXPR cast to short or unsigned short.
> get_min_precision punted on the narrowing cast, though, since it handled
> only widening casts; but we can handle narrowing casts fine too, by
> recursing on the narrowing cast's operand and using its precision only if
> it is in the end smaller, which guarantees the sign bits (or zero bits)
> are duplicated both to the bits above the narrowing conversion and to at
> least one bit below it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2020-10-25  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR rtl-optimization/95862
>       * internal-fn.c (get_min_precision): For narrowing conversion, recurse
>       on the operand and if the operand precision is smaller than the
>       current one, return that smaller precision.
>       (expand_mul_overflow): For s1 * u2 -> ur and s1 * s2 -> ur cases
>       if the sum of minimum precisions of both operands is smaller or equal
>       to the result precision, just perform normal multiplication and
>       set overflow to the sign bit of the multiplication result.  For
>       u1 * u2 -> sr if both arguments have the MSB known zero, use
>       normal s1 * s2 -> sr expansion.
> 
>       * gcc.dg/builtin-artih-overflow-5.c: New test.
> 
> --- gcc/internal-fn.c.jj      2020-10-13 09:25:28.444909391 +0200
> +++ gcc/internal-fn.c 2020-11-24 15:54:32.684762538 +0100
> @@ -553,6 +553,16 @@ get_min_precision (tree arg, signop sign
>        if (++cnt > 30)
>       return prec + (orig_sign != sign);
>      }
> +  if (CONVERT_EXPR_P (arg)
> +      && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (arg, 0)))
> +      && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) > prec)
> +    {
> +      /* We have e.g. (unsigned short) y_2 where int y_2 = (int) x_1(D);
> +      If y_2's min precision is smaller than prec, return that.  */
> +      int oprec = get_min_precision (TREE_OPERAND (arg, 0), sign);
> +      if (oprec < prec)
> +     return oprec + (orig_sign != sign);
> +    }
>    if (TREE_CODE (arg) != SSA_NAME)
>      return prec + (orig_sign != sign);
>    wide_int arg_min, arg_max;
> @@ -1357,6 +1367,37 @@ expand_mul_overflow (location_t loc, tre
>                                  NULL, done_label, profile_probability::very_likely ());
>         goto do_error_label;
>       case 3:
> +       if (get_min_precision (arg1, UNSIGNED)
> +           + get_min_precision (arg0, SIGNED) <= GET_MODE_PRECISION (mode))
> +         {
> +           /* If the first operand is sign extended from narrower type, the
> +              second operand is zero extended from narrower type and
> +              the sum of the two precisions is smaller or equal to the
> +              result precision: if the first argument is at runtime
> +              non-negative, maximum result will be 0x7e81 or 0x7f..fe80..01
> +              and there will be no overflow, if the first argument is
> +              negative and the second argument zero, the result will be
> +              0 and there will be no overflow, if the first argument is
> +              negative and the second argument positive, the result when
> +              treated as signed will be negative (minimum -0x7f80 or
> +              -0x7f..f80..0) and there will always be overflow.  So, do
> +              res = (U) (s1 * u2)
> +              ovf = (S) res < 0  */
> +           struct separate_ops ops;
> +           ops.code = MULT_EXPR;
> +           ops.type
> +             = build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
> +                                               1);
> +           ops.op0 = make_tree (ops.type, op0);
> +           ops.op1 = make_tree (ops.type, op1);
> +           ops.op2 = NULL_TREE;
> +           ops.location = loc;
> +           res = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL);
> +           do_compare_rtx_and_jump (res, const0_rtx, GE, false,
> +                                    mode, NULL_RTX, NULL, done_label,
> +                                    profile_probability::very_likely ());
> +           goto do_error_label;
> +         }
>         rtx_code_label *do_main_label;
>         do_main_label = gen_label_rtx ();
>         do_compare_rtx_and_jump (op0, const0_rtx, GE, false, mode, NULL_RTX,
> @@ -1374,7 +1415,16 @@ expand_mul_overflow (location_t loc, tre
>    /* u1 * u2 -> sr  */
>    if (uns0_p && uns1_p && !unsr_p)
>      {
> -      uns = true;
> +      if ((pos_neg0 | pos_neg1) == 1)
> +     {
> +       /* If both arguments are zero extended from narrower types,
> +          the MSB will be clear on both and so we can pretend it is
> +          a normal s1 * s2 -> sr multiplication.  */
> +       uns0_p = false;
> +       uns1_p = false;
> +     }
> +      else
> +     uns = true;
>        /* Rest of handling of this case after res is computed.  */
>        goto do_main;
>      }
> @@ -1455,6 +1505,37 @@ expand_mul_overflow (location_t loc, tre
>                                      profile_probability::very_likely ());
>             goto do_error_label;
>           }
> +       if (get_min_precision (arg0, SIGNED)
> +           + get_min_precision (arg1, SIGNED) <= GET_MODE_PRECISION (mode))
> +         {
> +           /* If both operands are sign extended from narrower types and
> +              the sum of the two precisions is smaller or equal to the
> +              result precision: if both arguments are at runtime
> +              non-negative, maximum result will be 0x3f01 or 0x3f..f0..01
> +              and there will be no overflow, if both arguments are negative,
> +              maximum result will be 0x40..00 and there will be no overflow
> +              either, if one argument is positive and the other argument
> +              negative, the result when treated as signed will be negative
> +              and there will always be overflow, and if one argument is
> +              zero and the other negative the result will be zero and no
> +              overflow.  So, do
> +              res = (U) (s1 * s2)
> +              ovf = (S) res < 0  */
> +           struct separate_ops ops;
> +           ops.code = MULT_EXPR;
> +           ops.type
> +             = build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
> +                                               1);
> +           ops.op0 = make_tree (ops.type, op0);
> +           ops.op1 = make_tree (ops.type, op1);
> +           ops.op2 = NULL_TREE;
> +           ops.location = loc;
> +           res = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL);
> +           do_compare_rtx_and_jump (res, const0_rtx, GE, false,
> +                                    mode, NULL_RTX, NULL, done_label,
> +                                    profile_probability::very_likely ());
> +           goto do_error_label;
> +         }
>         /* The general case, do all the needed comparisons at runtime.  */
>         rtx_code_label *do_main_label, *after_negate_label;
>         rtx rop0, rop1;
> --- gcc/testsuite/gcc.dg/builtin-artih-overflow-5.c.jj        2020-11-24 16:13:49.482793354 +0100
> +++ gcc/testsuite/gcc.dg/builtin-artih-overflow-5.c   2020-11-24 16:13:44.980843796 +0100
> @@ -0,0 +1,87 @@
> +/* PR rtl-optimization/95862 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int
> +f1 (int a, int b)
> +{
> +  unsigned long long c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +int
> +f2 (int a, unsigned b)
> +{
> +  unsigned long long c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +int
> +f3 (unsigned a, unsigned b)
> +{
> +  long long c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +int
> +f4 (int a, unsigned b)
> +{
> +  long long c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f5 (short a, short b)
> +{
> +  unsigned c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f6 (short a, unsigned short b)
> +{
> +  unsigned c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f7 (unsigned short a, unsigned short b)
> +{
> +  int c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f8 (short a, unsigned short b)
> +{
> +  int c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f9 (signed char a, signed char b)
> +{
> +  unsigned short c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f10 (signed char a, unsigned char b)
> +{
> +  unsigned short c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f11 (unsigned char a, unsigned char b)
> +{
> +  short c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f12 (signed char a, unsigned char b)
> +{
> +  short c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer