On Wed, 25 Nov 2020, Jakub Jelinek wrote:

> Hi!
>
> The following patch adds some improvements for __builtin_mul_overflow
> expansion.
>
> One optimization is for the u1 * u2 -> sr case.  As documented, we
> normally do:
>   u1 * u2 -> sr
>     res = (S) (u1 * u2)
>     ovf = res < 0 || main_ovf (true)
> where main_ovf (true) stands for a jump on unsigned multiplication
> overflow.  If we know that the most significant bits of both operands
> are clear (such as when they are zero extended from something smaller),
> we can emit better code by handling it like s1 * s2 -> sr, i.e. just
> jump on overflow after the signed multiplication.
>
> Another two cases are s1 * s2 -> ur and s1 * u2 -> ur.  If we know that
> the sum of the minimum precisions needed to encode all values of the
> two arguments is smaller or equal to the destination precision (such as
> when the two arguments are sign (or zero) extended from half precision
> types), we know overflow happens iff one argument is negative and the
> other argument is positive (not zero).  Even if both have the maximum
> possible values, the maximum is still representable (e.g. for
> char * char -> unsigned short 0x7f * 0x7f = 0x3f01 and for
> char * unsigned char -> unsigned short 0x7f * 0xffU = 0x7e81), and as
> the result is unsigned, all negative results do overflow, but are also
> representable if we consider the result signed - all of them have the
> MSB set.  So, it is more efficient to just do the normal multiplication
> in that case and compare the result, considered as a signed value,
> against 0; if it is smaller, overflow happened.
>
> And the get_min_precision change is to improve the char to short
> handling.  We have there in the IL
>   _2 = (int) arg_1(D);
> i.e. the C integer promotion of a char or unsigned char argument, and
> the caller adds a NOP_EXPR cast to short or unsigned short.
> get_min_precision punted on that narrowing cast though, it handled only
> widening casts, but we can handle narrowing casts fine too, by
> recursing on the narrowing cast's operand and using the result only if
> it is in the end a smaller minimal precision, which guarantees the sign
> bits (or zero bits) are duplicated both in the bits above the narrowing
> conversion and in at least one bit below it.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
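At the source level, the new s1 * s2 -> ur and s1 * u2 -> ur expansions
amount to something like the following for the char -> unsigned short
cases from the description (a sketch only, with made-up function names,
not code from the patch; it relies on GCC's modulo semantics for
out-of-range signed conversions):

int
mul_ovf_schar_schar_ushort (signed char a, signed char b,
                            unsigned short *res)
{
  /* 8 + 8 <= 16 bits of minimum precision, so the promoted product
     cannot wrap; its sign bit alone signals overflow.  */
  unsigned short r = (unsigned short) (a * b);
  *res = r;
  return (short) r < 0;   /* ovf iff one operand < 0, the other > 0 */
}

int
mul_ovf_schar_uchar_ushort (signed char a, unsigned char b,
                            unsigned short *res)
{
  unsigned short r = (unsigned short) (a * b);
  *res = r;
  return (short) r < 0;   /* ovf iff a < 0 && b != 0 */
}

That is one multiplication plus a sign test, instead of a
multiplication plus a separate overflow check.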
OK.

Thanks,
Richard.

> 2020-11-25  Jakub Jelinek  <ja...@redhat.com>
>
> 	PR rtl-optimization/95862
> 	* internal-fn.c (get_min_precision): For narrowing conversion, recurse
> 	on the operand and if the operand precision is smaller than the
> 	current one, return that smaller precision.
> 	(expand_mul_overflow): For s1 * u2 -> ur and s1 * s2 -> ur cases,
> 	if the sum of minimum precisions of both operands is smaller or equal
> 	to the result precision, just perform normal multiplication and
> 	set overflow to the sign bit of the multiplication result.  For
> 	u1 * u2 -> sr, if both arguments have the MSB known zero, use
> 	normal s1 * s2 -> sr expansion.
>
> 	* gcc.dg/builtin-artih-overflow-5.c: New test.
>
> --- gcc/internal-fn.c.jj	2020-10-13 09:25:28.444909391 +0200
> +++ gcc/internal-fn.c	2020-11-24 15:54:32.684762538 +0100
> @@ -553,6 +553,16 @@ get_min_precision (tree arg, signop sign
> 	if (++cnt > 30)
> 	  return prec + (orig_sign != sign);
>       }
> +  if (CONVERT_EXPR_P (arg)
> +      && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (arg, 0)))
> +      && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) > prec)
> +    {
> +      /* We have e.g. (unsigned short) y_2 where int y_2 = (int) x_1(D);
> +	 If y_2's min precision is smaller than prec, return that.  */
> +      int oprec = get_min_precision (TREE_OPERAND (arg, 0), sign);
> +      if (oprec < prec)
> +	return oprec + (orig_sign != sign);
> +    }
>   if (TREE_CODE (arg) != SSA_NAME)
>     return prec + (orig_sign != sign);
>   wide_int arg_min, arg_max;
> @@ -1357,6 +1367,37 @@ expand_mul_overflow (location_t loc, tre
> 				       NULL, done_label,
> 				       profile_probability::very_likely ());
> 	      goto do_error_label;
> 	    case 3:
> +	      if (get_min_precision (arg1, UNSIGNED)
> +		  + get_min_precision (arg0, SIGNED) <= GET_MODE_PRECISION (mode))
> +		{
> +		  /* If the first operand is sign extended from a narrower
> +		     type, the second operand is zero extended from a
> +		     narrower type and the sum of the two precisions is
> +		     smaller or equal to the result precision: if the first
> +		     argument is at runtime non-negative, the maximum result
> +		     will be 0x7e81 or 0x7f..fe80..01 and there will be no
> +		     overflow; if the first argument is negative and the
> +		     second argument zero, the result will be 0 and there
> +		     will be no overflow; if the first argument is negative
> +		     and the second argument positive, the result when
> +		     treated as signed will be negative (minimum -0x7f80 or
> +		     -0x7f..f80..0) and there will always be overflow.  So, do
> +		       res = (U) (s1 * u2)
> +		       ovf = (S) res < 0  */
> +		  struct separate_ops ops;
> +		  ops.code = MULT_EXPR;
> +		  ops.type
> +		    = build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
> +						      1);
> +		  ops.op0 = make_tree (ops.type, op0);
> +		  ops.op1 = make_tree (ops.type, op1);
> +		  ops.op2 = NULL_TREE;
> +		  ops.location = loc;
> +		  res = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL);
> +		  do_compare_rtx_and_jump (res, const0_rtx, GE, false,
> +					   mode, NULL_RTX, NULL, done_label,
> +					   profile_probability::very_likely ());
> +		  goto do_error_label;
> +		}
> 	      rtx_code_label *do_main_label;
> 	      do_main_label = gen_label_rtx ();
> 	      do_compare_rtx_and_jump (op0, const0_rtx, GE, false, mode, NULL_RTX,
> @@ -1374,7 +1415,16 @@ expand_mul_overflow (location_t loc, tre
>       /* u1 * u2 -> sr  */
>       if (uns0_p && uns1_p && !unsr_p)
> 	{
> -	  uns = true;
> +	  if ((pos_neg0 | pos_neg1) == 1)
> +	    {
> +	      /* If both arguments are zero extended from narrower types,
> +		 the MSB will be clear on both and so we can pretend it is
> +		 a normal s1 * s2 -> sr multiplication.  */
> +	      uns0_p = false;
> +	      uns1_p = false;
> +	    }
> +	  else
> +	    uns = true;
> 	  /* Rest of handling of this case after res is computed.  */
> 	  goto do_main;
> 	}
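For the u1 * u2 -> sr hunk just above, the effect of clearing
uns0_p/uns1_p is, roughly, the following at the source level (again a
sketch with an invented name; the 64-bit product stands in for the
single jump on signed multiply overflow):

int
mul_ovf_ushort_ushort_int (unsigned short a, unsigned short b, int *res)
{
  /* Both operands zero extend to 16 bits, so as 32-bit values their
     MSBs are known clear and the expansion can behave like
     s1 * s2 -> sr.  */
  long long p = (long long) a * b;
  *res = (int) (unsigned) p;    /* modulo wrap, as GCC defines it */
  return p > 0x7fffffffLL;      /* only positive overflow is possible */
}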
> @@ -1455,6 +1505,37 @@ expand_mul_overflow (location_t loc, tre
> 				   profile_probability::very_likely ());
> 	  goto do_error_label;
> 	}
> +      if (get_min_precision (arg0, SIGNED)
> +	  + get_min_precision (arg1, SIGNED) <= GET_MODE_PRECISION (mode))
> +	{
> +	  /* If both operands are sign extended from narrower types and
> +	     the sum of the two precisions is smaller or equal to the
> +	     result precision: if both arguments are at runtime
> +	     non-negative, the maximum result will be 0x3f01 or
> +	     0x3f..f0..01 and there will be no overflow; if both arguments
> +	     are negative, the maximum result will be 0x40..00 and there
> +	     will be no overflow either; if one argument is positive and
> +	     the other argument negative, the result when treated as
> +	     signed will be negative and there will always be overflow;
> +	     and if one argument is zero and the other negative, the
> +	     result will be zero and no overflow.  So, do
> +	       res = (U) (s1 * s2)
> +	       ovf = (S) res < 0  */
> +	  struct separate_ops ops;
> +	  ops.code = MULT_EXPR;
> +	  ops.type
> +	    = build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
> +					      1);
> +	  ops.op0 = make_tree (ops.type, op0);
> +	  ops.op1 = make_tree (ops.type, op1);
> +	  ops.op2 = NULL_TREE;
> +	  ops.location = loc;
> +	  res = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL);
> +	  do_compare_rtx_and_jump (res, const0_rtx, GE, false,
> +				   mode, NULL_RTX, NULL, done_label,
> +				   profile_probability::very_likely ());
> +	  goto do_error_label;
> +	}
>       /* The general case, do all the needed comparisons at runtime.  */
>       rtx_code_label *do_main_label, *after_negate_label;
>       rtx rop0, rop1;
> --- gcc/testsuite/gcc.dg/builtin-artih-overflow-5.c.jj	2020-11-24 16:13:49.482793354 +0100
> +++ gcc/testsuite/gcc.dg/builtin-artih-overflow-5.c	2020-11-24 16:13:44.980843796 +0100
> @@ -0,0 +1,87 @@
> +/* PR rtl-optimization/95862 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int
> +f1 (int a, int b)
> +{
> +  unsigned long long c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +int
> +f2 (int a, unsigned b)
> +{
> +  unsigned long long c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +int
> +f3 (unsigned a, unsigned b)
> +{
> +  long long c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +int
> +f4 (int a, unsigned b)
> +{
> +  long long c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f5 (short a, short b)
> +{
> +  unsigned c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f6 (short a, unsigned short b)
> +{
> +  unsigned c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f7 (unsigned short a, unsigned short b)
> +{
> +  int c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f8 (short a, unsigned short b)
> +{
> +  int c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f9 (signed char a, signed char b)
> +{
> +  unsigned short c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f10 (signed char a, unsigned char b)
> +{
> +  unsigned short c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f11 (unsigned char a, unsigned char b)
> +{
> +  short c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f12 (signed char a, unsigned char b)
> +{
> +  short c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
>
> 	Jakub
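Not part of the patch, but the arithmetic claim behind the f9-style
expansion is cheap to check exhaustively (assuming the usual 8-bit
char / 16-bit short ABI):

#include <stdio.h>

int
main (void)
{
  /* Verify that "multiply and test the sign bit" agrees with
     __builtin_mul_overflow for signed char * signed char
     -> unsigned short over the whole operand range.  */
  for (int a = -128; a < 128; a++)
    for (int b = -128; b < 128; b++)
      {
	unsigned short c;
	int ref = __builtin_mul_overflow ((signed char) a,
					  (signed char) b, &c);
	unsigned short r = (unsigned short) (a * b);
	if (ref != (r >> 15) || c != r)
	  {
	    printf ("mismatch at %d * %d\n", a, b);
	    return 1;
	  }
      }
  return 0;
}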
-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer