This patch improves the RTL that the middle-end generates for testing signed overflow following a widening multiplication. During this expansion the middle-end generates a truncation which can get used multiple times. Placing this intermediate value in a pseudo register reduces the amount of code generated on platforms where this truncation requires an explicit instruction.
This simple call to force_reg eliminates 368 lines of the -S output from testsuite/c-c++-common/torture/builtin-arith-overflow-1.c on nvptx-none. An example difference is in t120_1smul where the following 7 instruction sequence in which the 1st and 6th instructions perform the same truncation: < cvt.u32.u64 %r31, %r28; <- truncate %r28 < shr.s32 %r30, %r31, 31; < cvt.u32.u64 %r32, %r29; < setp.eq.u32 %r33, %r30, %r32; < selp.u32 %r24, 0, 1, %r33; < cvt.u32.u64 %r25, %r28; <- truncate %r28 < setp.eq.u32 %r34, %r24, 0; is now generated as a 4 instruction sequence without duplication: > cvt.u32.u64 %r30, %r28; > shr.s32 %r31, %r30, 31; > cvt.u32.u64 %r32, %r29; > setp.eq.u32 %r33, %r31, %r32; On x86_64-pc-linux-gnu, where SUBREGs are free, this patch generates exactly the same builtin-arith-overflow-1.s as before. This patch has been tested on both x86_64-pc-linux-gnu with "make bootstrap" and nvptx-none with "make", with no new testsuite regressions on either platform. Ok for mainline? 2020-07-06 Roger Sayle <ro...@nextmovesoftware.com> gcc/ChangeLog: * internal-fn.c (expand_mul_overflow): When checking for signed overflow from a widening multiplication, we access the truncated lowpart RES twice, so keep this value in a pseudo register. Thanks in advance, Roger -- Roger Sayle NextMove Software Cambridge, UK
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index 0be2eb4..d1bd6cc 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -1627,6 +1627,9 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1, profile_probability::very_likely ()); else { + /* RES is used more than once, place it in a pseudo. */ + res = force_reg (mode, res); + rtx signbit = expand_shift (RSHIFT_EXPR, mode, res, prec - 1, NULL_RTX, 0); /* RES is low half of the double width result, HIPART