Jeff Law writes: > On 04/29/2015 03:36 PM, Jiong Wang wrote: >> >> Jeff Law writes: >> >>> On 04/27/2015 02:21 PM, Jiong Wang wrote: >>> >>>> Jeff, >>>> >>>> Sorry, I can't understand the meaning of "overlap between t_low and >>>> low", >>>> assume "right" in "right value" means the opposite of "left" not >>>> "correct". >>>> >>>> So what you mean is t_low and low share the same pseudo regiser? >>> My concern is sharing the same pseudo or memory location. But thinking >>> more about it, the shifted value has to have range information, so it >>> must have been an SSA_NAME, right? If so, then it can't overlap with >>> the destination, so this is a non-issue. Sorry for the confusion. >> >> Thanks for the light. By looking at related code, looks like even it's >> SSA_NAME, it's still possible to share the same pseudo given the >> destination is in the same SSA map parition after ssa name coleascing? > If they're the same size and have non-overlapping lifetimes, then yes, > they could be the same pseudo. That ought to be easy to check. > Thankfully we don't have to worry about MEMs, which is a harder check. > >> OK. I will rework the patch, and I found there is a function named >> "expand_doubleword_shift" which looks like a more natural place to do >> this optimization, although it's hard to get range info there. I will do >> further explore on this. > Sounds good. > > jeff
Jeff, Sorry for the slow response. I found we can't integrate this transformation into "expand_doubleword_shift" as it's at quite deep layer in the call stack, when we reach there, we have lost some high level info. So I am keeping this transformaion still in expr.c, and attachment is the updated version of this patch. The main changes are: * Figuring out whether the shift source is coming from sign extension by checking SSA_NAME_DEF_STMT instead of deducing from tree range info. I fell checking the gimple statement is more reliable and straigtforward. * For the pseudo register overlaping issue, given: RTX target = TREE source << TREE amount I was trying to make sure those two SSA_NAME won't get the same pseudo register by comparing their assigned partition using var_to_partition, but I can't get the associated tree representation for "target" which is RTX. Then I just expand "source" and compare the register number with "target". But I found the simplest way maybe just reorder low_part (target) = low_part (source) << amount high_part (target) = low_part (source) >> amount1 to: high_part (target) = low_part (source) >> amount1 low_part (target) = low_part (source) << amount then, even target and source share the same pseudo register, we can still get what we want, as we are operating on their subreg. * Added checking on the result value of expand_variable_shift, if it's not the same as "target" then call emit_move_insn to do that. How is the re-worked patch looks to you? x86 bootstrap OK, regression OK. aarch64 bootstrap OK. 2015-08-14 Jiong.Wang <jiong.w...@arm.com> gcc/ * expr.c (expand_expr_real_2): Check tree format to optimize LSHIFT_EXPR expand. gcc/testsuite * gcc.dg/wide_shift_64_1.c: New testcase. * gcc.dg/wide_shift_128_1.c: Likewise. * gcc.target/aarch64/ashlti3_1.c: Likewise. * gcc.target/arm/ashldisi_1.c: Likewise. -- Regards, Jiong
diff --git a/gcc/expr.c b/gcc/expr.c index 31b4573..8a25e98 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -8836,23 +8836,90 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, case LSHIFT_EXPR: case RSHIFT_EXPR: - /* If this is a fixed-point operation, then we cannot use the code - below because "expand_shift" doesn't support sat/no-sat fixed-point - shifts. */ - if (ALL_FIXED_POINT_MODE_P (mode)) - goto binop; + { + /* If this is a fixed-point operation, then we cannot use the code + below because "expand_shift" doesn't support sat/no-sat fixed-point + shifts. */ + if (ALL_FIXED_POINT_MODE_P (mode)) + goto binop; + + if (! safe_from_p (subtarget, treeop1, 1)) + subtarget = 0; + if (modifier == EXPAND_STACK_PARM) + target = 0; + op0 = expand_expr (treeop0, subtarget, + VOIDmode, EXPAND_NORMAL); + /* Left shfit optimization: + + If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't + native instruction to support this wide mode left shift. Given below + example: + + Type A = (Type) B << C + + |< T >| + | high | low | + + |<- size ->| + + By checking tree shape, if operand B is coming from signed extension, + then the left shift operand can be simplified into: + + 1. high = low >> (size - C); + 2. low = low << C; + */ + + temp = NULL_RTX; + if (code == LSHIFT_EXPR + && target + && REG_P (target) + && ! unsignedp + && mode == GET_MODE_WIDER_MODE (word_mode) + && ! have_insn_for (ASHIFT, mode) + && TREE_CONSTANT (treeop1) + && TREE_CODE (treeop0) == SSA_NAME) + { + gimple def = SSA_NAME_DEF_STMT (treeop0); + if (is_gimple_assign (def) + && gimple_assign_rhs_code (def) == NOP_EXPR) + { + machine_mode rmode = TYPE_MODE + (TREE_TYPE (gimple_assign_rhs1 (def))); - if (! safe_from_p (subtarget, treeop1, 1)) - subtarget = 0; - if (modifier == EXPAND_STACK_PARM) - target = 0; - op0 = expand_expr (treeop0, subtarget, - VOIDmode, EXPAND_NORMAL); - temp = expand_variable_shift (code, mode, op0, treeop1, target, - unsignedp); - if (code == LSHIFT_EXPR) - temp = REDUCE_BIT_FIELD (temp); - return temp; + if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode) + && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode)) + >= GET_MODE_BITSIZE (word_mode))) + { + rtx low = simplify_gen_subreg (word_mode, op0, mode, 0); + rtx tlow = simplify_gen_subreg (word_mode, target, mode, 0); + rtx thigh = simplify_gen_subreg (word_mode, target, mode, + UNITS_PER_WORD); + HOST_WIDE_INT ramount = (BITS_PER_WORD + - TREE_INT_CST_LOW (treeop1)); + tree rshift = build_int_cst (TREE_TYPE (treeop1), ramount); + + temp = expand_variable_shift (code, word_mode, low, treeop1, + tlow, unsignedp); + if (temp != tlow) + emit_move_insn (tlow, temp); + + temp = expand_variable_shift (RSHIFT_EXPR, word_mode, low, + rshift, thigh, unsignedp); + if (temp != thigh) + emit_move_insn (thigh, temp); + + temp = target ; + } + } + } + + if (temp == NULL_RTX) + temp = expand_variable_shift (code, mode, op0, treeop1, target, + unsignedp); + if (code == LSHIFT_EXPR) + temp = REDUCE_BIT_FIELD (temp); + return temp; + } /* Could determine the answer when only additive constants differ. Also, the addition of one can be handled by changing the condition. */ diff --git a/gcc/testsuite/gcc.dg/wide-shift-128.c b/gcc/testsuite/gcc.dg/wide-shift-128.c new file mode 100644 index 0000000..9b62715 --- /dev/null +++ b/gcc/testsuite/gcc.dg/wide-shift-128.c @@ -0,0 +1,12 @@ +/* { dg-do compile { target aarch64*-*-* mips64*-*-* sparc64*-*-* } } */ +/* { dg-require-effective-target int128 } */ +/* { dg-options "-O2 -fdump-rtl-combine" } */ + +__int128_t +load2 (int data) +{ + return (__int128_t) data << 50; +} + +/* { dg-final { scan-rtl-dump-not "ior" "combine" } } */ +/* { dg-final { cleanup-rtl-dump "combine" } } */ diff --git a/gcc/testsuite/gcc.dg/wide-shift-64.c b/gcc/testsuite/gcc.dg/wide-shift-64.c new file mode 100644 index 0000000..5bc278f --- /dev/null +++ b/gcc/testsuite/gcc.dg/wide-shift-64.c @@ -0,0 +1,10 @@ +/* { dg-do compile { target arm*-*-* mips*-*-* sparc*-*-* } } */ +/* { dg-options "-O2 -fdump-rtl-combine" } */ + +long long +load1 (int data) +{ + return (long long) data << 12; +} + +/* { dg-final { scan-rtl-dump-not "ior" "combine" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/ashltidisi.c b/gcc/testsuite/gcc.target/aarch64/ashltidisi.c new file mode 100644 index 0000000..aeb2a24 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/ashltidisi.c @@ -0,0 +1,48 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -save-temps" } */ + +extern void abort (void); + +#define GEN_TEST_CASE(x, y, z)\ +__uint128_t __attribute__ ((noinline))\ +ushift_##x##_##z (unsigned y data)\ +{\ + return (__uint128_t) data << x;\ +}\ +__int128_t __attribute__ ((noinline)) \ +shift_##x##_##z (y data) \ +{\ + return (__int128_t) data << x;\ +} + +GEN_TEST_CASE (53, int, i) +GEN_TEST_CASE (3, long long, ll) +GEN_TEST_CASE (13, long long, ll) +GEN_TEST_CASE (53, long long, ll) + +int +main (int argc, char **argv) +{ + +#define SHIFT_CHECK(x, y, z, p) \ + if (ushift_##y##_##p (x)\ + != ((__uint128_t) (unsigned z) x << y)) \ + abort ();\ + if (shift_##y##_##p (x)\ + != ((__uint128_t) (signed z) x << y)) \ + abort (); + + SHIFT_CHECK (0x12345678, 53, int, i) + SHIFT_CHECK (0xcafecafe, 53, int, i) + + SHIFT_CHECK (0x1234567890abcdefLL, 3, long long, ll) + SHIFT_CHECK (0x1234567890abcdefLL, 13, long long, ll) + SHIFT_CHECK (0x1234567890abcdefLL, 53, long long, ll) + SHIFT_CHECK (0xcafecafedeaddeadLL, 3, long long, ll) + SHIFT_CHECK (0xcafecafedeaddeadLL, 13, long long, ll) + SHIFT_CHECK (0xcafecafedeaddeadLL, 53, long long, ll) + + return 0; +} + +/* { dg-final { scan-assembler-times "asr" 4 } } */ diff --git a/gcc/testsuite/gcc.target/arm/ashldisi.c b/gcc/testsuite/gcc.target/arm/ashldisi.c new file mode 100644 index 0000000..00dc06e --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/ashldisi.c @@ -0,0 +1,44 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -save-temps" } */ + +extern void abort (void); + +#define GEN_TEST_CASE(x)\ +unsigned long long __attribute__ ((noinline))\ +ushift_ ## x (unsigned int data)\ +{\ + return (unsigned long long) data << x;\ +}\ +long long __attribute__ ((noinline)) \ +shift_ ## x (int data) \ +{\ + return (long long) data << x;\ +} + +GEN_TEST_CASE (3) +GEN_TEST_CASE (23) +GEN_TEST_CASE (30) +int +main (int argc, char **argv) +{ + +#define SHIFT_CHECK(x, y) \ + if (ushift_ ## y (x)\ + != ((unsigned long long) (unsigned) x << y)) \ + abort (); \ + if (shift_ ## y (x)\ + != ((long long) (signed) x << y)) \ + abort (); + + SHIFT_CHECK (0x12345678, 3) + SHIFT_CHECK (0xcafecafe, 3) + SHIFT_CHECK (0x12345678, 23) + SHIFT_CHECK (0xcafecafe, 23) + SHIFT_CHECK (0x12345678, 30) + SHIFT_CHECK (0xcafecafe, 30) + + return 0; +} + +/* { dg-final { scan-assembler-times "asr" 3 } } */ +/* { dg-final { cleanup-saved-temps } } */