Hi all, The documentation on RTL canonical forms in md.texi says:
"Equality comparisons of a group of bits (usually a single bit) with zero will be written using @code{zero_extract} rather than the equivalent @code{and} or @code{sign_extract} operations. " However, this is not always followed in combine. If it's trying to optimise a comparison against zero of a bitmask that is the mode mask of some mode (255 for QImode and 65535 for HImode in the testcases of this patch) it will instead create a subreg to that shorter mode. This means that for the example: int f255 (int x) { if (x & 255) return 1; return x; } it ends up trying to make a QImode comparison against zero, for which targets like aarch64 have no pattern. This patch attempts to fix this in two places in combine. First is simplify_comparison when handling the and-bitmask case. Currently it will call gen_lowpart_or_truncate on the argument to produce the short subreg. With this patch we don't do that when comparing against zero. This way the and-bitmask form is preserved for make_extraction later on to convert into a zero_extract. The second place is in make_extraction itself where it tries to avoid creating a zero_extract, but the canonicalisation rules and the function comment for make_extraction say that it should try hard create a zero_extraction when inside a comparison in particular (" IN_COMPARE is nonzero if we are in a COMPARE. This means that a ZERO_EXTRACT should be built even for bits starting at bit 0.") With this patch for the testcases: int f255 (int x) { if (x & 255) return 1; return x; } int foo (long x) { return ((short) x != 0) ? x : 1; } we now generate for aarch64 at -O2: f255: tst x0, 255 csinc w0, w0, wzr, eq ret and foo: tst x0, 65535 csinc x0, x0, xzr, ne ret instead of the previous: f255: and w1, w0, 255 cmp w1, wzr csinc w0, w0, wzr, eq ret foo: sxth w1, w0 cmp w1, wzr csinc x0, x0, xzr, ne ret Bootstrapped and tested on arm, aarch64, x86_64. To get the benefit on aarch64 this needs patch 1/2 that adds an aarch64 pattern for comparing a zero_extract with zero. On aarch64 this greatly increases the usage of the TST instruction by about 54% on SPEC2006. Performance-wise there were no regressions and slight improvements on SPECINT that may just be above normal noise (overall 0.5% improvement). On arm it makes very little difference (arm already defines QI and HImode comparisons against zero) but makes more use of the lsrs-immediate instruction in place of the arm tst instruction, which has a shorter encoding in Thumb2 state. On x86_64 I saw no difference in code size for SPEC2006 on my setup. What do people think of this approach? I hope this just enforces the already documented canonicalisation rules with minimal(none?) negative fallout. Thanks, Kyrill 2015-12-17 Kyrylo Tkachov <kyrylo.tkac...@arm.com> PR rtl-optimization/68796 * combine.c (make_extraction): Don't try to avoid the extraction if inside a compare. (simplify_comparison): Don't truncate to lowpart if comparing against zero and target doesn't have a native compare instruction in the required short mode. 2015-12-17 Kyrylo Tkachov <kyrylo.tkac...@arm.com> PR rtl-optimization/68796 * gcc.target/aarch64/tst_5.c: New test. * gcc.target/aarch64/tst_6.c: Likewise.
diff --git a/gcc/combine.c b/gcc/combine.c index 8601d8983ce345e2129dd047b3520d98c0582842..345e63f9a05f2310a5c9e5b239ed069d22565d1c 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -7337,10 +7337,13 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos, low-order bit and this is either not in the destination or we have the appropriate STRICT_LOW_PART operation available. + Don't do this if we are inside a comparison, as the canonicalization + rules call for a zero_extract form. For MEM, we can avoid an extract if the field starts on an appropriate boundary and we can change the mode of the memory reference. */ if (tmode != BLKmode + && !in_compare && ((pos_rtx == 0 && (pos % BITS_PER_WORD) == 0 && !MEM_P (inner) && (inner_mode == tmode @@ -12108,14 +12111,19 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1) unless TRULY_NOOP_TRUNCATION allows it or the register is known to hold a value of the required mode the - transformation is invalid. */ + transformation is invalid. + If the target does not have a compare instruction of that mode + don't do this when comparing against 0 since the canonicalization + rules require such an operation to be represented as a + zero_extract, which make_extraction will produce later on. */ if ((equality_comparison_p || unsigned_comparison_p) && CONST_INT_P (XEXP (op0, 1)) && (i = exact_log2 ((UINTVAL (XEXP (op0, 1)) & GET_MODE_MASK (mode)) + 1)) >= 0 && const_op >> i == 0 - && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode) + && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode + && (const_op != 0 || have_insn_for (COMPARE, tmode))) { op0 = gen_lowpart_or_truncate (tmode, XEXP (op0, 0)); continue; diff --git a/gcc/testsuite/gcc.target/aarch64/tst_5.c b/gcc/testsuite/gcc.target/aarch64/tst_5.c new file mode 100644 index 0000000000000000000000000000000000000000..868a3be502e1468592c17b8a9c6a359af00e61a8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/tst_5.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +int +f255 (int x) +{ + if (x & 255) + return 1; + return x; +} + +int +f65535 (int x) +{ + if (x & 65535) + return 1; + return x; +} + +/* { dg-final { scan-assembler "tst\t(x|w)\[0-9\]*.*255" } } */ +/* { dg-final { scan-assembler "tst\t(x|w)\[0-9\]*.*65535" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/tst_6.c b/gcc/testsuite/gcc.target/aarch64/tst_6.c new file mode 100644 index 0000000000000000000000000000000000000000..2f1c78d873445b11562c5e2863a3f642ff11f318 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/tst_6.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +int +foo (long x) +{ + return ((short) x != 0) ? x : 1; +} + +/* { dg-final { scan-assembler "tst\t(x|w)\[0-9\]*.*65535" } } */