68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes

Kyrill Tkachov Thu, 17 Dec 2015 07:37:21 -0800

Hi all,

The documentation on RTL canonical forms in md.texi says:


"Equality comparisons of a group of bits (usually a single bit) with zero
 will be written using @code{zero_extract} rather than the equivalent
 @code{and} or @code{sign_extract} operations. "

However, this is not always followed in combine. If it's trying to optimise
a comparison against zero of a bitmask that is the mode mask of some mode
(255 for QImode and 65535 for HImode in the testcases of this patch)
it will instead create a subreg to that shorter mode.
This means that for the example:
int
f255 (int x)
{
  if (x & 255)
    return 1;
  return x;
}

it ends up trying to make a QImode comparison against zero, for which targets 
like
aarch64 have no pattern.

This patch attempts to fix this in two places in combine.
First is simplify_comparison when handling the and-bitmask case.
Currently it will call gen_lowpart_or_truncate on the argument to produce the 
short subreg.
With this patch we don't do that when comparing against zero.
This way the and-bitmask form is preserved for make_extraction later on to 
convert
into a zero_extract.
The second place is in make_extraction itself where it tries to avoid creating 
a zero_extract,
but the canonicalisation rules and the function comment for make_extraction say 
that it should
try hard create a zero_extraction when inside a comparison in particular
(" IN_COMPARE is nonzero if we are in a COMPARE.  This means that a
   ZERO_EXTRACT should be built even for bits starting at bit 0.")

With this patch for the testcases:
int
f255 (int x)
{
  if (x & 255)
    return 1;
  return x;
}

int
foo (long x)
{
   return ((short) x != 0) ? x : 1;
}

we now generate for aarch64 at -O2:
f255:
        tst     x0, 255
        csinc   w0, w0, wzr, eq
        ret

and
foo:
        tst     x0, 65535
        csinc   x0, x0, xzr, ne
        ret


instead of the previous:
f255:
        and     w1, w0, 255
        cmp     w1, wzr
        csinc   w0, w0, wzr, eq
        ret

foo:
        sxth    w1, w0
        cmp     w1, wzr
        csinc   x0, x0, xzr, ne
        ret


Bootstrapped and tested on arm, aarch64, x86_64.
To get the benefit on aarch64 this needs patch 1/2 that adds an aarch64 pattern
for comparing a zero_extract with zero.
On aarch64 this greatly increases the usage of the TST instruction by about 54% 
on SPEC2006.
Performance-wise there were no regressions and slight improvements on SPECINT 
that may just
be above normal noise (overall 0.5% improvement).
On arm it makes very little difference (arm already defines QI and HImode 
comparisons against zero)
but makes more use of the lsrs-immediate instruction in place of the arm tst 
instruction, which has
a shorter encoding in Thumb2 state.
On x86_64 I saw no difference in code size for SPEC2006 on my setup.

What do people think of this approach?
I hope this just enforces the already documented canonicalisation rules with 
minimal(none?) negative
fallout.

Thanks,
Kyrill


2015-12-17  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    PR rtl-optimization/68796
    * combine.c (make_extraction): Don't try to avoid the extraction if
    inside a compare.
    (simplify_comparison): Don't truncate to lowpart if comparing against
    zero and target doesn't have a native compare instruction in the
    required short mode.

2015-12-17  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    PR rtl-optimization/68796
    * gcc.target/aarch64/tst_5.c: New test.
    * gcc.target/aarch64/tst_6.c: Likewise.

diff --git a/gcc/combine.c b/gcc/combine.c
index 8601d8983ce345e2129dd047b3520d98c0582842..345e63f9a05f2310a5c9e5b239ed069d22565d1c 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7337,10 +7337,13 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
      low-order bit and this is either not in the destination or we have the
      appropriate STRICT_LOW_PART operation available.
 
+     Don't do this if we are inside a comparison, as the canonicalization
+     rules call for a zero_extract form.
      For MEM, we can avoid an extract if the field starts on an appropriate
      boundary and we can change the mode of the memory reference.  */
 
   if (tmode != BLKmode
+      && !in_compare
       && ((pos_rtx == 0 && (pos % BITS_PER_WORD) == 0
 	   && !MEM_P (inner)
 	   && (inner_mode == tmode
@@ -12108,14 +12111,19 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
 
 	     unless TRULY_NOOP_TRUNCATION allows it or the register is
 	     known to hold a value of the required mode the
-	     transformation is invalid.  */
+	     transformation is invalid.
+	     If the target does not have a compare instruction of that mode
+	     don't do this when comparing against 0 since the canonicalization
+	     rules require such an operation to be represented as a
+	     zero_extract, which make_extraction will produce later on.  */
 	  if ((equality_comparison_p || unsigned_comparison_p)
 	      && CONST_INT_P (XEXP (op0, 1))
 	      && (i = exact_log2 ((UINTVAL (XEXP (op0, 1))
 				   & GET_MODE_MASK (mode))
 				  + 1)) >= 0
 	      && const_op >> i == 0
-	      && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode)
+	      && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode
+	      && (const_op != 0 || have_insn_for (COMPARE, tmode)))
 	    {
 	      op0 = gen_lowpart_or_truncate (tmode, XEXP (op0, 0));
 	      continue;
diff --git a/gcc/testsuite/gcc.target/aarch64/tst_5.c b/gcc/testsuite/gcc.target/aarch64/tst_5.c
new file mode 100644
index 0000000000000000000000000000000000000000..868a3be502e1468592c17b8a9c6a359af00e61a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tst_5.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+f255 (int x)
+{
+  if (x & 255)
+    return 1;
+  return x;
+}
+
+int
+f65535 (int x)
+{
+  if (x & 65535)
+    return 1;
+  return x;
+}
+
+/* { dg-final { scan-assembler "tst\t(x|w)\[0-9\]*.*255" } } */
+/* { dg-final { scan-assembler "tst\t(x|w)\[0-9\]*.*65535" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/tst_6.c b/gcc/testsuite/gcc.target/aarch64/tst_6.c
new file mode 100644
index 0000000000000000000000000000000000000000..2f1c78d873445b11562c5e2863a3f642ff11f318
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tst_6.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (long x)
+{
+   return ((short) x != 0) ? x : 1;
+}
+
+/* { dg-final { scan-assembler "tst\t(x|w)\[0-9\]*.*65535" } } */

[PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes

Reply via email to