Hi all,

While evaluating Maxim's SW prefetch patches [1] I noticed that the aarch64 prefetch pattern is overly restrictive in its address operand: it accepts only simple register addressing modes. In fact, the PRFM instruction accepts almost all of the addressing modes that a normal 64-bit LDR supports. The restriction in the pattern causes explicit address calculation code to be emitted, which we could avoid.
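To make the problem concrete, here is a minimal example of my own (not from the testsuite; the before/after assembly is a sketch with illustrative register numbers):

    void
    prefetch_ahead (int *p)
    {
      /* Prefetch 8 bytes (two ints) past P for a streaming read:
         rw = 0 (read), locality = 0 (PLDL1STRM).  */
      __builtin_prefetch (p + 2, 0, 0);
    }

    /* Currently (sketch):           With PRFM's offset mode (sketch):
         add  x1, x0, 8                prfm PLDL1STRM, [x0, 8]
         prfm PLDL1STRM, [x1]                                        */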
This patch relaxes the restrictions on the prefetch define_insn. It creates a predicate and constraint that accept the full set of addressing modes that PRFM supports. Thus for the testcase in the patch (adapted from one of the existing __builtin_prefetch tests in the testsuite) we can generate:
prfm PLDL1STRM, [x1, 8]
instead of the current:
prfm PLDL1STRM, [x1]
with an explicit increment of x1 by 8 in a separate instruction.

I've removed the %a output modifier in the output template and instead wrap the address operand in a DImode MEM before passing it down to aarch64_print_operand. This is because operand 0 is an address operand rather than a memory operand and thus doesn't have a mode associated with it. When processing the 'a' output modifier the code in final.c calls TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode argument, which ICEs on aarch64 because we need a mode for the memory for aarch64_classify_address to work correctly. Rather than overriding the VOIDmode in aarch64_print_operand_address I decided to instead create the DImode MEM in the "prefetch" output template and treat it as a normal 64-bit memory address, which at the point of assembly output is what it is anyway.

With this patch I see a reduction in instruction count in the SPEC2006 benchmarks when SW prefetching is enabled on top of Maxim's patchset, because fewer address calculation instructions are emitted thanks to the more expressive addressing modes. It also fixes a performance regression that I observed in 410.bwaves from Maxim's patches on Cortex-A72. I'll be running a full set of benchmarks to evaluate this further, but I think this is the right thing to do.

Bootstrapped and tested on aarch64-none-linux-gnu.

Maxim, do you want to try this on top of your patches on your hardware to see if it helps with the regressions you mentioned?

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html

2017-02-03  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

        * config/aarch64/aarch64.md (prefetch): Adjust predicate and
        constraint on operand 0 to allow more general addressing modes.
        Adjust output template.
        * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
        New function.
        * config/aarch64/aarch64-protos.h (aarch64_address_valid_for_prefetch_p):
        Declare prototype.
        * config/aarch64/constraints.md (Dp): New address constraint.
        * config/aarch64/predicates.md (aarch64_prefetch_operand):
        New predicate.

2017-02-03  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

        * gcc.target/aarch64/prfm_imm_offset_1.c: New test.
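To give a rough idea of the addressing modes I'd expect the relaxed pattern to pick up, here is another sketch of my own (not part of the patch or testsuite; the comments show the output I'd expect, with hypothetical register numbers):

    void
    expected_modes (char *base, long idx)
    {
      /* locality = 3 maps to the PLDL1KEEP entry of the pftype table.  */
      __builtin_prefetch (base, 0, 3);        /* prfm PLDL1KEEP, [x0]     */
      __builtin_prefetch (base + 8, 0, 3);    /* prfm PLDL1KEEP, [x0, 8]  */
      __builtin_prefetch (base + idx, 0, 3);  /* prfm PLDL1KEEP, [x0, x1] */
    }

Writeback (pre/post-index) forms remain rejected, as the new aarch64_address_valid_for_prefetch_p filters out ADDRESS_REG_WB.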
commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
Author: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
Date:   Thu Feb 2 14:46:11 2017 +0000

    [AArch64] Accept more addressing modes for PRFM

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index babc327..61706de 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
+bool aarch64_address_valid_for_prefetch_p (rtx, bool);
 bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
 unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
 unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index acc093a..c05eff3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4549,6 +4549,24 @@ aarch64_classify_address (struct aarch64_address_info *info,
     }
 }
 
+/* Return true if the address X is valid for a PRFM instruction.
+   STRICT_P is true if we should do strict checking with
+   aarch64_classify_address.  */
+
+bool
+aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
+{
+  struct aarch64_address_info addr;
+
+  /* PRFM accepts the same addresses as DImode...  */
+  bool res = aarch64_classify_address (&addr, x, DImode, MEM, strict_p);
+  if (!res)
+    return false;
+
+  /* ... except writeback forms.  */
+  return addr.type != ADDRESS_REG_WB;
+}
+
 bool
 aarch64_symbolic_address_p (rtx x)
 {
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b9e8edf..c6201a5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -518,27 +518,31 @@ (define_insn "nop"
 )
 
 (define_insn "prefetch"
-  [(prefetch (match_operand:DI 0 "register_operand" "r")
+  [(prefetch (match_operand:DI 0 "aarch64_prefetch_operand" "Dp")
             (match_operand:QI 1 "const_int_operand" "")
             (match_operand:QI 2 "const_int_operand" ""))]
   ""
 {
-  const char * pftype[2][4] = 
+  const char * pftype[2][4] =
   {
-    {"prfm\\tPLDL1STRM, %a0",
-     "prfm\\tPLDL3KEEP, %a0",
-     "prfm\\tPLDL2KEEP, %a0",
-     "prfm\\tPLDL1KEEP, %a0"},
-    {"prfm\\tPSTL1STRM, %a0",
-     "prfm\\tPSTL3KEEP, %a0",
-     "prfm\\tPSTL2KEEP, %a0",
-     "prfm\\tPSTL1KEEP, %a0"},
+    {"prfm\\tPLDL1STRM, %0",
+     "prfm\\tPLDL3KEEP, %0",
+     "prfm\\tPLDL2KEEP, %0",
+     "prfm\\tPLDL1KEEP, %0"},
+    {"prfm\\tPSTL1STRM, %0",
+     "prfm\\tPSTL3KEEP, %0",
+     "prfm\\tPSTL2KEEP, %0",
+     "prfm\\tPSTL1KEEP, %0"},
   };
 
   int locality = INTVAL (operands[2]);
   gcc_assert (IN_RANGE (locality, 0, 3));
 
+  /* PRFM accepts the same addresses as a 64-bit LDR so wrap
+     the address into a DImode MEM so that aarch64_print_operand knows
+     how to print it.  */
+  operands[0] = gen_rtx_MEM (DImode, operands[0]);
   return pftype[INTVAL(operands[1])][locality];
 }
   [(set_attr "type" "load1")]
 )
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 5a252c0..b829337 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -214,3 +214,8 @@ (define_constraint "Dd"
  A constraint that matches an immediate operand valid for AdvSIMD scalar."
  (and (match_code "const_int")
       (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))")))
+
+(define_address_constraint "Dp"
+  "@internal
+ An address valid for a prefetch instruction."
+ (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index e83d45b..8e3ea9b 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -165,6 +165,9 @@ (define_predicate "aarch64_mem_pair_operand"
        (match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), PARALLEL,
                                                   0)")))
 
+(define_predicate "aarch64_prefetch_operand"
+  (match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
+
 (define_predicate "aarch64_valid_symref"
   (match_code "const, symbol_ref, label_ref")
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
new file mode 100644
index 0000000..26ab913
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Check that we can generate the immediate-offset addressing
+   mode for PRFM.  */
+
+#define ARRSIZE 65
+int *bad_addr[ARRSIZE];
+
+void
+prefetch_for_read (void)
+{
+  int i;
+  for (i = 0; i < ARRSIZE; i++)
+    __builtin_prefetch (bad_addr[i] + 2, 0, 0);
+}
+
+/* { dg-final { scan-assembler-times "prfm.*\\\[x\[0-9\]+, 8\\\]" 1 } } */