Hi all,

While evaluating Maxim's SW prefetch patches [1] I noticed that the aarch64 prefetch pattern is overly restrictive in its address operand: it accepts only simple register addressing modes. In fact, the PRFM instruction accepts almost all of the addressing modes that a normal 64-bit LDR supports. The restriction in the pattern causes explicit address calculation code to be emitted, which we could avoid.
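To make the problem concrete, here is a minimal example of my own (not from the testsuite; the before/after assembly is a sketch with illustrative register numbers):

    void
    prefetch_ahead (int *p)
    {
      /* Prefetch 8 bytes (two ints) past P for a streaming read:
         rw = 0 (read), locality = 0 (PLDL1STRM).  */
      __builtin_prefetch (p + 2, 0, 0);
    }

    /* Currently (sketch):           With PRFM's offset mode (sketch):
         add  x1, x0, 8                prfm PLDL1STRM, [x0, 8]
         prfm PLDL1STRM, [x1]                                        */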
This patch relaxes the restrictions on the prefetch define_insn. It creates a predicate and constraint that accept the full set of addressing modes that PRFM supports. Thus for the testcase in the patch (adapted from one of the existing __builtin_prefetch tests in the testsuite) we can generate:
prfm PLDL1STRM, [x1, 8]
instead of the current:
prfm PLDL1STRM, [x1]
with an explicit increment of x1 by 8 in a separate instruction.

I've removed the %a output modifier in the output template and instead wrap the address operand in a DImode MEM before passing it down to aarch64_print_operand. This is because operand 0 is an address operand rather than a memory operand and thus doesn't have a mode associated with it. When processing the 'a' output modifier the code in final.c calls TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode argument, which ICEs on aarch64 because we need a mode for the memory for aarch64_classify_address to work correctly. Rather than overriding the VOIDmode in aarch64_print_operand_address I decided to instead create the DImode MEM in the "prefetch" output template and treat it as a normal 64-bit memory address, which at the point of assembly output is what it is anyway.

With this patch I see a reduction in instruction count in the SPEC2006 benchmarks when SW prefetching is enabled on top of Maxim's patchset, because fewer address calculation instructions are emitted thanks to the more expressive addressing modes. It also fixes a performance regression that I observed in 410.bwaves from Maxim's patches on Cortex-A72. I'll be running a full set of benchmarks to evaluate this further, but I think this is the right thing to do.

Bootstrapped and tested on aarch64-none-linux-gnu.

Maxim, do you want to try this on top of your patches on your hardware to see if it helps with the regressions you mentioned?

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html

2017-02-03  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

        * config/aarch64/aarch64.md (prefetch): Adjust predicate and
        constraint on operand 0 to allow more general addressing modes.
        Adjust output template.
        * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
        New function.
        * config/aarch64/aarch64-protos.h (aarch64_address_valid_for_prefetch_p):
        Declare prototype.
        * config/aarch64/constraints.md (Dp): New address constraint.
        * config/aarch64/predicates.md (aarch64_prefetch_operand):
        New predicate.

2017-02-03  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

        * gcc.target/aarch64/prfm_imm_offset_1.c: New test.
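To give a rough idea of the addressing modes I'd expect the relaxed pattern to pick up, here is another sketch of my own (not part of the patch or testsuite; the comments show the output I'd expect, with hypothetical register numbers):

    void
    expected_modes (char *base, long idx)
    {
      /* locality = 3 maps to the PLDL1KEEP entry of the pftype table.  */
      __builtin_prefetch (base, 0, 3);        /* prfm PLDL1KEEP, [x0]     */
      __builtin_prefetch (base + 8, 0, 3);    /* prfm PLDL1KEEP, [x0, 8]  */
      __builtin_prefetch (base + idx, 0, 3);  /* prfm PLDL1KEEP, [x0, x1] */
    }

Writeback (pre/post-index) forms remain rejected, as the new aarch64_address_valid_for_prefetch_p filters out ADDRESS_REG_WB.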
commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
Author: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
Date:   Thu Feb 2 14:46:11 2017 +0000

    [AArch64] Accept more addressing modes for PRFM

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index babc327..61706de 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
+bool aarch64_address_valid_for_prefetch_p (rtx, bool);
 bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
 unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
 unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index acc093a..c05eff3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4549,6 +4549,24 @@ aarch64_classify_address (struct aarch64_address_info *info,
     }
 }
 
+/* Return true if the address X is valid for a PRFM instruction.
+   STRICT_P is true if we should do strict checking with
+   aarch64_classify_address.  */
+
+bool
+aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
+{
+  struct aarch64_address_info addr;
+
+  /* PRFM accepts the same addresses as DImode...  */
+  bool res = aarch64_classify_address (&addr, x, DImode, MEM, strict_p);
+  if (!res)
+    return false;
+
+  /* ... except writeback forms.  */
+  return addr.type != ADDRESS_REG_WB;
+}
+
 bool
 aarch64_symbolic_address_p (rtx x)
 {
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b9e8edf..c6201a5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -518,27 +518,31 @@ (define_insn "nop"
 )
 
 (define_insn "prefetch"
-  [(prefetch (match_operand:DI 0 "register_operand" "r")
+  [(prefetch (match_operand:DI 0 "aarch64_prefetch_operand" "Dp")
             (match_operand:QI 1 "const_int_operand" "")
             (match_operand:QI 2 "const_int_operand" ""))]
   ""
 {
-  const char * pftype[2][4] = 
+  const char * pftype[2][4] =
   {
-    {"prfm\\tPLDL1STRM, %a0",
-     "prfm\\tPLDL3KEEP, %a0",
-     "prfm\\tPLDL2KEEP, %a0",
-     "prfm\\tPLDL1KEEP, %a0"},
-    {"prfm\\tPSTL1STRM, %a0",
-     "prfm\\tPSTL3KEEP, %a0",
-     "prfm\\tPSTL2KEEP, %a0",
-     "prfm\\tPSTL1KEEP, %a0"},
+    {"prfm\\tPLDL1STRM, %0",
+     "prfm\\tPLDL3KEEP, %0",
+     "prfm\\tPLDL2KEEP, %0",
+     "prfm\\tPLDL1KEEP, %0"},
+    {"prfm\\tPSTL1STRM, %0",
+     "prfm\\tPSTL3KEEP, %0",
+     "prfm\\tPSTL2KEEP, %0",
+     "prfm\\tPSTL1KEEP, %0"},
   };
 
   int locality = INTVAL (operands[2]);
   gcc_assert (IN_RANGE (locality, 0, 3));
 
+  /* PRFM accepts the same addresses as a 64-bit LDR so wrap
+     the address into a DImode MEM so that aarch64_print_operand knows
+     how to print it.  */
+  operands[0] = gen_rtx_MEM (DImode, operands[0]);
   return pftype[INTVAL(operands[1])][locality];
 }
   [(set_attr "type" "load1")]
 )
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 5a252c0..b829337 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -214,3 +214,8 @@ (define_constraint "Dd"
  A constraint that matches an immediate operand valid for AdvSIMD scalar."
  (and (match_code "const_int")
       (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))")))
+
+(define_address_constraint "Dp"
+  "@internal
+ An address valid for a prefetch instruction."
+ (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index e83d45b..8e3ea9b 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -165,6 +165,9 @@ (define_predicate "aarch64_mem_pair_operand"
        (match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), PARALLEL,
                                                   0)")))
 
+(define_predicate "aarch64_prefetch_operand"
+  (match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
+
 (define_predicate "aarch64_valid_symref"
   (match_code "const, symbol_ref, label_ref")
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
new file mode 100644
index 0000000..26ab913
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Check that we can generate the immediate-offset addressing
+   mode for PRFM.  */
+
+#define ARRSIZE 65
+int *bad_addr[ARRSIZE];
+
+void
+prefetch_for_read (void)
+{
+  int i;
+  for (i = 0; i < ARRSIZE; i++)
+    __builtin_prefetch (bad_addr[i] + 2, 0, 0);
+}
+
+/* { dg-final { scan-assembler-times "prfm.*\\\[x\[0-9\]+, 8\\\]" 1 } } */