GCC maintainers:

The architecture-independent builtin __builtin_prefetch() is defined as:

  void __builtin_prefetch (const void *addr, int n1, int n2)

  n1 - prefetch read = 0, prefetch write = 1
  n2 - temporal locality 0 to 3.  No temporal locality = 0, high
       temporal locality = 3.

The implementation for Power maps to define_insn "prefetch" in
gcc/config/rs6000/rs6000.md.  The Power implementation currently ignores
the value of n2 and simply generates the dcbtst and dcbt instructions.

This patch maps n2=0 to the dcbtstt mnemonic (dcbtst with a TH value of
0b10000) for a write prefetch and generates dcbtst for n2 in the range
[1,3].  For a read prefetch, the dcbtt mnemonic (dcbt with a TH value of
0b10000) is generated when n2=0 and the dcbt instruction is generated
for n2 in the range [1,3].

The ISA states that the value TH = 0b10000 is a hint that the processor
will probably soon perform a load from the addressed block.

There is an existing test case in
gcc/testsuite/gcc.target/sh/prefetch.dump.  The test case generates the
following output with the patch:

  gcc -g -c -o prefetch prefetch.c
  objdump -S -d prefetch > prefetch.dump
  more prefetch.dump

  ...
    __builtin_prefetch (&data[0], 0, 0);
     c:   2c 00 3f 39     addi    r9,r31,44
    10:   2c 4a 00 7e     dcbtt   0,r9
    __builtin_prefetch (&data[0], 0, 1);
    14:   2c 00 3f 39     addi    r9,r31,44
    18:   2c 4a 00 7c     dcbt    0,r9
    __builtin_prefetch (&data[0], 0, 2);
    1c:   2c 00 3f 39     addi    r9,r31,44
    20:   2c 4a 00 7c     dcbt    0,r9
    __builtin_prefetch (&data[0], 0, 3);
    24:   2c 00 3f 39     addi    r9,r31,44
    28:   2c 4a 00 7c     dcbt    0,r9
    __builtin_prefetch (&data[0], 1, 0);
    2c:   2c 00 3f 39     addi    r9,r31,44
    30:   ec 49 00 7e     dcbtstt 0,r9
    __builtin_prefetch (&data[0], 1, 1);
    34:   2c 00 3f 39     addi    r9,r31,44
    38:   ec 49 00 7c     dcbtst  0,r9
    __builtin_prefetch (&data[0], 1, 2);
    3c:   2c 00 3f 39     addi    r9,r31,44
    40:   ec 49 00 7c     dcbtst  0,r9
    __builtin_prefetch (&data[0], 1, 3);
    44:   2c 00 3f 39     addi    r9,r31,44
    48:   ec 49 00 7c     dcbtst  0,r9
  ...

The patch was regression tested on powerpc64le-unknown-linux-gnu
(Power 8 LE) with no regressions.

Please let me know if the patch looks OK for GCC mainline.

                         Carl Love
----------------------------------------------------------------

gcc/ChangeLog:

2018-05-07  Carl Love  <c...@us.ibm.com>

	* config/rs6000/rs6000.md: Add dcbtstt, dcbtt instruction
	generation to define_insn prefetch.
---
 gcc/config/rs6000/rs6000.md | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2b15cca..7429d33 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -13233,10 +13233,19 @@
 	     (match_operand:SI 2 "const_int_operand" "n"))]
   ""
 {
-  if (GET_CODE (operands[0]) == REG)
-    return INTVAL (operands[1]) ? "dcbtst 0,%0" : "dcbt 0,%0";
-  return INTVAL (operands[1]) ? "dcbtst %a0" : "dcbt %a0";
-}
+  if (GET_CODE (operands[0]) == REG) {
+    if (INTVAL (operands[1]) == 0)
+      return INTVAL (operands[2]) ? "dcbt 0,%0" : "dcbtt 0,%0";
+    else
+      return INTVAL (operands[2]) ? "dcbtst 0,%0" : "dcbtstt 0,%0";
+
+  } else {
+    if (INTVAL (operands[1]) == 0)
+      return INTVAL (operands[2]) ? "dcbt %a0" : "dcbtt %a0";
+    else
+      return INTVAL (operands[2]) ? "dcbtst %a0" : "dcbtstt %a0";
+  }
+ }
   [(set_attr "type" "load")])
 
 ;; Handle -fsplit-stack.
-- 
2.7.4
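
For reference, the selection logic in the patch can be sketched in plain
C, outside of the machine description.  This is only an illustration
under stated assumptions: prefetch_mnemonic() and
check_prefetch_mapping() are hypothetical names that do not exist in
GCC, and the sketch returns only the mnemonic, ignoring the two operand
forms ("0,%0" versus "%a0") that the real output template distinguishes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of GCC.  Mirrors the choice made by the
   patched "prefetch" pattern: rw corresponds to operands[1] (n1) and
   locality to operands[2] (n2).  */
const char *
prefetch_mnemonic (int rw, int locality)
{
  if (rw == 0)
    /* Read prefetch: the TH=0b10000 form is used only when n2 is 0.  */
    return locality ? "dcbt" : "dcbtt";

  /* Write prefetch: same rule, using the store variants.  */
  return locality ? "dcbtst" : "dcbtstt";
}

/* Hypothetical self-check against the objdump listing in this message.  */
void
check_prefetch_mapping (void)
{
  int n2;

  assert (strcmp (prefetch_mnemonic (0, 0), "dcbtt") == 0);
  assert (strcmp (prefetch_mnemonic (1, 0), "dcbtstt") == 0);

  /* Any locality in [1,3] keeps the pre-patch instruction.  */
  for (n2 = 1; n2 <= 3; n2++)
    {
      assert (strcmp (prefetch_mnemonic (0, n2), "dcbt") == 0);
      assert (strcmp (prefetch_mnemonic (1, n2), "dcbtst") == 0);
    }
}
```

Calling check_prefetch_mapping() from a small driver exercises all eight
(n1, n2) combinations that appear in the listing.  Only n2=0 changes
behavior; n2 values 1 through 3 are not distinguished from one another.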