[PATCH] Cleanup: Replace UNSPEC_COPYSIGN with copysign RTL
When I first implemented COPYSIGN support in the power7 days, we did not have a copysign RTL code, so I had to use an UNSPEC to represent the copysign instruction. This patch removes those UNSPECs and uses the native copysign RTL code instead.

I have tested this on both big endian and little endian PowerPC server systems, and there were no regressions. Can I check this into the master branch? Since it is just a clean-up, I don't see the need to backport it, but the backport is simple to do if desired.

2023-09-29  Michael Meissner

gcc/

        * config/rs6000/rs6000.md (UNSPEC_COPYSIGN): Delete.
        (copysign3_fcpsgn): Use copysign RTL instead of UNSPEC.
        (copysign3_hard): Likewise.
        (copysign3_soft): Likewise.
        * config/rs6000/vector.md (vector_copysign3): Use copysign RTL
        instead of UNSPEC.
        * config/rs6000/vsx.md (vsx_copysign3): Use copysign RTL instead of
        UNSPEC.
---
 gcc/config/rs6000/rs6000.md | 20 ++++++++------------
 gcc/config/rs6000/vector.md |  4 ++--
 gcc/config/rs6000/vsx.md    |  7 +++----
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 7b583d7a69a..1b6b6cb5bbe 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -108,7 +108,6 @@ (define_c_enum "unspec"
    UNSPEC_TOCREL
    UNSPEC_MACHOPIC_OFFSET
    UNSPEC_BPERM
-   UNSPEC_COPYSIGN
    UNSPEC_PARITY
    UNSPEC_CMPB
    UNSPEC_FCTIW
@@ -5383,9 +5382,8 @@ (define_expand "copysign3"
 ;; compiler from optimizing -0.0
 (define_insn "copysign3_fcpsgn"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
-        (unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")
-                      (match_operand:SFDF 2 "gpc_reg_operand" "d,wa")]
-                     UNSPEC_COPYSIGN))]
+        (copysign:SFDF (match_operand:SFDF 1 "gpc_reg_operand" "d,wa")
+                       (match_operand:SFDF 2 "gpc_reg_operand" "d,wa")))]
   "TARGET_HARD_FLOAT && (TARGET_CMPB || VECTOR_UNIT_VSX_P (mode))"
   "@
    fcpsgn %0,%2,%1
@@ -14984,10 +14982,9 @@ (define_expand "copysign3"
 
 (define_insn "copysign3_hard"
   [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
-        (unspec:IEEE128
-         [(match_operand:IEEE128 1 "altivec_register_operand" "v")
-          (match_operand:IEEE128 2 "altivec_register_operand" "v")]
-         UNSPEC_COPYSIGN))]
+        (copysign:IEEE128
+         (match_operand:IEEE128 1 "altivec_register_operand" "v")
+         (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
   "xscpsgnqp %0,%2,%1"
   [(set_attr "type" "vecmove")
@@ -14995,10 +14992,9 @@ (define_insn "copysign3_hard"
 
 (define_insn "copysign3_soft"
   [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
-        (unspec:IEEE128
-         [(match_operand:IEEE128 1 "altivec_register_operand" "v")
-          (match_operand:IEEE128 2 "altivec_register_operand" "v")]
-         UNSPEC_COPYSIGN))
+        (copysign:IEEE128
+         (match_operand:IEEE128 1 "altivec_register_operand" "v")
+         (match_operand:IEEE128 2 "altivec_register_operand" "v")))
    (clobber (match_scratch:IEEE128 3 "=&v"))]
   "!TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
   "xscpsgndp %x3,%x2,%x1\;xxpermdi %x0,%x3,%x1,1"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 1ae04c8e0a8..f4fc620b653 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -332,8 +332,8 @@ (define_expand "vector_btrunc2"
 
 (define_expand "vector_copysign3"
   [(set (match_operand:VEC_F 0 "vfloat_operand")
-        (unspec:VEC_F [(match_operand:VEC_F 1 "vfloat_operand")
-                       (match_operand:VEC_F 2 "vfloat_operand")] UNSPEC_COPYSIGN))]
+        (copysign:VEC_F (match_operand:VEC_F 1 "vfloat_operand")
+                        (match_operand:VEC_F 2 "vfloat_operand")))]
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
 {
   if (mode == V4SFmode && VECTOR_UNIT_ALTIVEC_P (mode))
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 4de41e78d51..f3b40229094 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2233,10 +2233,9 @@ (define_insn "*vsx_ge__p"
 ;; Copy sign
 (define_insn "vsx_copysign3"
   [(set (match_operand:VSX_F 0 "vsx_register_operand" "=wa")
-        (unspec:VSX_F
-         [(match_operand:VSX_F 1 "vsx_register_operand" "wa")
-          (match_operand:VSX_F 2 "vsx_register_operand" "wa")]
-         UNSPEC_COPYSIGN))]
+        (copysign:VSX_F
+         (match_operand:VSX_F 1 "vsx_register_operand" "wa")
+         (match_operand:VSX_F 2 "vsx_register_operand" "wa")))]
   "VECTOR_UNIT_VSX_P (mode)"
   "xvcpsgnp %x0,%x2,%x1"
   [(set_attr "type" "")])
-- 
2.41.0

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
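Why the fcpsgn template swaps its operands (`fcpsgn %0,%2,%1`) may not be obvious: the RTL `(copysign op1 op2)` takes its magnitude from op1 and its sign from op2, while the Power ISA `fcpsgn FRT,FRA,FRB` takes the sign from FRA and the remaining 63 bits from FRB. The C sketch below models that bit-level behavior on the host so the mapping can be checked, including signed zeros (the concern behind the "-0.0" comment in rs6000.md). The helper names are hypothetical, and the sketch assumes `double` is a 64-bit IEEE value.

```c
#include <assert.h>
#include <math.h>    /* signbit() for the signed-zero checks */
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side model of fcpsgn FRT,FRA,FRB:
   the result takes its sign bit from FRA and bits 1..63 from FRB.  */
static double emulate_fcpsgn (double fra, double frb)
{
  uint64_t a, b;
  memcpy (&a, &fra, sizeof a);
  memcpy (&b, &frb, sizeof b);
  uint64_t t = (a & UINT64_C (0x8000000000000000))
               | (b & UINT64_C (0x7fffffffffffffff));
  double result;
  memcpy (&result, &t, sizeof result);
  return result;
}

/* (copysign:DF op1 op2) has op1's magnitude and op2's sign, so the
   template "fcpsgn %0,%2,%1" passes op2 as FRA and op1 as FRB.  */
static double copysign_via_fcpsgn (double op1, double op2)
{
  return emulate_fcpsgn (op2, op1);
}

static void check_copysign_model (void)
{
  assert (copysign_via_fcpsgn (3.0, -1.0) == -3.0);
  assert (copysign_via_fcpsgn (-2.5, 0.0) == 2.5);
  /* -0.0 == 0.0 compares equal, so inspect the sign bit directly.  */
  assert (signbit (copysign_via_fcpsgn (0.0, -0.0)));
  assert (!signbit (copysign_via_fcpsgn (-0.0, 1.0)));
}
```

Because copysign is now an ordinary RTL code rather than an UNSPEC, generic RTL passes can reason about this operation; the bit semantics the instructions implement are unchanged.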
[PATCH] PR target/111778 - Fix undefined shifts in PowerPC compiler
I was building a cross compiler to PowerPC on my x86_64 workstation with the latest version of GCC on October 11th. I could not build the compiler on the x86_64 system, as it died while building libgcc. I looked into it and discovered that the compiler was recursing until it ran out of stack space. If I build a native compiler with the same sources on a PowerPC system, it builds fine.

I traced this down to a change made around October 10th:

| commit 8f1a70a4fbcc6441c70da60d4ef6db1e5635e18a (HEAD)
| Author: Jiufu Guo
| Date:   Tue Jan 10 20:52:33 2023 +0800
|
|     rs6000: build constant via li/lis;rldicl/rldicr
|
|     If a constant is possible left/right cleaned on a rotated value from
|     a negative value of "li/lis".  Then, using "li/lis ; rldicl/rldicr"
|     to build the constant.

The code was doing a -1 << 64, which is undefined behavior in C and C++ because the shift count is not less than the width of the type; in practice, different machines produce different results. On the x86_64 system, (-1 << 64) produces -1, while on a PowerPC 64-bit system, (-1 << 64) produces 0. The compiler running on x86_64 then recurses until the stack runs out of space.

If I apply this patch, the compiler builds fine both on x86_64 as a PowerPC cross compiler and on a native PowerPC system. Can I check this into the master branch to fix the problem?

2023-10-12  Michael Meissner

gcc/

        PR target/111778
        * config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): Protect
        code from shifts that are undefined.
        (can_be_built_by_li_lis_and_rldicr): Likewise.
        (can_be_built_by_li_and_rldic): Protect code from shifts that are
        undefined.  Also replace uses of 1LL with HOST_WIDE_INT_1U.
---
 gcc/config/rs6000/rs6000.cc | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 2828f01413c..cc24dd5301e 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10370,6 +10370,11 @@ can_be_built_by_li_lis_and_rldicl (HOST_WIDE_INT c, int *shift,
   /* Leading zeros may be cleaned by rldicl with a mask.  Change leading zeros
      to ones and then recheck it.  */
   int lz = clz_hwi (c);
+
+  /* If lz == 0, the left shift is undefined.  */
+  if (!lz)
+    return false;
+
   HOST_WIDE_INT unmask_c
     = c | (HOST_WIDE_INT_M1U << (HOST_BITS_PER_WIDE_INT - lz));
   int n;
@@ -10398,6 +10403,11 @@ can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, int *shift,
   /* Tailing zeros may be cleaned by rldicr with a mask.  Change tailing zeros
      to ones and then recheck it.  */
   int tz = ctz_hwi (c);
+
+  /* If tz == HOST_BITS_PER_WIDE_INT, the left shift is undefined.  */
+  if (tz >= HOST_BITS_PER_WIDE_INT)
+    return false;
+
   HOST_WIDE_INT unmask_c = c | ((HOST_WIDE_INT_1U << tz) - 1);
   int n;
   if (can_be_rotated_to_lowbits (~unmask_c, 15, &n)
@@ -10428,8 +10438,15 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
      right bits are shifted as 0's, and left 1's(and x's) are cleaned.  */
   int tz = ctz_hwi (c);
   int lz = clz_hwi (c);
+
+  /* If lz == HOST_BITS_PER_WIDE_INT, the left shift is undefined.  */
+  if (lz >= HOST_BITS_PER_WIDE_INT)
+    return false;
+
   int middle_ones = clz_hwi (~(c << lz));
-  if (tz + lz + middle_ones >= ones)
+  if (tz + lz + middle_ones >= ones
+      && (tz - lz) < HOST_BITS_PER_WIDE_INT
+      && tz < HOST_BITS_PER_WIDE_INT)
     {
       *mask = ((1LL << (HOST_BITS_PER_WIDE_INT - tz - lz)) - 1LL) << tz;
       *shift = tz;
@@ -10440,7 +10457,8 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
   int leading_ones = clz_hwi (~c);
   int tailing_ones = ctz_hwi (~c);
   int middle_zeros = ctz_hwi (c >> tailing_ones);
-  if (leading_ones + tailing_ones + middle_zeros >= ones)
+  if (leading_ones + tailing_ones + middle_zeros >= ones
+      && middle_zeros < HOST_BITS_PER_WIDE_INT)
     {
       *mask = ~(((1ULL << middle_zeros) - 1ULL) << tailing_ones);
       *shift = tailing_ones + middle_zeros;
@@ -10450,10 +10468,15 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
   /* xx1..1xx: --> xx0..01..1xx: some 1's(following x's) are cleaned. */
   /* Get the position for the first bit of successive 1.
      The 24th bit would be in successive 0 or 1.  */
-  HOST_WIDE_INT low_mask = (1LL << 24) - 1LL;
+  HOST_WIDE_INT low_mask = (HOST_WIDE_INT_1U << 24) - HOST_WIDE_INT_1U;
   int pos_first_1 = ((c & (low_mask + 1)) == 0)
                       ? clz_hwi (c & low_mask)
                       : HOST_BITS_PER_WIDE_INT - ctz_hwi (~(c | low_mask));
+
+  /* Make sure the left and right shifts are defined.  */
+  if (!IN_RANGE (pos_first_1, 1, HOST_BITS_PER_WIDE_INT-1))
+    return false;
+
   middle_ones = clz_hwi (~c << pos_first_
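The two results quoted above come from how each ISA treats an over-wide shift count: the x86-64 64-bit SHL uses only the low 6 bits of the count, while PowerPC sld uses the low 7 bits and yields 0 for counts from 64 to 127. The C sketch below models both behaviors with well-defined operations, then shows the style of guard the patch adds before building the "change leading zeros to ones" mask. All helper names are hypothetical; `clz64` assumes the same convention as GCC's `clz_hwi` of returning 64 for a zero input, and `__builtin_clzll` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of x86-64 SHL on a 64-bit register: the count is masked to 6 bits.  */
static uint64_t x86_shl64 (uint64_t value, unsigned count)
{
  return value << (count & 63);
}

/* Model of PowerPC sld: the low 7 bits of the count are used, and any
   count from 64 to 127 produces 0.  */
static uint64_t ppc_sld (uint64_t value, unsigned count)
{
  count &= 127;
  return count >= 64 ? 0 : value << count;
}

/* Count leading zeros, returning 64 for zero (assumed clz_hwi convention).  */
static int clz64 (uint64_t x)
{
  return x == 0 ? 64 : __builtin_clzll (x);
}

/* Set the leading zero bits of C to ones.  Refuse the lz == 0 case, where
   the shift count would be 64, mirroring the guard added by the patch.  */
static bool set_leading_zeros_to_ones (uint64_t c, uint64_t *out)
{
  int lz = clz64 (c);
  if (lz == 0)
    return false;               /* UINT64_MAX << 64 would be undefined.  */
  *out = c | (UINT64_MAX << (64 - lz));
  return true;
}

static void check_shift_models (void)
{
  /* The two hardware results for (-1 << 64) described in the email.  */
  assert (x86_shl64 (UINT64_MAX, 64) == UINT64_MAX);
  assert (ppc_sld (UINT64_MAX, 64) == 0);

  uint64_t r;
  assert (!set_leading_zeros_to_ones (UINT64_C (0x8000000000000000), &r));
  bool ok = set_leading_zeros_to_ones (UINT64_C (0x0f00), &r);
  assert (ok && r == UINT64_C (0xffffffffffffff00));
}
```

Returning early when the count would reach the type width keeps the result independent of the host, which is what a cross compiler needs.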
[PATCH] Power10: Add options to disable load and store vector pair.
In working on some future patches that involve utilizing vector pair instructions, I wanted to be able to tune my program to enable or disable the vector pair load or store operations while still keeping the other operations on the vector pair.

This patch adds two undocumented tuning options. The -mno-load-vector-pair option tells GCC to generate two load vector instructions instead of a single load vector pair. The -mno-store-vector-pair option tells GCC to generate two store vector instructions instead of a single store vector pair.

If either -mno-load-vector-pair or -mno-store-vector-pair is used, GCC will not generate the indexed lxvpx or stxvpx instructions. The reason for this is to enable splitting the {,p}lxvp or {,p}stxvp instructions after reload without needing a scratch GPR (the second half of a split reg+reg access would need the address reg+reg+16, which has to be computed into a register).

The default for -mcpu=power10 is that both load vector pair and store vector pair are enabled.

I decided that if the user explicitly uses the __builtin_vsx_lxvp or __builtin_vsx_stxvp built-in functions to load or store a vector pair, those functions will always generate a vector pair instruction.

I added code so that user code can modify these settings using either a '#pragma GCC target' directive or __attribute__((__target__(...))) in the function declaration.

I added tests for the switches, #pragma, and attribute options. I have built this on both little endian power10 systems and big endian power9 systems doing the normal bootstrap and test. There were no regressions in any of the tests, and the new tests passed. Can I check this patch into the master branch?

2023-10-13  Michael Meissner

gcc/

        * config/rs6000/mma.md (movoo): Add support for -mload-vector-pair and
        -mstore-vector-pair.
        * config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Likewise.
        (POWERPC_MASKS): Likewise.
        * config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): If either
        load vector pair or store vector pair instructions are not being
        generated, don't allow lxvpx or stxvpx to be generated.
        (rs6000_option_override_internal): Add warnings if either
        -mload-vector-pair or -mstore-vector-pair is used without having MMA
        instructions.
        (rs6000_opt_masks): Allow user to override -mload-vector-pair or
        -mstore-vector-pair via #pragma or attribute.
        * config/rs6000/rs6000.opt (-mload-vector-pair): New option.
        (-mstore-vector-pair): Likewise.

gcc/testsuite/

        * gcc.target/powerpc/vector-pair-attribute.c: New test.
        * gcc.target/powerpc/vector-pair-pragma.c: New test.
        * gcc.target/powerpc/vector-pair-switch1.c: New test.
        * gcc.target/powerpc/vector-pair-switch2.c: New test.
        * gcc.target/powerpc/vector-pair-switch3.c: New test.
        * gcc.target/powerpc/vector-pair-switch4.c: New test.
---
 gcc/config/rs6000/mma.md                      | 44 +++
 gcc/config/rs6000/rs6000-builtin.cc           | 46 +---
 gcc/config/rs6000/rs6000-builtins.def         |  6 ++
 gcc/config/rs6000/rs6000-cpus.def             |  8 ++-
 gcc/config/rs6000/rs6000.cc                   | 30 +-
 gcc/config/rs6000/rs6000.opt                  |  8 +++
 .../powerpc/vector-pair-attribute.c           | 39 +
 .../gcc.target/powerpc/vector-pair-builtin.c  | 40 ++
 .../gcc.target/powerpc/vector-pair-pragma.c   | 55 +++
 .../gcc.target/powerpc/vector-pair-switch1.c  | 16 ++
 .../gcc.target/powerpc/vector-pair-switch2.c  | 17 ++
 .../gcc.target/powerpc/vector-pair-switch3.c  | 17 ++
 .../gcc.target/powerpc/vector-pair-switch4.c  | 17 ++
 13 files changed, 331 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-attribute.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-builtin.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-pragma.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-switch1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-switch2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-switch3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-switch4.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 575751d477e..fc7e95bc167 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -91,6 +91,7 @@ (define_c_enum "unspec"
    UNSPEC_MMA_XVI8GER4SPP
    UNSPEC_MMA_XXMFACC
    UNSPEC_MMA_XXMTACC
+   UNSPEC_MMA_VECTOR_PAIR_MEMORY
   ])
 
 (define_c_enum "unspecv"
@@ -298,6 +299,49 @@ (define_insn_and_split "*movoo"
   "TARGET_MMA
    && (gpc_reg_operand (operands[0], OOmode)
        || gpc_reg_operand (operands[1], OOmode))"
+{
+  if (MEM_P (operands[0]))
+    return TARGET_STORE_VECTOR_PAIR ? &qu
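The option behavior described in this email can be read as a small decision table. The C sketch below is a hypothetical model of that policy only: the struct and function names are invented for illustration, it does not reflect how rs6000.cc or mma.md implement it, and it leaves out the prefixed plxvp/pstxvp forms. It assumes, following the ChangeLog entry for rs6000_setup_reg_addr_masks, that disabling either option removes both indexed forms.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of -m[no-]load-vector-pair and
   -m[no-]store-vector-pair (both on by default for -mcpu=power10).  */
struct vector_pair_options
{
  bool load_vector_pair;
  bool store_vector_pair;
};

/* Instruction(s) used to load a 32-byte vector pair.  An explicit
   __builtin_vsx_lxvp call always uses the pair instruction.  */
static const char *
vector_pair_load (struct vector_pair_options opts, bool explicit_builtin)
{
  if (explicit_builtin || opts.load_vector_pair)
    return "lxvp";
  return "lxv+lxv";             /* Two 16-byte vector loads.  */
}

/* Same policy for stores and __builtin_vsx_stxvp.  */
static const char *
vector_pair_store (struct vector_pair_options opts, bool explicit_builtin)
{
  if (explicit_builtin || opts.store_vector_pair)
    return "stxvp";
  return "stxv+stxv";           /* Two 16-byte vector stores.  */
}

/* Indexed (reg+reg) lxvpx/stxvpx are allowed only when neither option is
   disabled, so a split access never needs a scratch GPR.  */
static bool
allow_indexed_vector_pair (struct vector_pair_options opts)
{
  return opts.load_vector_pair && opts.store_vector_pair;
}

static void check_vector_pair_policy (void)
{
  struct vector_pair_options power10 = { true, true };
  struct vector_pair_options no_load = { false, true };

  assert (strcmp (vector_pair_load (power10, false), "lxvp") == 0);
  assert (strcmp (vector_pair_load (no_load, false), "lxv+lxv") == 0);
  assert (strcmp (vector_pair_load (no_load, true), "lxvp") == 0);
  assert (strcmp (vector_pair_store (no_load, false), "stxvp") == 0);
  assert (allow_indexed_vector_pair (power10));
  assert (!allow_indexed_vector_pair (no_load));
}
```

Keeping the built-ins exempt means code that explicitly asks for a vector pair is unaffected by the tuning switches.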
[PATCH 0/6] PowerPC Future patches
This patch set is very preliminary support for a potential new feature of the PowerPC that extends the current power10 MMA architecture. This feature may or may not be present in any specific future PowerPC processor.

In the current MMA subsystem for power10, there are 8 512-bit accumulator registers. These accumulators are each tied to a set of 4 FPR registers. When you issue a prime instruction, it makes sure the accumulator is a copy of the 4 FPR registers the accumulator is tied to. When you issue a deprime instruction, it makes sure that the accumulator data content is logically copied to the matching FPR registers.

In the potential dense math system, the accumulators are moved to separate registers called dense math registers (DM registers or DMRs). The DMRs are then extended to 1,024 bits, and new instructions will be added to deal with all 1,024 bits of the DMRs.

If you take existing MMA code, it will work as long as you don't do anything with accumulators, and you follow the rules in the ISA 3.1 documentation for using the MMA subsystem.

These patches add support for the 512-bit accumulators within the dense math system, and for allocation of the 1,024-bit DMRs. At this time, no additional built-in functions will be added to support any dense math features other than data movement between the DMRs and the VSX registers. Before we can look at adding any new dense math support other than data movement, we need the GCC compiler to be able to allocate and use these DMRs.

There are 6 patches in this patch set:

1) The first patch just adds -mcpu=future as an option to add new support. This is similar to the -mcpu=future that we did before power10 was announced.

2) The second patch enables GCC to use the load and store vector pair instructions to optimize memory copy operations in the compiler. For power10, we needed to just stay with normal vector loads/stores for memory copy operations.

3) The third patch enables 512-bit accumulators that are located within DMRs instead of the FPRs. This patch enables the register allocation, but it does not move the existing MMA support to use these registers.

4) The fourth patch switches the MMA subsystem to use 512-bit accumulators within DMRs if you use -mcpu=future.

5) The fifth patch switches the names of the MMA instructions to use the dense math equivalent names if -mcpu=future.

6) The sixth patch enables using the full 1,024-bit DMRs. Right now, all you can do with DMRs is move a VSX register to a DMR register, and move a DMR register to a VSX register.

In terms of changes, these patches now use the wD constraint for accumulators. If you compile with -mcpu=power10, the wD constraint will match the equivalent FPR register that overlaps with the accumulator. If you compile with -mcpu=future, the wD constraint will match the DMR register and not the FPR register.

These patches also modify the print_operand %A output modifier to print out DMR register numbers if -mcpu=future, and continue to print out the FPR register number divided by 4 for -mcpu=power10.

In general, if you only use the built-in functions, things work between the two systems. If you use extended asm, you will likely need to modify the code. Going forward, if you modify your code to use the wD constraint and the %A output modifier, you should be able to write code that switches more easily between the two systems.

Again, these are preliminary patches for a potential future machine. Things will likely change in terms of implementation and usage over time.

Originally these patches were submitted in November 2022:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605581.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
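The %A change described in this cover letter amounts to two numbering schemes for the same 8 accumulators. As a hypothetical model (the enum and function names are invented, and this is not print_operand's code): on power10, accumulator N overlaps the four FPRs 4N through 4N+3, so %A prints the FPR number divided by 4; with dense math, %A prints the DMR number, 0 through 7, directly.

```c
#include <assert.h>

/* Where an accumulator operand lives: overlapping FPRs (power10) or a
   separate dense math register (-mcpu=future).  */
enum acc_reg_kind { ACC_IN_FPR, ACC_IN_DMR };

/* Hypothetical model of the number %A prints.  REGNO is an FPR number
   (0..31) or a DMR number (0..7).  Returns -1 for a register that cannot
   hold an accumulator.  */
static int percent_a_number (enum acc_reg_kind kind, int regno)
{
  if (kind == ACC_IN_FPR)
    {
      /* power10: accumulator N overlaps FPRs 4N..4N+3, so only the first
         FPR of each group of 4 names an accumulator.  */
      if (regno < 0 || regno > 31 || regno % 4 != 0)
        return -1;
      return regno / 4;
    }

  /* Dense math: the DMRs are already numbered 0..7.  */
  if (regno < 0 || regno > 7)
    return -1;
  return regno;
}

/* First FPR that accumulator ACC overlaps on power10.  */
static int first_fpr_of_accumulator (int acc)
{
  assert (acc >= 0 && acc < 8);
  return acc * 4;
}

static void check_percent_a_model (void)
{
  assert (percent_a_number (ACC_IN_FPR, 12) == 3);
  assert (percent_a_number (ACC_IN_FPR, 13) == -1);
  assert (percent_a_number (ACC_IN_DMR, 5) == 5);
  assert (first_fpr_of_accumulator (7) == 28);
}
```

Both schemes print 0 through 7, which is why extended asm that uses wD and %A can, in principle, stay the same across the two systems.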
Re: [PATCH 1/6] PowerPC: Add -mcpu=future option
This patch implements support for a potential future PowerPC CPU. Features added with -mcpu=future may or may not be added to new PowerPC processors.

This patch adds support for the -mcpu=future option. If you use -mcpu=future, the macro _ARCH_PWR_FUTURE is defined, and the assembler .machine directive "future" is used. Future patches in this series will add support for new instructions that may be present in future PowerPC processors.

This particular patch does not add any new features. It exists as groundwork for future patches to support a possible PowerPC processor in the future.

This patch does not implement any differences in tuning when -mcpu=future is used compared to -mcpu=power10. If -mcpu=future is used, GCC will use power10 tuning. If you explicitly use -mtune=future, you will get a warning that -mtune=future is not supported, and default tuning will be set for power10.

The patches have been tested on both little and big endian systems. Can I check it into the master branch?

2023-10-18  Michael Meissner

gcc/

        * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
        _ARCH_PWR_FUTURE if -mcpu=future.
        * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): New macro.
        (POWERPC_MASKS): Add -mcpu=future support.
        * config/rs6000/rs6000-opts.h (enum processor_type): Add
        PROCESSOR_FUTURE.
        * config/rs6000/rs6000-tables.opt: Regenerate.
        * config/rs6000/rs6000.cc (rs6000_cpu_index_lookup): New helper
        function.
        (rs6000_option_override_internal): Make -mcpu=future set
        -mtune=power10.  If the user explicitly uses -mtune=future, give a
        warning and reset the tuning to power10.
        (rs6000_option_override_internal): Use power10 costs for future
        machine.
        (rs6000_machine_from_flags): Add support for -mcpu=future.
        (rs6000_opt_masks): Likewise.
        * config/rs6000/rs6000.h (ASM_CPU_SUPPORT): Likewise.
        * config/rs6000/rs6000.md (cpu attribute): Likewise.
        * config/rs6000/rs6000.opt (-mfuture): New undocumented debug switch.
        * doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document
        -mcpu=future.
---
 gcc/config/rs6000/rs6000-c.cc       |  2 +
 gcc/config/rs6000/rs6000-cpus.def   |  6 +++
 gcc/config/rs6000/rs6000-opts.h     |  4 +-
 gcc/config/rs6000/rs6000-tables.opt |  3 ++
 gcc/config/rs6000/rs6000.cc         | 58 -
 gcc/config/rs6000/rs6000.h          |  1 +
 gcc/config/rs6000/rs6000.md         |  2 +-
 gcc/config/rs6000/rs6000.opt        |  4 ++
 gcc/doc/invoke.texi                 |  2 +-
 9 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 65be0ac43e2..e276c20cccd 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -447,6 +447,8 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
     rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
   if ((flags & OPTION_MASK_POWER10) != 0)
     rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
+  if ((flags & OPTION_MASK_FUTURE) != 0)
+    rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR_FUTURE");
   if ((flags & OPTION_MASK_SOFT_FLOAT) != 0)
     rs6000_define_or_undefine_macro (define_p, "_SOFT_FLOAT");
   if ((flags & OPTION_MASK_RECIP_PRECISION) != 0)
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 8c530a22da8..a6d9d7bf9a8 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -88,6 +88,10 @@
                                  | OPTION_MASK_POWER10                  \
                                  | OTHER_POWER10_MASKS)
 
+/* Flags for a potential future processor that may or may not be delivered.  */
+#define ISA_FUTURE_MASKS        (ISA_3_1_MASKS_SERVER                   \
+                                 | OPTION_MASK_FUTURE)
+
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS   (OPTION_MASK_FLOAT128_HW                \
                                  | OPTION_MASK_P9_MINMAX)
@@ -134,6 +138,7 @@
                                  | OPTION_MASK_FPRND                    \
                                  | OPTION_MASK_POWER10                  \
                                  | OPTION_MASK_P10_FUSION               \
+                                 | OPTION_MASK_FUTURE                   \
                                  | OPTION_MASK_HTM                      \
                                  | OPTION_MASK_ISEL                     \
                                  | OPTION_MASK_LOAD_VECTOR_PAIR         \
@@ -267,3 +272,4 @@ RS6000_CPU ("powerpc64", PROCESSOR_POWERPC64, OPTION_MASK_PPC_GFXOPT
 RS6000_CPU ("powerpc64le", PROCESSOR_POWER8, MASK_POWERPC64 | ISA_2_7_MASKS_SERVER | OPTION_M
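User code can key off the macro that the rs6000-c.cc hunk defines, _ARCH_PWR_FUTURE. The sketch below assumes a compiler with this patch applied; with an unpatched compiler, or on a non-PowerPC host, it simply takes a fallback branch. The future check comes first because ISA_FUTURE_MASKS builds on ISA_3_1_MASKS_SERVER, so _ARCH_PWR10 is presumably defined as well under -mcpu=future. The function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Report which ISA level this translation unit was compiled for, based on
   predefined macros.  _ARCH_PWR_FUTURE only exists with this patch and
   -mcpu=future; _ARCH_PWR10 is the existing power10 macro.  */
static const char *compiled_isa_level (void)
{
#if defined (_ARCH_PWR_FUTURE)
  return "future";
#elif defined (_ARCH_PWR10)
  return "power10";
#else
  return "other";
#endif
}

static void check_isa_level (void)
{
  const char *level = compiled_isa_level ();
  assert (strcmp (level, "future") == 0
          || strcmp (level, "power10") == 0
          || strcmp (level, "other") == 0);
}
```

Since this patch adds no new instructions, such a check would mainly matter once later patches in the series start generating future-only code.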
[PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.
This patch re-enables generating load and store vector pair instructions for certain memory copy operations when -mcpu=future is used. During power10 development, it was determined that using store vector pair instructions was problematic in a few cases, so we disabled generating load and store vector pair instructions for memory copy operations by default. This patch re-enables generating these instructions if -mcpu=future is used.

The patches have been tested on both little and big endian systems. Can I check it into the master branch?

2023-10-18  Michael Meissner

gcc/

        * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add
        -mblock-ops-vector-pair.
        (POWERPC_MASKS): Likewise.
---
 gcc/config/rs6000/rs6000-cpus.def | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index a6d9d7bf9a8..849af6b3ac8 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -90,6 +90,7 @@
 
 /* Flags for a potential future processor that may or may not be delivered.  */
 #define ISA_FUTURE_MASKS        (ISA_3_1_MASKS_SERVER                   \
+                                 | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR    \
                                  | OPTION_MASK_FUTURE)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
@@ -127,6 +128,7 @@
 
 /* Mask of all options to set the default isa flags based on -mcpu=.  */
 #define POWERPC_MASKS           (OPTION_MASK_ALTIVEC                    \
+                                 | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR    \
                                  | OPTION_MASK_CMPB                     \
                                  | OPTION_MASK_CRYPTO                   \
                                  | OPTION_MASK_DFP                      \
-- 
2.41.0

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
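As a rough illustration of what -mblock-ops-vector-pair changes for an inline memory copy, the C sketch below counts how many load/store steps a copy would need when it moves 32 bytes at a time with lxvp/stxvp and finishes with a 16-byte lxv/stxv, compared with using 16-byte vector operations only. This is a deliberately simplified, hypothetical model (the function name is invented). It ignores alignment, size limits, and the tail handling that GCC's real block-move expansion performs.

```c
#include <assert.h>
#include <stddef.h>

/* Load/store steps needed to copy NBYTES (a multiple of 16 in this model)
   using 32-byte vector pair operations when USE_PAIRS is nonzero, and
   16-byte vector operations otherwise.  */
static size_t vector_copy_steps (size_t nbytes, int use_pairs)
{
  assert (nbytes % 16 == 0);
  if (!use_pairs)
    return nbytes / 16;
  /* Whole 32-byte pairs, plus at most one trailing 16-byte vector.  */
  return nbytes / 32 + (nbytes % 32) / 16;
}

static void check_vector_copy_steps (void)
{
  assert (vector_copy_steps (64, 0) == 4);
  assert (vector_copy_steps (64, 1) == 2);
  assert (vector_copy_steps (48, 1) == 2);
}
```

Halving the number of memory instructions is the potential benefit; the power10 experience mentioned above is the reason the default only changes for -mcpu=future.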
[PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.
The MMA subsystem added the notion of accumulator registers as an optional feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with the traditional floating point registers 0..31, but logically the accumulator registers were separate from the FPR registers. In ISA 3.1, it was anticipated that in future systems, the accumulator registers may not overlap with the FPR registers. This patch adds the support for dense math registers as separate registers.

This particular patch does not change the MMA support to use the accumulators within the dense math registers. This patch just adds the basic support for having separate DMRs. The next patch will switch the MMA support to use the accumulators if -mcpu=future is used.

For testing purposes, I added an undocumented option '-mdense-math' to enable or disable the dense math support.

This patch adds a new constraint (wD). If MMA is selected but dense math is not selected (i.e. -mcpu=power10), the wD constraint will allow access to accumulators that overlap with the VSX vector registers 0..31. If both MMA and dense math are selected (i.e. -mcpu=future), the wD constraint will only allow dense math registers.

This patch modifies the existing %A output modifier. If MMA is selected but dense math is not selected, then the %A output modifier converts the VSX register number to the accumulator number by dividing it by 4. If both MMA and dense math are selected, then %A will map the separate DMR registers into 0..7.

The intention is that user code using extended asm can be modified to run on both MMA without dense math and MMA with dense math:

1) If possible, don't use extended asm, but instead use the MMA built-in functions;

2) If you do need to write extended asm, change the d constraints targeting accumulators to use wD instead;

3) Only use the built-in zero, assemble and disassemble functions to create and move data between vector quad types and dense math accumulators. I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly in the extended asm code. The reason is that these instructions assume there is a 1-to-1 correspondence between 4 adjacent FPR registers and an accumulator that overlaps with those registers. With accumulators now being separate registers, there no longer is a 1-to-1 correspondence.

It is possible that the mangling for DMRs and the GDB register numbers may change in the future.

The patches have been tested on both little and big endian systems. Can I check it into the master branch?

2023-10-18  Michael Meissner

gcc/

        * config/rs6000/constraints.md (wD constraint): New constraint.
        * config/rs6000/mma.md (UNSPEC_DM_ASSEMBLE_ACC): New unspec.
        (movxo): Convert into define_expand.
        (movxo_vsx): Version of movxo where accumulators overlap with VSX
        vector registers 0..31.
        (movxo_dm): Version of movxo that supports separate dense math
        accumulators.
        (mma_assemble_acc): Add dense math support to define_expand.
        (mma_assemble_acc_vsx): Rename from mma_assemble_acc, and restrict it
        to non dense math systems.
        (mma_assemble_acc_dm): Dense math version of mma_assemble_acc.
        (mma_disassemble_acc): Add dense math support to define_expand.
        (mma_disassemble_acc_vsx): Rename from mma_disassemble_acc, and
        restrict it to non dense math systems.
        (mma_disassemble_acc_dm): Dense math version of mma_disassemble_acc.
        * config/rs6000/predicates.md (dmr_operand): New predicate.
        (accumulator_operand): Likewise.
        * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add -mdense-math.
        (POWERPC_MASKS): Likewise.
        * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
        (enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
        (LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD
        constraint.
        (reload_reg_map): Likewise.
        (rs6000_reg_names): Likewise.
        (alt_reg_names): Likewise.
        (rs6000_hard_regno_nregs_internal): Likewise.
        (rs6000_hard_regno_mode_ok_uncached): Likewise.
        (rs6000_debug_reg_global): Likewise.
        (rs6000_setup_reg_addr_masks): Likewise.
        (rs6000_init_hard_regno_mode_ok): Likewise.
        (rs6000_option_override_internal): Add checking for -mdense-math.
        (rs6000_secondary_reload_memory): Add support for DMR registers.
        (rs6000_secondary_reload_simple_move): Likewise.
        (rs6000_preferred_reload_class): Likewise.
        (rs6000_secondary_reload_class): Likewise.
        (print_operand): Make %A handle both FPRs and DMRs.
        (rs6000_dmr_register_move_cost): New helper function.
        (rs6000_register_move_cost): Add support for DMR registers.
        (rs6000_memory_move_cost): Likewise.
        (rs6000_compute_pressure_classes): Likewise.
        (rs6000
[PATCH 4/6] PowerPC: Make MMA insns support DMR registers.
This patch changes the MMA instructions to use either FPR registers (-mcpu=power10) or DMRs (-mcpu=future). In this patch, the existing MMA instruction names are used. A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs.

The patches have been tested on both little and big endian systems. Can I check it into the master branch?

2023-10-18  Michael Meissner

gcc/

        * config/rs6000/mma.md (mma_): New define_expand to handle mma_ for
        dense math and non dense math.
        (mma_ insn): Restrict to non dense math.
        (mma_xxsetaccz): Convert to define_expand to handle non dense math and
        dense math.
        (mma_xxsetaccz_vsx): Rename from mma_xxsetaccz and restrict usage to
        non dense math.
        (mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz.
        (mma_): Add support for dense math.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
        __PPC_DMR__ if we have dense math instructions.
        * config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if
        dense math and only FPRs if not dense math.
        (rs6000_split_multireg_move): Do not generate the xxmtacc instruction
        to prime the DMR registers or the xxmfacc instruction to de-prime
        them if we have dense math register support.
---
 gcc/config/rs6000/mma.md      | 247 +-
 gcc/config/rs6000/rs6000-c.cc |   3 +
 gcc/config/rs6000/rs6000.cc   |  35 ++---
 3 files changed, 176 insertions(+), 109 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index d2c5b73fa8f..e5589d8eccc 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -596,190 +596,249 @@ (define_insn "*mma_disassemble_acc_dm"
   "dmxxextfdmr256 %0,%1,2"
   [(set_attr "type" "mma")])
 
-(define_insn "mma_"
+;; MMA instructions that do not use their accumulators as an input, still must
+;; not allow their vector operands to overlap the registers used by the
+;; accumulator.  We enforce this by marking the output as early clobber.  If we
+;; have dense math, we don't need the whole prime/de-prime action, so just make
+;; these instructions be NOPs.
+
+(define_expand "mma_"
+  [(set (match_operand:XO 0 "register_operand")
+        (unspec:XO [(match_operand:XO 1 "register_operand")]
+                   MMA_ACC))]
+  "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+    {
+      if (!rtx_equal_p (operands[0], operands[1]))
+        emit_move_insn (operands[0], operands[1]);
+      DONE;
+    }
+
+  /* Generate the prime/de-prime code.  */
+})
+
+(define_insn "*mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
         (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
                     MMA_ACC))]
-  "TARGET_MMA"
+  "TARGET_MMA && !TARGET_DENSE_MATH"
   " %A0"
   [(set_attr "type" "mma")])
 
 ;; We can't have integer constants in XOmode so we wrap this in an
-;; UNSPEC_VOLATILE.
+;; UNSPEC_VOLATILE for the non-dense math case.  For dense math, we don't need
+;; to disable optimization and we can do a normal UNSPEC.
 
-(define_insn "mma_xxsetaccz"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+(define_expand "mma_xxsetaccz"
+  [(set (match_operand:XO 0 "register_operand")
         (unspec_volatile:XO [(const_int 0)]
                             UNSPECV_MMA_XXSETACCZ))]
   "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+    {
+      emit_insn (gen_mma_xxsetaccz_dm (operands[0]));
+      DONE;
+    }
+})
+
+(define_insn "*mma_xxsetaccz_vsx"
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+        (unspec_volatile:XO [(const_int 0)]
+                            UNSPECV_MMA_XXSETACCZ))]
+  "TARGET_MMA && !TARGET_DENSE_MATH"
   "xxsetaccz %A0"
   [(set_attr "type" "mma")])
 
+
+(define_insn "mma_xxsetaccz_dm"
+  [(set (match_operand:XO 0 "dmr_operand" "=wD")
+        (unspec:XO [(const_int 0)]
+                   UNSPECV_MMA_XXSETACCZ))]
+  "TARGET_DENSE_MATH"
+  "dmsetdmrz %0"
+  [(set_attr "type" "mma")])
+
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-
[PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.
This patch changes the assembler instruction names for MMA instructions from the original names used in power10 to the new names used with the dense math system. For example, xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the same bits for either spelling.

The patches have been tested on both little and big endian systems. Can I check it into the master branch?

2023-10-18  Michael Meissner

gcc/

        * config/rs6000/mma.md (vvi4i4i8_dm): New int attribute.
        (avvi4i4i8_dm): Likewise.
        (vvi4i4i2_dm): Likewise.
        (avvi4i4i2_dm): Likewise.
        (vvi4i4_dm): Likewise.
        (avvi4i4_dm): Likewise.
        (pvi4i2_dm): Likewise.
        (apvi4i2_dm): Likewise.
        (vvi4i4i4_dm): Likewise.
        (avvi4i4i4_dm): Likewise.
        (mma_): Add support for running on DMF systems, generating the dense
        math instruction and using the dense math accumulators.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.
        (mma_): Likewise.

gcc/testsuite/

        * gcc.target/powerpc/dm-double-test.c: New test.
        * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
        target test.
--- gcc/config/rs6000/mma.md | 98 +++-- .../gcc.target/powerpc/dm-double-test.c | 194 ++ gcc/testsuite/lib/target-supports.exp | 19 ++ 3 files changed, 299 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-double-test.c diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index e5589d8eccc..cae407bc37c 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -228,13 +228,22 @@ (define_int_attr apv [(UNSPEC_MMA_XVF64GERPP "xvf64gerpp") (define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")]) +(define_int_attr vvi4i4i8_dm [(UNSPEC_MMA_PMXVI4GER8 "pmdmxvi4ger8")]) + (define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP "pmxvi4ger8pp")]) +(define_int_attr avvi4i4i8_dm [(UNSPEC_MMA_PMXVI4GER8PP "pmdmxvi4ger8pp")]) + (define_int_attr vvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2"pmxvi16ger2") (UNSPEC_MMA_PMXVI16GER2S "pmxvi16ger2s") (UNSPEC_MMA_PMXVF16GER2"pmxvf16ger2") (UNSPEC_MMA_PMXVBF16GER2 "pmxvbf16ger2")]) +(define_int_attr vvi4i4i2_dm [(UNSPEC_MMA_PMXVI16GER2"pmdmxvi16ger2") +(UNSPEC_MMA_PMXVI16GER2S "pmdmxvi16ger2s") +(UNSPEC_MMA_PMXVF16GER2"pmdmxvf16ger2") +(UNSPEC_MMA_PMXVBF16GER2 "pmdmxvbf16ger2")]) + (define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "pmxvi16ger2pp") (UNSPEC_MMA_PMXVI16GER2SPP "pmxvi16ger2spp") (UNSPEC_MMA_PMXVF16GER2PP "pmxvf16ger2pp") @@ -246,25 +255,54 @@ (define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "pmxvi16ger2pp") (UNSPEC_MMA_PMXVBF16GER2NP "pmxvbf16ger2np") (UNSPEC_MMA_PMXVBF16GER2NN "pmxvbf16ger2nn")]) +(define_int_attr avvi4i4i2_dm [(UNSPEC_MMA_PMXVI16GER2PP "pmdmxvi16ger2pp") +(UNSPEC_MMA_PMXVI16GER2SPP "pmdmxvi16ger2spp") +(UNSPEC_MMA_PMXVF16GER2PP "pmdmxvf16ger2pp") +(UNSPEC_MMA_PMXVF16GER2PN "pmdmxvf16ger2pn") +(UNSPEC_MMA_PMXVF16GER2NP "pmdmxvf16ger2np") +(UNSPEC_MMA_PMXVF16GER2NN "pmdmxvf16ger2nn") +(UNSPEC_MMA_PMXVBF16GER2PP "pmdmxvbf16ger2pp") +(UNSPEC_MMA_PMXVBF16GER2PN "pmdmxvbf16ger2pn") +(UNSPEC_MMA_PMXVBF16GER2NP "pmdmxvbf16ger2np") 
+(UNSPEC_MMA_PMXVBF16GER2NN "pmdmxvbf16ger2nn")]) + (define_int_attr vvi4i4[(UNSPEC_MMA_PMXVF32GER "pmxvf32ger")]) +(define_int_attr vvi4i4_dm [(UNSPEC_MMA_PMXVF32GER "pmdmxvf32ger")]) + (define_int_attr avvi4i4 [(UNSPEC_MMA_PMXVF32GERPP "pmxvf32gerpp") (UNSPEC_MMA_PMXVF32GERPN "pmxvf32gerpn"
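For the GER instructions, the renaming in patch 5/6 follows a regular pattern. A plain mnemonic gains a "dm" prefix (xvf64gerpp becomes dmxvf64gerpp). A prefixed "pm" form gets "dm" inserted after the "pm" (pmxvi4ger8 becomes pmdmxvi4ger8). The C sketch below only illustrates that spelling rule. The helper name dense_math_mnemonic is hypothetical; GCC itself encodes the mapping as the *_dm int attributes in mma.md. The rule also does not cover every MMA instruction: xxsetaccz, for example, is spelled dmsetdmrz in the dense math form.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: spell a power10 MMA GER mnemonic the way the dense
   math variant is named.  "pm" (prefixed) forms keep the "pm" and get "dm"
   inserted after it; other forms just get a "dm" prefix.
   Returns 0 on success, or -1 if OUT is too small for the result.  */
static int
dense_math_mnemonic (const char *p10, char *out, size_t outsz)
{
  int n;

  assert (p10 != NULL && out != NULL);
  if (strncmp (p10, "pm", 2) == 0)
    n = snprintf (out, outsz, "pmdm%s", p10 + 2);
  else
    n = snprintf (out, outsz, "dm%s", p10);

  /* snprintf returns the length it wanted to write; a value >= OUTSZ
     means the result was truncated.  */
  return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}
```

For example, dense_math_mnemonic ("pmxvf16ger2pp", buf, sizeof buf) produces pmdmxvf16ger2pp. This matches the avvi4i4i2_dm attribute in the diff above.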
[PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.
This patch is a preliminary patch to add the full 1,024 bit dense math registers (DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top half of each DMR. This patch only adds the new 1,024 bit register support. It does not add support for any instructions that need 1,024 bit registers instead of 512 bit registers. I used the new mode 'TDOmode' as the opaque mode for 1,024 bit registers. The 'wD' constraint added in previous patches is used for these registers. I added support to load and store DMRs via the VSX registers, since there are no dense math load/store instructions. I added the new keyword '__dmr' to create 1,024 bit types that can be loaded into DMRs. At present, I don't have the __dmr512 and __dmr1024 aliases that we've discussed internally. The patches have been tested on both little and big endian systems. Can I check it into the master branch? 2023-10-18 Michael Meissner gcc/ * config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec. (UNSPEC_DM_INSERT512_LOWER): Likewise. (UNSPEC_DM_EXTRACT512): Likewise. (UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise. (UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise. (movtdo): New define_expand and define_insn_and_split to implement 1,024 bit DMR registers. (movtdo_insert512_upper): New insn. (movtdo_insert512_lower): Likewise. (movtdo_extract512): Likewise. (reload_dmr_from_memory): Likewise. (reload_dmr_to_memory): Likewise. * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR support. (rs6000_init_builtins): Add support for __dmr keyword. * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support for TDOmode. (rs6000_function_arg): Likewise. * config/rs6000/rs6000-modes.def (TDOmode): New mode. * config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add support for TDOmode. (rs6000_hard_regno_mode_ok_uncached): Likewise. (rs6000_hard_regno_mode_ok): Likewise. (rs6000_modes_tieable_p): Likewise. (rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise. (rs6000_init_hard_regno_mode_ok): Add support for TDOmode. Setup reload hooks for DMR mode. (reg_offset_addressing_ok_p): Add support for TDOmode. (rs6000_emit_move): Likewise. (rs6000_secondary_reload_simple_move): Likewise. (rs6000_secondary_reload_class): Likewise. (rs6000_mangle_type): Add mangling for __dmr type. (rs6000_dmr_register_move_cost): Add support for TDOmode. (rs6000_split_multireg_move): Likewise. (rs6000_invalid_conversion): Likewise. * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode. (enum rs6000_builtin_type_index): Add DMR type nodes. (dmr_type_node): Likewise. (ptr_dmr_type_node): Likewise. gcc/testsuite/ * gcc.target/powerpc/dm-1024bit.c: New test. --- gcc/config/rs6000/mma.md | 152 ++ gcc/config/rs6000/rs6000-builtin.cc | 13 ++ gcc/config/rs6000/rs6000-call.cc | 13 +- gcc/config/rs6000/rs6000-modes.def| 4 + gcc/config/rs6000/rs6000.cc | 135 gcc/config/rs6000/rs6000.h| 7 +- gcc/testsuite/gcc.target/powerpc/dm-1024bit.c | 63 7 files changed, 351 insertions(+), 36 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index cae407bc37c..0a89db8af99 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -93,6 +93,11 @@ (define_c_enum "unspec" UNSPEC_MMA_XXMTACC UNSPEC_MMA_VECTOR_PAIR_MEMORY UNSPEC_DM_ASSEMBLE_ACC + UNSPEC_DM_INSERT512_UPPER + UNSPEC_DM_INSERT512_LOWER + UNSPEC_DM_EXTRACT512 + UNSPEC_DMR_RELOAD_FROM_MEMORY + UNSPEC_DMR_RELOAD_TO_MEMORY ]) (define_c_enum "unspecv" @@ -916,3 +921,150 @@ (define_insn "mma_" [(set_attr "type" "mma") (set_attr "prefixed" "yes") (set_attr "isa" "dm,not_dm,not_dm")]) + + +;; TDOmode (i.e. __dmr). 
+(define_expand "movtdo" + [(set (match_operand:TDO 0 "nonimmediate_operand") + (match_operand:TDO 1 "input_operand"))] + "TARGET_DENSE_MATH" +{ + rs6000_emit_move (operands[0], operands[1], TDOmode); + DONE; +}) + +(define_insn_and_split "*movtdo" + [(set (match_operand:TDO 0 "nonimmediate_operand" "=wa,m,wa,wD,wD,wa") + (match_operand:TDO 1 "input_operand" "m,wa,wa,wa,wD,wD"))] + "TARGET_DENSE_MATH + && (gpc_reg_operand (operands[0], TDOmode) + || gpc_reg_operand (operands[1]
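Patch 6/6 describes three relationships:

- A 1,024 bit DMR (TDOmode) contains a 512-bit MMA accumulator (XOmode) in its top half.
- The insert512/extract512 patterns move 512-bit halves in and out.
- Because there are no dense math load/store instructions, a DMR reaches memory through VSX registers (1,024 / 128 = 8 of them).

The byte-array model below is only a sketch of that bookkeeping. The type and function names are hypothetical, and treating bytes 0..63 as the "upper" half is an assumption of the model, not a statement about the hardware layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-level model; sizes follow the message, the layout
   (bytes 0..63 = "upper" half) is an assumption of this sketch.  */
enum { DMR_BYTES = 128, ACC_BYTES = 64, VSX_BYTES = 16 };
enum { VSX_PER_DMR = DMR_BYTES / VSX_BYTES };   /* 8 VSX registers */

typedef struct { uint8_t b[DMR_BYTES]; } dmr_t;  /* 1,024 bits, TDOmode-sized */
typedef struct { uint8_t b[ACC_BYTES]; } acc_t;  /* 512 bits, XOmode-sized */
typedef struct { uint8_t b[VSX_BYTES]; } vsx_t;  /* 128 bits, one VSX register */

/* Byte offset of the requested 512-bit half inside the model DMR.  */
static size_t
half_offset (int upper)
{
  assert (upper == 0 || upper == 1);
  return upper ? 0 : ACC_BYTES;
}

/* Counterparts of the insert512 upper/lower patterns.  */
static void
dmr_insert512 (dmr_t *d, const acc_t *a, int upper)
{
  memcpy (d->b + half_offset (upper), a->b, ACC_BYTES);
}

/* Counterpart of the extract512 pattern.  */
static void
dmr_extract512 (acc_t *a, const dmr_t *d, int upper)
{
  memcpy (a->b, d->b + half_offset (upper), ACC_BYTES);
}

/* No dense math load/store exists, so a DMR is staged through VSX
   registers on its way to or from memory.  */
static void
dmr_to_vsx (vsx_t v[VSX_PER_DMR], const dmr_t *d)
{
  for (int i = 0; i < VSX_PER_DMR; i++)
    memcpy (v[i].b, d->b + (size_t) i * VSX_BYTES, VSX_BYTES);
}

static void
dmr_from_vsx (dmr_t *d, const vsx_t v[VSX_PER_DMR])
{
  for (int i = 0; i < VSX_PER_DMR; i++)
    memcpy (d->b + (size_t) i * VSX_BYTES, v[i].b, VSX_BYTES);
}
```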
[PATCH], Add configuration checks to PowerPC --with-long-double-format=ieee
This patch adds a simple check of whether GLIBC is capable of switching the long double format on the PowerPC to IEEE 128-bit floating point. At the moment, the library work is not yet finished, but I'm assuming that the patches will be in place when GLIBC 2.28 is released. If it turns out that the finished support does not make it until 2.29, we can adjust the patch later. Right now, if you use standard GLIBC 2.27 or earlier (ignoring the bits that actually use long double that will need to be handled), you will not be able to build libstdc++-v3 when long double is configured to be IEEE 128-bit. This is due to errors with overloaded functions like issignaling (where both __float128 and long double versions are defined). The GLIBC team has a fix for this, and it should appear in 2.28. This patch checks whether the GLIBC version is 2.28 or newer before allowing you to switch the long double type. The work to prepare GLIBC for the switch is being done using an Advance Toolchain framework. Because of this, the patch also accepts an Advance Toolchain based on GLIBC 2.27 when the --with-advance-toolchain configuration option is used. (The official AT 11 release uses GLIBC 2.26 as a framework, and when completed the AT 12 release should use GLIBC 2.28.) I have checked it on a little endian power8 system, building both toolchains using IBM long double and IEEE long double configurations. The tests that depend on the library support for long double that failed before still fail. I also did IEEE long double builds using the host GLIBC and with AT 11, and verified that configuring GCC generates an error. I built bootstrap compilers on a big endian system, and verified that if I selected IEEE long double, configuration would fail, since I currently don't have a big endian GLIBC with the fixes installed. Can I check this into the trunk and the GCC 8 branch? 2018-07-05 Michael Meissner * configure.ac (powerpc64*-*-linux*): Combine big and little endian checks for the long double format.
Add checks to make sure the GLIBC can handle configuration of long double to be IEEE 128-bit before building GCC. * configure: Regenerate. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797 Index: gcc/configure.ac === --- gcc/configure.ac(revision 262443) +++ gcc/configure.ac(working copy) @@ -6031,23 +6031,48 @@ AC_ARG_WITH([long-double-format], [AS_HELP_STRING([--with-long-double-format={ieee,ibm}] [Specify whether PowerPC long double uses IEEE or IBM format])],[ case "$target:$with_long_double_format" in - powerpc64le-*-linux*:ieee | powerpc64le-*-linux*:ibm) -: -;; - powerpc64-*-linux*:ieee | powerpc64-*-linux*:ibm) -# IEEE 128-bit emulation is only built on 64-bit VSX Linux systems -case "$with_cpu" in - power7 | power8 | power9 | power1*) + powerpc64le-*-linux*:ibm | powerpc64-*-linux*:ibm | \ + powerpc64le-*-linux*:ieee | powerpc64-*-linux*:ieee) +# IEEE 128-bit emulation is only built on 64-bit VSX Linux systems. +# Little endian 64-bit systems are always VSX, but big endian systems +# might default to power4. +case "$target:$with_cpu" in + powerpc64le-* | *:power7 | *:power8 | *:power9 | *:power1*) : ;; *) AC_MSG_ERROR([Configuration option --with-long-double-format is only \ supported if the default cpu is power7 or newer]) with_long_double_format="" - ;; - esac - ;; - xpowerpc64*-*-linux*:*) +esac + +if test "x$with_long_double_format" = xieee; then + # See if we have a new enough GLIBC to allow using IEEE 128-bit long + # double. We assume the public 2.28 GLIBC and the development version of + # the Advance Toolchain (2.27) have all of the missing bits. + ieee_minor="28" + glibc_ieee="no" + atoolchain="" + if test "x$with_advance_toolchain" != x \ +-a -d "/opt/$with_advance_toolchain/." \ +-a -d "/opt/$with_advance_toolchain/bin/." 
\ +-a -d "/opt/$with_advance_toolchain/include/."; then + + ieee_minor="27" + atoolchain="Advance Toolchain " + fi + GCC_GLIBC_VERSION_GTE_IFELSE([2], [$ieee_minor], [glibc_ieee=yes], ) + if test "x$glibc_ieee" = xyes; then + echo "${atoolchain}GLIBC appears to have IEEE long double support" 1>&2 + + else + AC_MSG_ERROR([Configuration option --with-long-double-format=ieee \ +needs ${atoolchain}GLIBC 2.${ieee_minor} or newer]) + with_long_double_format="" + fi +fi +;; + powerpc64*-*-linux*:*) AC_MSG_ERROR([--with-long-double-format argument should be ibm or ieee]) with_long_double
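The configure.ac hunk above reduces to a small decision. IEEE 128-bit long double is accepted with GLIBC 2.28 or newer. With --with-advance-toolchain, 2.27 or newer is accepted instead, provided the named Advance Toolchain has bin and include directories. Here is a minimal C restatement of that logic, assuming the GLIBC major/minor version has already been determined. The function names are hypothetical, and this is not how configure itself performs the test.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimum GLIBC minor version (for major version 2) that the hunk accepts:
   27 for a development Advance Toolchain, 28 for a public GLIBC.  */
static int
ieee_required_minor (bool advance_toolchain)
{
  return advance_toolchain ? 27 : 28;
}

/* MAJOR.MINOR >= WANT_MAJOR.WANT_MINOR, compared field by field.  */
static bool
glibc_version_gte (int major, int minor, int want_major, int want_minor)
{
  return major > want_major || (major == want_major && minor >= want_minor);
}

/* Hypothetical restatement of the --with-long-double-format=ieee check.  */
static bool
allow_ieee_long_double (int major, int minor, bool advance_toolchain)
{
  assert (major >= 0 && minor >= 0);
  return glibc_version_gte (major, minor, 2,
                            ieee_required_minor (advance_toolchain));
}
```

With these rules, the AT 11 release (GLIBC 2.26) is rejected even with --with-advance-toolchain. A system GLIBC 2.27 is rejected without that option.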
Re: [PATCH], Add configuration checks to PowerPC --with-long-double-format=ieee
On Fri, Jul 06, 2018 at 06:38:55AM -0500, Segher Boessenkool wrote:
> On Fri, Jul 06, 2018 at 01:51:37AM -0400, Michael Meissner wrote:
> >    case "$target:$with_long_double_format" in
>
> > -  xpowerpc64*-*-linux*:*)
>
> So this case could never happen.  The changelog should mention it fixes
> that bug (and having it as a separate patch is much preferred!)

I assume what happened is that I accidentally added the 'x' to the working copy after submitting the patch, but before committing it, and I didn't notice. Since it is in the configuration support, it isn't covered by the test suite, so nobody noticed. I can add a line to the ChangeLog if desired.

> Other than this thing, the original code was easier to read.  What does
> this part of the patch improve?

You complained that you were getting errors when using the system glibc (based on 2.27 on an Ubuntu system) with --with-long-double-format=ieee (the build would die in the middle of building libstdc++-v3). I wrote the patch to check that glibc has the support. That way the failure happens when configuring the compiler, and it gives a sensible message (need glibc 2.28). But it doesn't change the outcome: if you have an appropriate glibc, it will build without the patch, and if you don't, it will fail. It should, however, make it easier for people building the compiler to understand why it failed. I could duplicate the tests for glibc 2.28 (and the AT-next alpha) for big endian and little endian if desired. It seemed clearer to me, however, to combine the code rather than duplicate the tests.
Re: [PATCH], Add configuration checks to PowerPC --with-long-double-format=ieee
On Fri, Jul 06, 2018 at 10:16:34AM -0300, Tulio Magno Quites Machado Filho wrote:
> I suggest to test with the following program:
>
> #include
>
> int
> main ()
> {
>   return !isinfl(__builtin_infl());
> }
>
> Build it with:
> gcc -mabi=ieeelongdouble -fno-builtin -Wno-psabi -lm test-ldbl.c
>
> If the execution of the program returns 0, your math library supports IEEE
> long double.

Thanks, but I suspect that it won't work for building cross compilers. It also won't work for builds where the compiler uses the Advance Toolchain libraries and shared library loader instead of the system versions (the configuration option --with-advance-toolchain=atx.y). The issue is that you need to test whether the target GLIBC has the support when configuring the compiler, but if you are building a cross compiler, you can't run the resulting binary. Even on a native system, options like --with-advance-toolchain and --with-sysroot matter: the libraries used by the host compiler that builds stage1 of GCC might differ from the libraries used to build the target compiler (or stage2/stage3 in a bootstrap native build). So I used the GLIBC version tests that were already part of the GCC configuration. If there is a simple method that works for cross compilers or where a specified sysroot is used, it would be simpler than having version checks.
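One way to test a C library without running anything on the target is to ask the preprocessor. glibc headers define __GLIBC__ and __GLIBC_MINOR__, and a compile-only probe can compare them against a required version. This works for cross compilers and sysroots, as long as the compile step sees the target's headers. The sketch below is illustrative only. The PROBE_* macro names are hypothetical, and this is not the implementation of the GLIBC version tests in GCC's configuration.

```c
#include <assert.h>  /* like any glibc header, this pulls in <features.h> */

/* glibc defines __GLIBC__ and __GLIBC_MINOR__; with another C library this
   sketch falls back to version 0.0.  */
#if defined (__GLIBC__) && defined (__GLIBC_MINOR__)
# define PROBE_GLIBC_MAJOR __GLIBC__
# define PROBE_GLIBC_MINOR __GLIBC_MINOR__
#else
# define PROBE_GLIBC_MAJOR 0
# define PROBE_GLIBC_MINOR 0
#endif

/* True if the headers being compiled against are glibc MAJ.MIN or newer.
   This is a preprocessor-time comparison, so only a compile step is needed;
   nothing has to run on the target.  */
#define PROBE_GLIBC_GTE(maj, min)                                       \
  (PROBE_GLIBC_MAJOR > (maj)                                            \
   || (PROBE_GLIBC_MAJOR == (maj) && PROBE_GLIBC_MINOR >= (min)))

#if PROBE_GLIBC_GTE (2, 28)
# define PROBE_IEEE_LDBL_HEADERS 1
#else
/* A configure probe would typically use #error here instead.  */
# define PROBE_IEEE_LDBL_HEADERS 0
#endif
```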
[PATCH], Remove undocumented -mtoc-fusion from PowerPC
Back in the days when I was developing the extended fusion support for PowerPC (-mpower9-fusion), I added a partially implemented option called toc fusion. The idea was to recognize TOC entries (that normally get split into HIGH/LO_SUM pairs) early on, and keep the pairs together. Unfortunately, I messed up the setting. You could not actually use -mtoc-fusion without also setting -mcmodel=medium, since the TOC fusion tests in rs6000.c occurred before the default code model was set in SUBSUBTARGET_OPTIONS. I then stopped doing fusion work to do other things (basic power9 enablement and IEEE 128-bit floating point). While it would be simple to move the tests for TOC fusion after the location where the code model is set, I think the current code is rather limited. Right now, toc fusion replaces each TOC reference with a new insn that has a scratch register as a clobber. However, a basic block may have multiple references to the same variable (such as with the ++/-- operators), or references to variables located near a variable you previously referenced. In those cases we will generate multiple ADDIS operations. I have ideas on how to do a better job of fusion for current and future machines, using a machine dependent pass to do fusion optimizations within a basic block. So rather than keeping the toc fusion code around (which nobody used), I would prefer to delete it, and replace it with better code as I implement it. I have tested this on a power8 little endian system with a bootstrap build and with make check. There were no regressions. In addition, I built the full spec 2006 CPU benchmark suite for power9 to make sure I didn't accidentally delete insns that are used for -mpower9-fusion. Can I check this into the trunk? I don't anticipate that we will need a backport to the FSF GCC 8 branch. 2018-07-13 Michael Meissner * config/rs6000/constraints.md (wG constraint): Delete, no longer used.
* config/rs6000/predicates.md (p9_fusion_reg_operand): Rename predicate to reflect toc fusion has been deleted. (toc_fusion_mem_raw): Delete, no longer used. (toc_fusion_mem_wrapped): Likewise. * config/rs6000/rs6000-cpus.def (POWERPC_MASKS): Delete toc fusion mask bit. * config/rs6000/rs6000-protos.h (fusion_wrap_memory_address): Delete, no longer used. * config/rs6000/rs6000.c (struct rs6000_reg_addr): Delete fields meant to be used for toc fusion. (rs6000_debug_print_mode): Delete toc fusion debugging. (rs6000_debug_reg_global): Likewise. (rs6000_init_hard_regno_mode_ok): Delete setting up fields for toc fusion and secondary reload support that were never used. (rs6000_option_override_internal): Delete TOC fusion, that was only partially defined, and it did not work unless you also used the -mcmodel= switch. (rs6000_legitimate_address_p): Delete TOC fusion support. (rs6000_opt_masks): Likewise. (fusion_wrap_memory_address): Delete function, no longer used. (fusion_split_address); Delete TOC fusion support. * config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): Delete, no longer used with toc fusion being deleted. (TARGET_TOC_FUSION_FP): Likewise. * config/rs6000/rs6000.md (UNSPEC_FUSION_ADDIS): Delete TOC fusion UNSPEC. (toc fusion spliter): Delete TOC fusion support. (toc_fusionload_): Likewise. (toc_fusionload_di): Likewise. (fusion_gpr_load_): Delete generator function, this insn no longer needs to be named. Rename predicate to delete TOC fusion. (fusion_gpr___load): Likewise. (fusion_gpr___store): Likewise. (fusion_vsx___load): Likewise. (fusion_vsx___store): Likewise. (p9 fusion peephole2s): Rename predicate to delete TOC fusion. 
-- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/constraints.md === --- gcc/config/rs6000/constraints.md(revision 262647) +++ gcc/config/rs6000/constraints.md(working copy) @@ -157,10 +157,8 @@ (define_memory_constraint "wF" "Memory operand suitable for power9 fusion load/stores" (match_operand 0 "fusion_addis_mem_combo_load")) -;; Fusion gpr load. -(define_memory_constraint "wG" - "Memory operand suitable for TOC fusion memory references" - (match_operand 0 "toc_fusion_mem_wrapped")) +;; wG is now available. Previously it was a memory operand suitable for TOC +;; fusion. (define_register_constraint "wH" "rs6000_constraints[RS6000_CONSTRAINT_wH]" "Altivec register to hold 32-bit integers or NO_REGS.") Index: gcc/config/rs6000/predicates.m
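The ADDIS problem described in this patch comes from the @ha/@l split of a TOC offset. @ha is the high 16 bits, adjusted for the sign of the low 16 bits, i.e. (offset + 0x8000) >> 16, so nearby offsets usually share the same @ha value. The sketch below models that with hypothetical helper names. A naive scheme issues one ADDIS per reference. A pass that reuses an ADDIS result within a basic block needs only one ADDIS per distinct @ha value. The model deliberately ignores register pressure and liveness, and it is not the machine dependent pass proposed in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* @ha part of a TOC offset: the high 16 bits, adjusted so that adding the
   sign-extended low 16 bits (@l) gives back the offset.  Unsigned
   arithmetic keeps the 16-bit wrap-around well defined.  */
static uint16_t
toc_ha (int32_t offset)
{
  return (uint16_t) (((uint32_t) offset + 0x8000u) >> 16);
}

/* ADDIS instructions needed for the TOC references REFS[0..N-1] in one
   basic block if each distinct @ha value is materialized once and then
   reused.  The naive scheme simply needs N.  */
static size_t
addis_count_shared (const int32_t *refs, size_t n)
{
  size_t count = 0;

  assert (refs != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      int seen = 0;
      for (size_t j = 0; j < i && !seen; j++)
        seen = (toc_ha (refs[j]) == toc_ha (refs[i]));
      if (!seen)
        count++;
    }
  return count;
}
```

For instance, four references at TOC offsets 0x10, 0x10, 0x18 and 0x12000 need four ADDIS instructions naively. With reuse they need only two.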
Re: [PATCH], Remove undocumented -mtoc-fusion from PowerPC
On Wed, Jul 18, 2018 at 05:59:50PM -0500, Segher Boessenkool wrote: > Hi Mike, > > On Fri, Jul 13, 2018 at 04:56:13PM -0400, Michael Meissner wrote: > > This means rather than keeping the toc fusion around (that nobody used), I > > would prefer to delete the current code, and replace it with better code as > > I > > implement it. > > > > +++ gcc/config/rs6000/constraints.md(working copy) > > > +;; wG is now available. Previously it was a memory operand suitable for > > TOC > > +;; fusion. > > There are many other constraints unused. Keep track of all, instead? > Like we have (at the top of this file) > ;; Available constraint letters: e k q t u A B C D S T > you could do something similar for the "w" names. I just deleted the comment, and reworded the other comment. Here are the changes I committed: 2018-07-27 Michael Meissner * config/rs6000/constraints.md (wG constraint): Delete, no longer used. * config/rs6000/predicates.md (p9_fusion_reg_operand): Rename predicate to reflect toc fusion has been deleted. (toc_fusion_mem_raw): Delete, no longer used. (toc_fusion_mem_wrapped): Likewise. * config/rs6000/rs6000-cpus.def (POWERPC_MASKS): Delete toc fusion mask bit. * config/rs6000/rs6000-protos.h (fusion_wrap_memory_address): Delete, no longer used. * config/rs6000/rs6000.c (struct rs6000_reg_addr): Delete fields meant to be used for toc fusion. (rs6000_debug_print_mode): Delete toc fusion debugging. (rs6000_debug_reg_global): Likewise. (rs6000_init_hard_regno_mode_ok): Delete setting up fields for toc fusion and secondary reload support that were never used. (rs6000_option_override_internal): Delete TOC fusion, that was only partially defined, and it did not work unless you also used the -mcmodel= switch. (rs6000_legitimate_address_p): Delete TOC fusion support. (rs6000_opt_masks): Likewise. (fusion_wrap_memory_address): Delete function, no longer used. (fusion_split_address); Delete TOC fusion support.
* config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): Delete, no longer used with toc fusion being deleted. (TARGET_TOC_FUSION_FP): Likewise. * config/rs6000/rs6000.md (UNSPEC_FUSION_ADDIS): Delete TOC fusion UNSPEC. (toc fusion spliter): Delete TOC fusion support. (toc_fusionload_): Likewise. (toc_fusionload_di): Likewise. (fusion_gpr_load_): Delete generator function, this insn no longer needs to be named. Rename predicate to delete TOC fusion. (fusion_gpr___load): Likewise. (fusion_gpr___store): Likewise. (fusion_vsx___load): Likewise. (fusion_vsx___store): Likewise. (p9 fusion peephole2s): Rename predicate to delete TOC fusion. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/constraints.md === --- gcc/config/rs6000/constraints.md(revision 263034) +++ gcc/config/rs6000/constraints.md(working copy) @@ -157,11 +157,6 @@ (define_memory_constraint "wF" "Memory operand suitable for power9 fusion load/stores" (match_operand 0 "fusion_addis_mem_combo_load")) -;; Fusion gpr load. -(define_memory_constraint "wG" - "Memory operand suitable for TOC fusion memory references" - (match_operand 0 "toc_fusion_mem_wrapped")) - (define_register_constraint "wH" "rs6000_constraints[RS6000_CONSTRAINT_wH]" "Altivec register to hold 32-bit integers or NO_REGS.") Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 263034) +++ gcc/config/rs6000/predicates.md (working copy) @@ -406,13 +406,11 @@ (define_predicate "fpr_reg_operand" return FP_REGNO_P (r); }) -;; Return true if this is a register that can has D-form addressing (GPR and -;; traditional FPR registers for scalars). ISA 3.0 (power9) adds D-form -;; addressing for scalars in Altivec registers. -;; -;; If this is a pseudo only allow for GPR fusion in power8. If we have the -;; power9 fusion allow the floating point types. 
-(define_predicate "toc_fusion_or_p9_reg_operand" +;; Return true if this is a register that can has D-form addressing (GPR, +;; traditional FPR registers, and Altivec registers for scalars). Unlike +;; power8 fusion, this fusion does not depend on putting the ADDIS instruction +;; into the GPR register being loaded. +(define_predicate "p9_fusion_reg_operand" (match_code "reg,subreg") { HOS
[PATCH], Improve PowerPC switch behavior on medium code model system
I noticed that the switch code on PowerPC little endian systems (with the medium code model) did not follow the sequence the ABI recommends on page 69:

    Table 2.36. Position-Independent Switch Code for Small/Medium Models
    (preferred, with TOC-relative addressing)

The code we currently generate is:

            .section ".toc","aw"
            .align 3
    .LC0:
            .quad .L4
            .section ".text"
            # ...
            addis 10,2,.LC0@toc@ha
            ld 10,.LC0@toc@l(10)
            sldi 3,3,2
            add 9,10,3
            lwa 9,0(9)
            add 9,9,10
            mtctr 9
            bctr
    .L4:
            .long .L2-.L4
            .long .L12-.L4
            .long .L11-.L4
            .long .L10-.L4
            .long .L9-.L4
            .long .L8-.L4
            .long .L7-.L4
            .long .L6-.L4
            .long .L5-.L4
            .long .L3-.L4

While the suggested code would be something like:

            addis 10,2,.L4@toc@ha
            addi 10,10,.L4@toc@l
            sldi 3,3,2
            lwax 9,10,3
            add 9,9,10
            mtctr 9
            bctr
            .p2align 2
            .align 2
    .L4:
            .long .L2-.L4
            .long .L12-.L4
            .long .L11-.L4
            .long .L10-.L4
            .long .L9-.L4
            .long .L8-.L4
            .long .L7-.L4
            .long .L6-.L4
            .long .L5-.L4
            .long .L3-.L4

This patch adds an insn to load a LABEL_REF into a GPR. This is needed so the FWPROP1 pass can convert the load of the label address from the TOC into a direct computation of the address in a GPR. While working on the patch, I discovered that the pattern generating the sign-extending word load (LWA) did not accept indexed addresses. This was due to it using the 'Y' constraint, which accepts DS-form offsettable addresses, but not X-form indexed addresses. I added the Z constraint so that the indexed form (LWAX) is accepted. I am in the middle of doing spec 2006 runs on both power8 and power9 systems with this change. So far, after 2 runs out of 3, I'm seeing several minor wins on power9 (1-2%: perlbench, gcc, sjeng, sphinx3) and no regressions. On power8 I see 3 minor wins (1-3%: perlbench, sjeng, omnetpp) and 1 minor regression (1%: povray). I have done bootstrap builds with and without the change, and there were no regressions in the test suite. Can I check this change into the trunk? It is a simple enough change for back ports, if desired. Note, I will be on vacation for 11 days starting this Saturday.
I will not be actively checking my mail in that time period. If I get the approval early enough, I can check it in. Otherwise, somebody else can check it in if they monitor for failure, or we can wait until I get around August 14th to check it in. 2018-07-31 Michael Meissner * config/rs6000/predicates.md (label_ref_operand): New predicate to recognize LABEL_REF. * config/rs6000/rs6000.c (rs6000_output_addr_const_extra): Allow LABEL_REF's inside of UNSPEC_TOCREL's. * config/rs6000/rs6000.md (extendsi2): Allow reg+reg indexed addressing. (labelref): New insn to optimize loading a label address into registers on a medium code system. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 263040) +++ gcc/config/rs6000/predicates.md (working copy) @@ -1662,6 +1662,10 @@ (define_predicate "small_toc_ref" return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL; }) +;; Match a LABEL_REF operand +(define_predicate "label_ref_operand" + (match_code "label_ref")) + ;; Match the first insn (addis) in fusing the combination of addis and loads to ;; GPR registers on power8. 
(define_predicate "fusion_gpr_addis" Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 263040) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -20807,7 +20807,8 @@ rs6000_output_addr_const_extra (FILE *fi switch (XINT (x, 1)) { case UNSPEC_TOCREL: - gcc_checking_assert (GET_CODE (XVECEXP (x, 0, 0)) == SYMBOL_REF + gcc_checking_assert ((GET_CODE (XVECEXP (x, 0, 0)) == SYMBOL_REF + || GET_CODE (XVECEXP (x, 0, 0)) == LABEL_REF) && REG_P (XVECEXP (x, 0, 1)) && REGNO (XVECEXP (x, 0, 1)) == TOC_REGISTER); output_addr_const (file, XVECEXP (x, 0, 0)); Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 263040) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -998,7 +998,7 @@ (define_insn "extendsi2" "=r, r, wl,wu,wj,wK, wH,wr&quo
Ping: [PATCH] Power10: Add options to disable load and store vector pair.
Ping patch: | Date: Fri, 13 Oct 2023 19:41:13 -0400 | From: Michael Meissner | Subject: [PATCH] Power10: Add options to disable load and store vector pair. | Message-ID: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632987.html -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com
Ping: [PATCH 1/6] PowerPC: Add -mcpu=future option
Ping patch. | Date: Wed, 18 Oct 2023 19:58:56 -0400 | From: Michael Meissner | Subject: Re: [PATCH 1/6] PowerPC: Add -mcpu=future option | Message-ID: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633511.html
Ping: [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.
Ping patch. | Date: Wed, 18 Oct 2023 20:00:18 -0400 | From: Michael Meissner | Subject: [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair. | Message-ID: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633512.html
Ping: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.
Ping patch: | Date: Wed, 18 Oct 2023 20:01:54 -0400 | From: Michael Meissner | Subject: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers. | Message-ID: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633513.html
Ping: [PATCH 4/6] PowerPC: Make MMA insns support DMR registers.
Ping patch. | Date: Wed, 18 Oct 2023 20:03:02 -0400 | From: Michael Meissner | Subject: [PATCH 4/6] PowerPC: Make MMA insns support DMR registers. | Message-ID: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633514.html
Ping: [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.
Ping patch. | Date: Wed, 18 Oct 2023 20:04:44 -0400 | From: Michael Meissner | Subject: [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations. | Message-ID: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633515.html
Ping: [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.
Ping patch. | Date: Wed, 18 Oct 2023 20:06:20 -0400 | From: Michael Meissner | Subject: [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers. | Message-ID: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633516.html
Ping #2: [PATCH] Power10: Add options to disable load and store vector pair.
Ping #2 | Date: Fri, 13 Oct 2023 19:41:13 -0400 | From: Michael Meissner | Subject: [PATCH] Power10: Add options to disable load and store vector pair. | Message-ID: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632987.html
Ping #2: [PATCH 1/6] PowerPC: Add -mcpu=future option
Ping #2 | Date: Wed, 18 Oct 2023 19:58:56 -0400 | From: Michael Meissner | Subject: Re: [PATCH 1/6] PowerPC: Add -mcpu=future option | Message-ID: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633511.html
Ping #2: [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.
Ping #2 | Date: Wed, 18 Oct 2023 20:00:18 -0400 | From: Michael Meissner | Subject: [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair. | Message-ID: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633512.html
Ping #2: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.
Ping #2 | Date: Wed, 18 Oct 2023 20:01:54 -0400 | From: Michael Meissner | Subject: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers. | Message-ID: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633514.html
Ping #2: [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.
Ping #2 | Date: Wed, 18 Oct 2023 20:04:44 -0400 | From: Michael Meissner | Subject: [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations. | Message-ID: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633515.html
Ping #2: [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.
Ping #2 | Date: Wed, 18 Oct 2023 20:06:20 -0400 | From: Michael Meissner | Subject: [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers. | Message-ID: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633516.html
Re: [PATCH] V6, #1 of 17: Use ADJUST_INSN_LENGTH for prefixed instructions
On Tue, Oct 22, 2019 at 05:27:19PM -0500, Segher Boessenkool wrote: > Hi! > > On Wed, Oct 16, 2019 at 09:35:33AM -0400, Michael Meissner wrote: > > This patch uses the target hook ADJUST_INSN_LENGTH to change the length of > > instructions that contain prefixed memory/add instructions. > > That made this amazingly hard to review. But it might well be worth it, > thankfully :-) > > > There are 2 new insn attributes: > > > > 1) num_insns: If non-zero, returns the number of machine instructions in an > > insn. This simplifies the calculations in rs6000_insn_cost. > > This is great. > > > 2) max_prefixed_insns: Returns the maximum number of prefixed instructions > > in > > an insn. Normally this is 1, but in the insns that load up 128-bit values > > into > > GPRs, it will be 2. > > This one, I am not so sure. I wanted it to be simple, so in general it was just a constant. Since the only user of it has already checked that the insn is prefixed, I didn't think it needed the prefixed test to set it to 0. > > - int n = get_attr_length (insn) / 4; > > + /* If the insn tells us how many insns there are, use that. Otherwise > > use > > + the length/4. Adjust the insn length to remove the extra size that > > + prefixed instructions take. */ > > This should be temporary, until we have converted everything to use > num_insns, right? Well there were some 200+ places where length was set. > > --- gcc/config/rs6000/rs6000.h (revision 277017) > > +++ gcc/config/rs6000/rs6000.h (working copy) > > @@ -1847,9 +1847,30 @@ extern scalar_int_mode rs6000_pmode; > > /* Adjust the length of an INSN. LENGTH is the currently-computed length > > and > > should be adjusted to reflect any required changes. This macro is used > > when > > there is some systematic length adjustment required that would be > > difficult > > - to express in the length attribute. */ > > + to express in the length attribute. 
> > > > -/* #define ADJUST_INSN_LENGTH(X,LENGTH) */ > > + In the PowerPC, we use this to adjust the length of an instruction if > > one or > > + more prefixed instructions are generated, using the attribute > > + num_prefixed_insns. A prefixed instruction is 8 bytes instead of 4, > > but the > > + hardware requires that a prefied instruciton not cross a 64-byte > > boundary. > > "prefixed instruction does not" Thanks. > > + This means the compiler has to assume the length of the first prefixed > > + instruction is 12 bytes instead of 8 bytes. Since the length is > > already set > > + for the non-prefixed instruction, we just need to udpate for the > > + difference. */ > > + > > +#define ADJUST_INSN_LENGTH(INSN,LENGTH) > > \ > > +{ \ > > + if (NONJUMP_INSN_P (INSN)) > > \ > > +{ > > \ > > + rtx pattern = PATTERN (INSN); > > \ > > + if (GET_CODE (pattern) != USE && GET_CODE (pattern) != CLOBBER > > \ > > + && get_attr_prefixed (INSN) == PREFIXED_YES) \ > > + { \ > > + int num_prefixed = get_attr_max_prefixed_insns (INSN);\ > > + (LENGTH) += 4 * (num_prefixed + 1); \ > > + } \ > > +} > > \ > > +} > > Please use a function, not a function-like macro. Ok, I added rs6000_adjust_insn_length in rs6000.c. > So this computes the *maximum* RTL instruction length, not considering how > many of the machine insns in it need a prefix insn. Can't we do better? > Hrm, I guess in all cases that matter we will split early anyway. Well before register allocation for the 128-bit types, you really can't say what the precise length is, even if it is not prefixed. And of course even after register allocation, it isn't precise, since the length of a prefixed instruction is normally 8, but sometimes 12. So we have to use 12. > > > +;; Return the number of real hardware instructions in a combined insn. If > > it > > +;; is 0, just use the length / 4. > > +(define_attr "num_insns" "" (const_int 0)) > > S
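The macro quoted above boils down to a small piece of arithmetic. The C sketch below is only an illustration under stated assumptions, not the rs6000_adjust_insn_length function that was eventually added: the name prefixed_insn_length is hypothetical, and it models just the rule from the patch comment (4 extra bytes per prefix, plus 4 bytes for a possible alignment NOP so the first prefixed instruction does not cross a 64-byte boundary).

```c
#include <assert.h>

/* Hypothetical model (not GCC source) of the ADJUST_INSN_LENGTH rule quoted
   above.  BASE_LENGTH is the length already computed for the non-prefixed
   form of the insn; NUM_PREFIXED is the maximum number of prefixed
   instructions it may contain (0 when the insn is not prefixed).  */
static int
prefixed_insn_length (int base_length, int num_prefixed)
{
  assert (base_length > 0 && base_length % 4 == 0);
  assert (num_prefixed >= 0);

  if (num_prefixed == 0)
    return base_length;

  /* Each prefix word adds 4 bytes, and 4 more bytes are reserved for a
     possible alignment NOP before the first prefixed instruction.  */
  return base_length + 4 * (num_prefixed + 1);
}
```

Under these assumptions a single prefixed load grows from 4 to 12 bytes, and an insn that loads a 128-bit value into GPRs with two prefixed loads grows from 8 to 20 bytes (12 for the first instruction, 8 for the second), which matches the statement that only the first prefixed instruction has to be assumed to be 12 bytes.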
Re: [PATCH] V6, #4 of 17: Add prefixed instruction support to stack protect insns
On Fri, Nov 01, 2019 at 10:22:03PM -0500, Segher Boessenkool wrote: > Hi! > > On Wed, Oct 16, 2019 at 09:47:41AM -0400, Michael Meissner wrote: > > This patch fixes the stack protection insns to support stacks larger than > > 16-bits on the 'future' system using prefixed loads and stores. > > > +;; We can't use the prefixed attribute here because there are two memory > > +;; instructions. We can't split the insn due to the fact that this > > operation > > +;; needs to be done in one piece. > > (define_insn "stack_protect_setdi" > >[(set (match_operand:DI 0 "memory_operand" "=Y") > > (unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET)) > > (set (match_scratch:DI 2 "=&r") (const_int 0))] > >"TARGET_64BIT" > > - "ld%U1%X1 %2,%1\;std%U0%X0 %2,%0\;li %2,0" > > +{ > > + if (prefixed_memory (operands[1], DImode)) > > +output_asm_insn ("pld %2,%1", operands); > > + else > > +output_asm_insn ("ld%U1%X1 %2,%1", operands); > > + > > + if (prefixed_memory (operands[0], DImode)) > > +output_asm_insn ("pstd %2,%0", operands); > > + else > > +output_asm_insn ("std%U0%X0 %2,%0", operands); > > We could make %pN mean 'p' for prefixed, for memory as operands[N]? Are > there more places than this that could use that? How about inline asm? Right now, the only two places that do this are the two stack protect insns. Everything else that I'm aware of that generates multiple loads or stores will do a split before final. > > + (set (attr "length") > > + (cond [(and (match_operand 0 "prefixed_memory") > > + (match_operand 1 "prefixed_memory")) > > + (const_string "24") > > + > > + (ior (match_operand 0 "prefixed_memory") > > + (match_operand 1 "prefixed_memory")) > > + (const_string "20")] > > + > > + (const_string "12")))]) > > You can use const_int instead of const_string here, I think? Please do > that if it works. I'll try it out on Monday. > Quite a simple expression, phew :-) > > > + if (which_alternative == 0) > > +output_asm_insn ("xor. 
%3,%3,%4", operands); > > + else > > +output_asm_insn ("cmpld %0,%3,%4\;li %3,0", operands); > > That doesn't work: the backslash is treated like the escape character, in > a C block. I think doubling it will work? Check the generated insn-output.c, > it should be translated to \t\n in there. Yes it does work. I just checked. > Okay for trunk with those things taken care of. Thanks! -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] V6, #4 of 17: Add prefixed instruction support to stack protect insns
On Fri, Nov 01, 2019 at 10:22:03PM -0500, Segher Boessenkool wrote: > Hi! > > On Wed, Oct 16, 2019 at 09:47:41AM -0400, Michael Meissner wrote: > > This patch fixes the stack protection insns to support stacks larger than > > 16-bits on the 'future' system using prefixed loads and stores. > > > +;; We can't use the prefixed attribute here because there are two memory > > +;; instructions. We can't split the insn due to the fact that this > > operation > > +;; needs to be done in one piece. > > (define_insn "stack_protect_setdi" > >[(set (match_operand:DI 0 "memory_operand" "=Y") > > (unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET)) > > (set (match_scratch:DI 2 "=&r") (const_int 0))] > >"TARGET_64BIT" > > - "ld%U1%X1 %2,%1\;std%U0%X0 %2,%0\;li %2,0" > > +{ > > + if (prefixed_memory (operands[1], DImode)) > > +output_asm_insn ("pld %2,%1", operands); > > + else > > +output_asm_insn ("ld%U1%X1 %2,%1", operands); > > + > > + if (prefixed_memory (operands[0], DImode)) > > +output_asm_insn ("pstd %2,%0", operands); > > + else > > +output_asm_insn ("std%U0%X0 %2,%0", operands); > > We could make %pN mean 'p' for prefixed, for memory as operands[N]? Are > there more places than this that could use that? How about inline asm? At the moment, I did not add this. We can revisit it later. > > + (set (attr "length") > > + (cond [(and (match_operand 0 "prefixed_memory") > > + (match_operand 1 "prefixed_memory")) > > + (const_string "24") > > + > > + (ior (match_operand 0 "prefixed_memory") > > + (match_operand 1 "prefixed_memory")) > > + (const_string "20")] > > + > > + (const_string "12")))]) > > You can use const_int instead of const_string here, I think? Please do > that if it works. > > Quite a simple expression, phew :-) Const_int works. > > + if (which_alternative == 0) > > +output_asm_insn ("xor. 
%3,%3,%4", operands); > > + else > > +output_asm_insn ("cmpld %0,%3,%4\;li %3,0", operands); > > That doesn't work: the backslash is treated like the escape character, in > a C block. I think doubling it will work? Check the generated insn-output.c, > it should be translated to \t\n in there. > > Okay for trunk with those things taken care of. Thanks! As we discussed, this does work. Here is the patch committed. I did a bootstrap and did make check. There were no regressions. 2019-11-11 Michael Meissner * config/rs6000/predicates.md (prefixed_memory): New predicate. * config/rs6000/rs6000.md (stack_protect_setdi): Deal with either address being a prefixed load/store. (stack_protect_testdi): Deal with either address being a prefixed load. Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 278062) +++ gcc/config/rs6000/predicates.md (working copy) @@ -1828,3 +1828,10 @@ (define_predicate "pcrel_external_addres (define_predicate "pcrel_local_or_external_address" (ior (match_operand 0 "pcrel_local_address") (match_operand 0 "pcrel_external_address"))) + +;; Return true if the operand is a memory address that uses a prefixed address. +(define_predicate "prefixed_memory" + (match_code "mem") +{ + return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT); +}) Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 278062) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -11536,14 +11536,44 @@ (define_insn "stack_protect_setsi" [(set_attr "type" "three") (set_attr "length" "12")]) +;; We can't use the prefixed attribute here because there are two memory +;; instructions. We can't split the insn due to the fact that this operation +;; needs to be done in one piece. 
(define_insn "stack_protect_setdi" [(set (match_operand:DI 0 "memory_operand" "=Y") (unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET)) (set (match_scratch:DI 2 "=&r") (const_int 0))] "TARGET_64BIT" - "ld%U1%X1 %2,%1\;std%U0%X0 %2,%0\;li %2,0" +{ + if (prefixed_memory (operands[
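The hard-coded lengths in the stack_protect_setdi length attribute (12, 20 and 24 bytes) are consistent with the ADJUST_INSN_LENGTH rule discussed in the V6 #1 thread: 4 extra bytes per prefixed instruction, plus 4 bytes for a possible alignment NOP. The hypothetical C function below (not part of GCC) derives those numbers instead of listing them; it assumes exactly that rule and the three-instruction ld/std/li sequence shown in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch (not GCC source): derive the "length" attribute of
   stack_protect_setdi, which emits a load (ld or pld), a store (std or
   pstd) and "li %2,0".  */
static int
stack_protect_setdi_length (bool load_prefixed, bool store_prefixed)
{
  const int base_length = 3 * 4;  /* Three 4-byte instructions.  */
  int num_prefixed = (load_prefixed ? 1 : 0) + (store_prefixed ? 1 : 0);

  assert (num_prefixed >= 0 && num_prefixed <= 2);
  if (num_prefixed == 0)
    return base_length;

  /* 4 extra bytes per prefix, plus 4 for a possible alignment NOP.  */
  return base_length + 4 * (num_prefixed + 1);
}
```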
PowerPC -mcpu=future Version 12 patches
This is version 12 of my patches for PowerPC -mcpu=future. There are currently 14 patches. Note that the PCREL_OPT patches are not part of this series; I want to concentrate on getting the other patches checked in first. Patches #1-4 incorporate the changes that were requested for V11 patches #2-#5. Patch #5 is the same as V11 patch #6, which switches the default for -mpcrel when the user specifies -mcpu=future. Patch #6 is the same as V11 patch #7, which adds new options for the target-supports testcases. The remaining patches (#7-14) contain the same tests that were in V11 patches #8-15. I have built these patches on a little endian power8 system, and there were no regressions in the test suite.
[PATCH] V12 patch #1 of 14, add gcc_asserts for rs6000_adjust_vec_address
In https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01530.html, Segher asked me to do the gcc_asserts as early as possible. This patch makes sure the base register temporary is not used in the other arguments.

I have built and bootstrapped a compiler on a little endian power8 system, and there were no regressions in the test suite. In addition, I compiled both the Spec 2006 and Spec 2017 benchmarks with this compiler, and I saw no new build failures.

Can I check this into the trunk?

2020-01-09  Michael Meissner

        * config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add some
        gcc_asserts.

--- /tmp/kXfaUP_rs6000.c        2020-01-08 13:59:48.664454496 -0500
+++ gcc/config/rs6000/rs6000.c  2020-01-08 13:59:45.593410764 -0500
@@ -6772,6 +6772,9 @@ rs6000_adjust_vec_address (rtx scalar_re
   rtx new_addr;
   bool valid_addr_p;

+  gcc_assert (!reg_mentioned_p (base_tmp, addr));
+  gcc_assert (!reg_mentioned_p (base_tmp, element));
+
   /* Vector addresses should not have PRE_INC, PRE_DEC, or PRE_MODIFY.  */
   gcc_assert (GET_RTX_CLASS (GET_CODE (addr)) != RTX_AUTOINC);

@@ -6781,6 +6784,10 @@ rs6000_adjust_vec_address (rtx scalar_re
     element_offset = GEN_INT (INTVAL (element) * scalar_size);
   else
     {
+      /* All insns should use the 'Q' constraint (address is a single register)
+         if the element number is not a constant.  */
+      gcc_assert (REG_P (addr) || SUBREG_P (addr));
+
       int byte_shift = exact_log2 (scalar_size);
       gcc_assert (byte_shift >= 0);
[PATCH] V12 patch #2 of 14, Refactor rs6000_adjust_vec_address & rs6000_split_vec_extract_var
In https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01530.html, Segher had some questions about that patch. This patch addresses some of those concerns.

Instead of limiting the vector element number in rs6000_split_vec_extract_var so that the memory access does not go out of bounds, I decided to move the logic to rs6000_adjust_vec_address. The function rs6000_split_vec_extract_var is the only caller of rs6000_adjust_vec_address that passes in a variable element number.

The function rs6000_adjust_vec_address has 3 parts:

1) Calculation of the byte offset within the vector;
2) Creation of the new vector address;
3) Validating that the new address is valid for the register being loaded.

In this patch, I moved the code that calculates the byte offset to a separate function, and moved the AND that was originally done in rs6000_split_vec_extract_var into that function.

I have built and bootstrapped a compiler with this patch installed on a little endian power8 system, and there were no regressions in the test suite. In addition, I built -mcpu=future versions of Spec 2006 and Spec 2017, and there were no additional failures.

Can I check this patch into the trunk?

2020-01-09  Michael Meissner

        * config/rs6000/rs6000.c (get_vector_offset): New helper function
        to calculate the offset in memory from the start of a vector of a
        particular element.  Add code to keep the element number in
        bounds if the element number is variable.
        (rs6000_adjust_vec_address): Move calculation of offset of the
        vector element to get_vector_offset.
        (rs6000_split_vec_extract_var): Do not do the initial AND of
        element here, move the code to get_vector_offset.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c  (revision 280071)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6753,6 +6753,43 @@ hard_reg_and_mode_to_addr_mask (rtx reg,
   return addr_mask;
 }

+/* Return the offset within a memory object (MEM) of a vector type to a given
+   element within the vector (ELEMENT) with an element size (SCALAR_SIZE).  If
+   the element is constant, we return a constant integer.  Otherwise, we use a
+   base register temporary to calculate the offset after making it to fit
+   within the vector and scaling it.  */
+
+static rtx
+get_vector_offset (rtx mem, rtx element, rtx base_tmp, unsigned scalar_size)
+{
+  if (CONST_INT_P (element))
+    return GEN_INT (INTVAL (element) * scalar_size);
+
+  /* All insns should use the 'Q' constraint (address is a single register) if
+     the element number is not a constant.  */
+  rtx addr = XEXP (mem, 0);
+  gcc_assert (REG_P (addr) || SUBREG_P (addr));
+
+  /* Mask the element to make sure the element number is between 0 and the
+     maximum number of elements - 1 so that we don't generate an address
+     outside the vector.  */
+  rtx num_ele_m1 = GEN_INT (GET_MODE_NUNITS (GET_MODE (mem)) - 1);
+  rtx and_op = gen_rtx_AND (Pmode, element, num_ele_m1);
+  emit_insn (gen_rtx_SET (base_tmp, and_op));
+
+  /* Shift the element to get the byte offset from the element number.  */
+  int shift = exact_log2 (scalar_size);
+  gcc_assert (shift >= 0);
+
+  if (shift > 0)
+    {
+      rtx shift_op = gen_rtx_ASHIFT (Pmode, base_tmp, GEN_INT (shift));
+      emit_insn (gen_rtx_SET (base_tmp, shift_op));
+    }
+
+  return base_tmp;
+}
+
 /* Adjust a memory address (MEM) of a vector type to point to a scalar field
    within the vector (ELEMENT) with a mode (SCALAR_MODE).  Use a base register
    temporary (BASE_TMP) to fixup the address.  Return the new memory address
@@ -6767,7 +6804,6 @@ rs6000_adjust_vec_address (rtx scalar_re
 {
   unsigned scalar_size = GET_MODE_SIZE (scalar_mode);
   rtx addr = XEXP (mem, 0);
-  rtx element_offset;
   rtx new_addr;
   bool valid_addr_p;

@@ -6779,30 +6815,7 @@ rs6000_adjust_vec_address (rtx scalar_re

   /* Calculate what we need to add to the address to get the element
      address.  */
-  if (CONST_INT_P (element))
-    element_offset = GEN_INT (INTVAL (element) * scalar_size);
-  else
-    {
-      /* All insns should use the 'Q' constraint (address is a single register)
-         if the element number is not a constant.  */
-      gcc_assert (REG_P (addr) || SUBREG_P (addr));
-
-      int byte_shift = exact_log2 (scalar_size);
-      gcc_assert (byte_shift >= 0);
-
-      if (byte_shift == 0)
-        element_offset = element;
-
-      else
-        {
-          if (TARGET_POWERPC64)
-            emit_insn (gen_ashldi3 (base_tmp, element, GEN_INT (byte_shift)));
-          else
-            emit_insn (gen_ashlsi3 (base_tmp, element, GEN_INT (byte_shift)));
-
-          element_offset = base_tmp;
-        }
-    }
+  rtx element_offset = get_vector_offset (mem, element, base_tmp, scalar_size);

   /* Create the new address pointing to the element within the vector.  If we
      are adding 0, we don't have
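The offset computation that get_vector_offset performs can be modeled with plain integers. The sketch below is a hypothetical illustration, not GCC code: vector_element_offset and its parameters are invented for this example, and it mirrors only the arithmetic visible in the patch (scale a constant element directly; for a variable element, AND it with the number of elements minus 1 and then shift it left by log2 of the element size).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical integer model of get_vector_offset from the patch above.
   The real function emits RTL; this only computes the resulting byte
   offset.  NUNITS is the number of elements in the vector mode and
   SCALAR_SIZE is the element size in bytes; both must be powers of 2.  */
static long
vector_element_offset (long element, bool element_is_constant,
                       unsigned nunits, unsigned scalar_size)
{
  /* A constant element number is simply scaled, without masking.  */
  if (element_is_constant)
    return element * (long) scalar_size;

  /* Equivalent of exact_log2 (scalar_size), asserting a power of 2.  */
  int shift = 0;
  while ((1u << shift) < scalar_size)
    shift++;
  assert ((1u << shift) == scalar_size);
  assert (nunits > 0 && (nunits & (nunits - 1)) == 0);

  /* Mask a variable element number into [0, NUNITS - 1] so the address
     stays inside the vector, then scale it to a byte offset.  */
  return (element & (long) (nunits - 1)) << shift;
}
```

For example, with a V2DF vector (2 elements of 8 bytes), a variable element number of 3 is masked to 1 and yields byte offset 8, so the access stays inside the 16-byte vector.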
[PATCH] V12 patch #3 of 14, Improve address validation in rs6000_adjust_vec_address
In the patches:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01530.html
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01533.html

Segher said the whole code was too complex. This patch is my attempt to make it somewhat easier to understand.

One problem area was a section of code that tried to prevent doing an ADDI if the register was GPR 0 (where the machine uses '0' instead of the value in GPR 0). I realized that if I changed the order of the adds, I wouldn't have to worry about adding GPR 0. For example, consider:

#include
double indexed_get1 (vector double *vp, unsigned long m)
{
  return vec_extract (vp[m], 1);
}

Right now it generates:

        sldi 4,4,4
        addi 9,3,8
        lfdx 1,4,9

I.e., it adds the offset to the base register and then forms an X-FORM load with the base and index registers. With this patch, it now generates:

        sldi 4,4,4
        add 9,4,3
        lfd 1,8(9)

I.e., it adds the base and index registers into the temporary, and then uses a D-FORM load with the offset (assuming the element number is constant) instead of an X-FORM load with the offset as the index.

The second part of cleaning up the code was to eliminate the special purpose code that checks the addr_masks for the register type, along with the code that assumed all 8-byte values needed a DS-FORM instruction. Instead, I now call address_to_insn_form, which is the general address classification function added recently. That function peers into the addr_masks, etc., so this function, which works at a higher abstraction layer, doesn't have to worry about the details.

This patch also eliminates the hard_reg_and_mode_to_addr_mask function that I added recently in anticipation of also using it to optimize PC-relative addresses. When I started looking at it, I figured it would simplify things if I could push all of the details to address_to_insn_form (which already knew about these things).

As with the other patches, I have built and bootstrapped a compiler on a little endian power8 system, and there were no regressions in the tests.
Can I check this patch into the trunk? 2020-01-09 Michael Meissner * config/rs6000/rs6000.c (reg_to_non_prefixed): Add forward reference. (hard_reg_and_mode_to_addr_mask): Delete, no longer used. (rs6000_adjust_vec_address): If the original vector address was REG+REG or REG+OFFSET and the element is not zero, do the add of the elements in the original address before adding the offset for the vector element. Use address_to_insn_form to validate the address using the register being loaded, rather than guessing whether the address is a DS-FORM or DQ-FORM address. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 280072) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -1172,6 +1172,7 @@ static bool rs6000_secondary_reload_move machine_mode, secondary_reload_info *, bool); +static enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode); rtl_opt_pass *make_pass_analyze_swaps (gcc::context*); /* Hash table stuff for keeping track of TOC entries. */ @@ -6729,30 +6730,6 @@ rs6000_expand_vector_extract (rtx target } } -/* Helper function to return an address mask based on a physical register. */ - -static addr_mask_type -hard_reg_and_mode_to_addr_mask (rtx reg, machine_mode mode) -{ - unsigned int r = reg_or_subregno (reg); - addr_mask_type addr_mask; - - gcc_assert (HARD_REGISTER_NUM_P (r)); - if (INT_REGNO_P (r)) -addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_GPR]; - - else if (FP_REGNO_P (r)) -addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_FPR]; - - else if (ALTIVEC_REGNO_P (r)) -addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_VMX]; - - else -gcc_unreachable (); - - return addr_mask; -} - /* Return the offset within a memory object (MEM) of a vector type to a given element within the vector (ELEMENT) with an element size (SCALAR_SIZE). If the element is constant, we return a constant integer. 
Otherwise, we use a @@ -6805,7 +6782,6 @@ rs6000_adjust_vec_address (rtx scalar_re unsigned scalar_size = GET_MODE_SIZE (scalar_mode); rtx addr = XEXP (mem, 0); rtx new_addr; - bool valid_addr_p; gcc_assert (!reg_mentioned_p (base_tmp, addr)); gcc_assert (!reg_mentioned_p (base_tmp, element)); @@ -6833,68 +6809,30 @@ rs6000_adjust_vec_address (rtx scalar_re { rtx op0 = XEXP (addr, 0); rtx op1 = XEXP (addr, 1); - rtx insn; gcc_assert (REG_P (op0) || SUBREG_P (op0)); if (CONST_INT_P (op1) && CONST_INT_P (element_offset)) { + /* D-FORM address with constant element number. */ HOST_WIDE_INT of
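The reordering described in V12 #3 can be checked with a small register-level model. The C sketch below is hypothetical (the functions and the 32-entry register array are invented for this illustration) and models only the PowerPC rule that ADDI, LFDX and LFD treat a base register operand of 0 as the value 0, while ADD does not. Both sequences compute vp + 16*m + 8 when the pointer is in GPR 3; only the old ADDI-first order goes wrong if the pointer were in GPR 0, which is the case the deleted special-case code had to guard against.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the (RA|0) rule: when RA is register 0, ADDI and the
   load instructions use the value 0 instead of the contents of GPR 0.  */
static uint64_t
ra_or_zero (const uint64_t gpr[32], int ra)
{
  return ra == 0 ? 0 : gpr[ra];
}

/* Old sequence, with the vector pointer in register BASE:
     sldi 4,4,4 ; addi 9,BASE,8 ; lfdx 1,4,9  */
static uint64_t
old_sequence_ea (uint64_t vp, uint64_t m, int base)
{
  uint64_t gpr[32] = { 0 };
  assert (base >= 0 && base < 32 && base != 4 && base != 9);
  gpr[base] = vp;
  gpr[4] = m;
  gpr[4] = gpr[4] << 4;                  /* sldi 4,4,4 */
  gpr[9] = ra_or_zero (gpr, base) + 8;   /* addi 9,BASE,8 */
  return ra_or_zero (gpr, 4) + gpr[9];   /* lfdx 1,4,9: EA = (RA|0) + RB */
}

/* New sequence:  sldi 4,4,4 ; add 9,4,BASE ; lfd 1,8(9)  */
static uint64_t
new_sequence_ea (uint64_t vp, uint64_t m, int base)
{
  uint64_t gpr[32] = { 0 };
  assert (base >= 0 && base < 32 && base != 4 && base != 9);
  gpr[base] = vp;
  gpr[4] = m;
  gpr[4] = gpr[4] << 4;                  /* sldi 4,4,4 */
  gpr[9] = gpr[4] + gpr[base];           /* add 9,4,BASE: no (RA|0) rule */
  return ra_or_zero (gpr, 9) + 8;        /* lfd 1,8(9): EA = (RA|0) + D */
}
```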
[PATCH] V12 patch #4 of 14, Optimize adjusting PC-relative vector addresses
This patch folds the constant into a PC-relative vector address when that address is adjusted by a constant offset. I moved this code into a separate function to make the steps clearer. With patch V12 #3, address_to_insn_form is now used to validate the address, so we don't need any new special address validation.

I have built and bootstrapped a compiler on a little endian power8 system, and there were no regressions in the test suite.

Can I check this into the trunk? Patch V12 #13 will contain new tests for this optimization.

2020-01-09  Michael Meissner

        * config/rs6000/rs6000.c (adjust_vec_address_pcrel): New helper
        function to adjust PC-relative vector addresses.
        (rs6000_adjust_vec_address): Call adjust_vec_address_pcrel to
        handle vectors with PC-relative addresses.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c  (revision 280073)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6767,6 +6767,60 @@ get_vector_offset (rtx mem, rtx element,
   return base_tmp;
 }

+/* Helper function update PC-relative addresses when we are adjusting a memory
+   address (ADDR) to a vector to point to a scalar field within the vector with
+   a constant offset (ELEMENT_OFFSET).  If the address is not valid, we can
+   use the base register temporary (BASE_TMP) to form the address.  */
+
+static rtx
+adjust_vec_address_pcrel (rtx addr, rtx element_offset, rtx base_tmp)
+{
+  rtx new_addr = NULL;
+
+  gcc_assert (CONST_INT_P (element_offset));
+
+  if (GET_CODE (addr) == CONST)
+    addr = XEXP (addr, 0);
+
+  if (GET_CODE (addr) == PLUS)
+    {
+      rtx op0 = XEXP (addr, 0);
+      rtx op1 = XEXP (addr, 1);
+
+      if (CONST_INT_P (op1))
+        {
+          HOST_WIDE_INT offset
+            = INTVAL (XEXP (addr, 1)) + INTVAL (element_offset);
+
+          if (offset == 0)
+            new_addr = op0;
+
+          else
+            {
+              rtx plus = gen_rtx_PLUS (Pmode, op0, GEN_INT (offset));
+              new_addr = gen_rtx_CONST (Pmode, plus);
+            }
+        }
+
+      else
+        {
+          emit_move_insn (base_tmp, addr);
+          new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
+        }
+    }
+
+  else if (SYMBOL_REF_P (addr) || LABEL_REF_P (addr))
+    {
+      rtx plus = gen_rtx_PLUS (Pmode, addr, element_offset);
+      new_addr = gen_rtx_CONST (Pmode, plus);
+    }
+
+  else
+    gcc_unreachable ();
+
+  return new_addr;
+}
+
 /* Adjust a memory address (MEM) of a vector type to point to a scalar field
    within the vector (ELEMENT) with a mode (SCALAR_MODE).  Use a base register
    temporary (BASE_TMP) to fixup the address.  Return the new memory address
@@ -6803,6 +6857,11 @@ rs6000_adjust_vec_address (rtx scalar_re
   else if (REG_P (addr) || SUBREG_P (addr))
     new_addr = gen_rtx_PLUS (Pmode, addr, element_offset);

+  /* For references to local static variables, fold a constant offset into the
+     address.  */
+  else if (pcrel_local_address (addr, Pmode) && CONST_INT_P (element_offset))
+    new_addr = adjust_vec_address_pcrel (addr, element_offset, base_tmp);
+
   /* Optimize D-FORM addresses with constant offset with a constant element, to
      include the element offset in the address directly.  */
   else if (GET_CODE (addr) == PLUS)
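For the constant case, adjust_vec_address_pcrel simply adds the element offset to the constant already attached to the symbol, and drops the PLUS when the sum is 0. The sketch below models that folding with a hypothetical struct instead of RTL (struct pcrel_addr, make_pcrel_addr, fold_element_offset and is_bare_symbol are all invented for this illustration); it does not model the non-constant PLUS case, where the patch moves the address into BASE_TMP first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a local PC-relative address: SYMBOL plus a
   constant OFFSET, i.e. (const (plus (symbol_ref) (const_int OFFSET))),
   or a bare (symbol_ref) when OFFSET is 0.  */
struct pcrel_addr
{
  const char *symbol;
  long offset;
};

static struct pcrel_addr
make_pcrel_addr (const char *symbol, long offset)
{
  struct pcrel_addr addr = { symbol, offset };
  return addr;
}

/* Fold a constant element offset into the address, as the constant case of
   adjust_vec_address_pcrel does, instead of computing the address in a
   base register and adding the element offset separately.  */
static struct pcrel_addr
fold_element_offset (struct pcrel_addr addr, long element_offset)
{
  assert (addr.symbol != NULL);
  addr.offset += element_offset;
  return addr;
}

/* True if the folded address is just the symbol, which is what the patch
   produces when the combined offset is 0.  */
static bool
is_bare_symbol (struct pcrel_addr addr)
{
  return addr.offset == 0;
}
```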
[PATCH] V12 patch #5 of 14, Make -mpcrel default for -mcpu=future on little endian Linux 64-bit systems
This patch is the same as patch V11 #6:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01494.html

Assuming patches 1-4 are applied, all of the known code generation bugs with the -mcpu=future support are fixed, so it is time to make -mpcrel the default on the one system (little endian 64-bit Linux) that will support PC-relative addressing on the future system.

I have built and bootstrapped a compiler with this patch on a little endian power8 system, and there were no regressions in the test suite. I have built the Spec 2006 and Spec 2017 benchmarks with this patch, and there were no regressions in building the benchmarks.

Can I check this patch into the trunk?

2020-01-09  Michael Meissner

        * config/rs6000/linux64.h (PREFIXED_ADDR_SUPPORTED_BY_OS): Set to
        1 to enable prefixed addressing if -mcpu=future.
        (PCREL_SUPPORTED_BY_OS): Set to 1 to enable PC-relative addressing
        if -mcpu=future.
        * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Do not
        enable -mprefixed-addr or -mpcrel by default.
        (ADDRESSING_FUTURE_MASKS): New macro.
        (OTHER_FUTURE_MASKS): Use ADDRESSING_FUTURE_MASKS.
        * config/rs6000/rs6000.c (PREFIXED_ADDR_SUPPORTED_BY_OS): Disable
        prefixed addressing unless the target OS tm.h says we should
        enable it.
        (PCREL_SUPPORTED_BY_OS): Disable PC-relative addressing unless the
        target OS tm.h says we should enable it.
        (rs6000_debug_reg_global): Print whether prefixed addressing and
        PC-relative addressing are enabled by default if -mcpu=future.
        (rs6000_option_override_internal): Move setting prefixed
        addressing and PC-relative addressing after the sub-target option
        handling is done.  Only enable prefixed addressing or PC-relative
        addressing on -mcpu=future systems if the target OS says to enable
        it.  Disallow prefixed addressing on 32-bit systems or if the
        target object file is not ELF v2.

Index: gcc/config/rs6000/linux64.h
===================================================================
--- gcc/config/rs6000/linux64.h (revision 280069)
+++ gcc/config/rs6000/linux64.h (working copy)
@@ -640,3 +640,11 @@ extern int dot_symbols;
    enabling the __float128 keyword.  */
 #undef  TARGET_FLOAT128_ENABLE_TYPE
 #define TARGET_FLOAT128_ENABLE_TYPE 1
+
+/* Enable support for pc-relative and numeric prefixed addressing on the
+   'future' system.  */
+#undef  PREFIXED_ADDR_SUPPORTED_BY_OS
+#define PREFIXED_ADDR_SUPPORTED_BY_OS   1
+
+#undef  PCREL_SUPPORTED_BY_OS
+#define PCREL_SUPPORTED_BY_OS           1
Index: gcc/config/rs6000/rs6000-cpus.def
===================================================================
--- gcc/config/rs6000/rs6000-cpus.def   (revision 280069)
+++ gcc/config/rs6000/rs6000-cpus.def   (working copy)
@@ -75,15 +75,22 @@
                                  | OPTION_MASK_P8_VECTOR              \
                                  | OPTION_MASK_P9_VECTOR)

-/* Support for a future processor's features.  Do not enable -mpcrel until it
-   is fully functional.  */
+/* Support for a future processor's features.  The prefixed and pc-relative
+   addressing bits are not added here.  Instead, they are added if the target
+   OS tm.h says that it supports the addressing modes by default when
+   -mcpu=future is used.  */
 #define ISA_FUTURE_MASKS_SERVER (ISA_3_0_MASKS_SERVER                 \
-                                 | OPTION_MASK_FUTURE                 \
+                                 | OPTION_MASK_FUTURE)
+
+/* Addressing related flags on a future processor.  These are options that need
+   to be cleared if the target OS is not capable of supporting prefixed
+   addressing at all (such as 32-bit mode or if the object file format is not
+   ELF v2).  */
+#define ADDRESSING_FUTURE_MASKS (OPTION_MASK_PCREL                    \
                                  | OPTION_MASK_PREFIXED_ADDR)

 /* Flags that need to be turned off if -mno-future.  */
-#define OTHER_FUTURE_MASKS      (OPTION_MASK_PCREL                    \
-                                 | OPTION_MASK_PREFIXED_ADDR)
+#define OTHER_FUTURE_MASKS      ADDRESSING_FUTURE_MASKS

 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS   (OPTION_MASK_FLOAT128_HW              \
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c  (revision 280074)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -98,6 +98,16 @@
 #endif
 #endif

+/* Set up the defaults for whether prefixed addressing is used, and if it is
+   used, whether we want to turn on pc-relative support by default.  */
+#ifndef PREFIXED_ADDR_SUPPORTED_BY_OS
+#define PREFIXED_ADDR_SUPPORTED_BY_OS   0
+#endif
+
+#ifndef PCREL_SUPPORTED_BY_OS
+#define PCREL_SUPPORTED_BY_OS           0
+#endif
+
 /* Support targetm.vectorize.builtin_mask_for_load.  */
 GTY(()) tree altivec_builtin_mask_for_load;
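The linux64.h and rs6000.c hunks use a common C idiom: rs6000.c supplies a default of 0 with #ifndef, and a target header that supports the feature overrides it. The sketch below shows that idiom together with a hypothetical model of the resulting defaults. The mask values and function names are invented for this illustration, and it is not the code in rs6000_option_override_internal. It assumes, as the ChangeLog states, that prefixed addressing is disallowed for 32-bit or non-ELF v2 targets, and, as the rs6000.c comment states, that PC-relative support is only considered once prefixed addressing is used.

```c
#include <assert.h>
#include <stdbool.h>

/* Default to "not supported" unless a target OS header defines these,
   following the #ifndef pattern in the rs6000.c hunk above.  */
#ifndef PREFIXED_ADDR_SUPPORTED_BY_OS
#define PREFIXED_ADDR_SUPPORTED_BY_OS 0
#endif

#ifndef PCREL_SUPPORTED_BY_OS
#define PCREL_SUPPORTED_BY_OS 0
#endif

/* Hypothetical bit values; the real OPTION_MASK_* values are generated by
   GCC's option machinery.  */
#define MASK_PREFIXED_ADDR (1u << 0)
#define MASK_PCREL         (1u << 1)

/* Hypothetical model of which addressing flags -mcpu=future would turn on
   by default.  */
static unsigned
future_addressing_defaults (bool prefixed_by_os, bool pcrel_by_os,
                            bool is_64bit, bool is_elfv2)
{
  unsigned flags = 0;

  /* Prefixed addressing is disallowed on 32-bit systems and when the
     object file format is not ELF v2.  */
  if (!is_64bit || !is_elfv2)
    return 0;

  if (prefixed_by_os)
    {
      flags |= MASK_PREFIXED_ADDR;

      /* PC-relative support is only turned on when prefixed addressing
         is used.  */
      if (pcrel_by_os)
        flags |= MASK_PCREL;
    }

  return flags;
}

/* Apply the OS-provided (or default) settings.  */
static unsigned
os_future_addressing_defaults (bool is_64bit, bool is_elfv2)
{
  return future_addressing_defaults (PREFIXED_ADDR_SUPPORTED_BY_OS != 0,
                                     PCREL_SUPPORTED_BY_OS != 0,
                                     is_64bit, is_elfv2);
}
```

Compiled on its own, the macros keep their default of 0, so os_future_addressing_defaults returns 0; a header that defines both macros as 1 before this code (as linux64.h does in the patch) would turn both flags on for 64-bit ELF v2 targets.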
[PATCH] V12 patch #6 of 14, Add -mcpu=future target-supports options
This patch is the same as V11 patch #7:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01495.html

It adds the effective-target checks to target-supports.exp that the -mcpu=future tests need, and it contains the changes that you asked for some time ago.

Can I check this into the trunk?

2020-01-09  Michael Meissner

        * lib/target-supports.exp (check_effective_target_powerpc_pcrel):
        New target for PowerPC -mcpu=future support.
        (check_effective_target_powerpc_prefixed_addr): New target for
        PowerPC -mcpu=future support.

Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp       (revision 280069)
+++ gcc/testsuite/lib/target-supports.exp       (working copy)
@@ -2161,6 +2161,23 @@ proc check_p9modulo_hw_available { } {
     }]
 }

+# Return 1 if the target generates PC-relative instructions automatically
+proc check_effective_target_powerpc_pcrel { } {
+    return [check_no_messages_and_pattern powerpc_pcrel \
+                {\mpld\M.*[@]pcrel} assembly {
+                    static long s;
+                    long *p = &s;
+                    long foo (void) { return s; }
+                } {-O2 -mcpu=future}]
+}
+
+# Return 1 if the target generates prefixed instructions automatically
+proc check_effective_target_powerpc_prefixed_addr { } {
+    return [check_no_messages_and_pattern powerpc_prefixed_addr \
+                {\mpld\M} assembly {
+                    long foo (long *p) { return p[0x12345]; }
+                } {-O2 -mcpu=future}]
+}

 # Return 1 if the target supports executing FUTURE instructions, 0 otherwise.
 # Cache the result.  It is assumed that if a simulator does not support the
[PATCH] V12 patch #7 of 14, Add PADDI/PLI tests
This patch adds new tests for the compiler generating PLI or PADDI with large constants when -mcpu=future is used. It renames the files as you requested several patch generations ago so the -fident option doesn't give a false positive result. Can I check this patch into the trunk? 2020-01-09 Michael Meissner * gcc.target/powerpc/prefix-add.c: New test for -mcpu=future generating PADDI for large constant adds. * gcc.target/powerpc/prefix-di-constant.c: New test for -mcpu=future generating PLI to load up large DImode constants. * gcc.target/powerpc/prefix-si-constant.c: New test for -mcpu=future generating PLI to load up large SImode constants. Index: gcc/testsuite/gcc.target/powerpc/prefix-add.c === --- gcc/testsuite/gcc.target/powerpc/prefix-add.c (revision 280078) +++ gcc/testsuite/gcc.target/powerpc/prefix-add.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test that PADDI is generated to add a large constant. */ +unsigned long +add (unsigned long a) +{ + return a + 0x12345678UL; +} + +/* { dg-final { scan-assembler {\mpaddi\M} } } */ Index: gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c === --- gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c (revision 280078) +++ gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test that PLI (PADDI) is generated to load a large constant. 
*/ +unsigned long +large (void) +{ + return 0x12345678UL; +} + +/* { dg-final { scan-assembler {\mpli\M} } } */ Index: gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c === --- gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c (revision 280078) +++ gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test that PLI (PADDI) is generated to load a large constant for SImode. */ +void +large_si (unsigned int *p) +{ + *p = 0x12345U; +} + +/* { dg-final { scan-assembler {\mpli\M} } } */ -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
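The three tests above rely on the immediate-field widths: li/addi take a 16-bit signed immediate, while the prefixed pli/paddi take a 34-bit signed immediate, so constants such as 0x12345678 and 0x12345 no longer need a multi-instruction sequence. A minimal host-side sketch of that range check follows; the helper names (`fits_signed`, `classify_constant`) are hypothetical, and it deliberately ignores lis, which can also build some 32-bit values in one instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v fits in an n-bit two's-complement signed field. */
static bool fits_signed(int64_t v, unsigned n)
{
    assert(n >= 1 && n <= 63);
    int64_t lo = -((int64_t)1 << (n - 1));
    int64_t hi = ((int64_t)1 << (n - 1)) - 1;
    return v >= lo && v <= hi;
}

enum const_seq {
    SEQ_LI,    /* one li/addi: 16-bit signed immediate */
    SEQ_PLI,   /* one pli/paddi: 34-bit signed immediate (prefixed) */
    SEQ_MULTI  /* needs a sequence of instructions */
};

/* Hypothetical, simplified classifier (ignores lis). */
static enum const_seq classify_constant(int64_t c)
{
    if (fits_signed(c, 16))
        return SEQ_LI;
    if (fits_signed(c, 34))
        return SEQ_PLI;
    return SEQ_MULTI;
}
```

With this model, both constants used by the tests land in the SEQ_PLI bucket, which matches the PLI/PADDI the tests scan for.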
[PATCH] V12 patch #8 of 14, Add test to verify prefixed instruction is generated for -mcpu=future for DS/DQ illegal offsets
This patch is the same as: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01497.html It adds a test to validate that the compiler will now generate a prefixed load or store instead of loading up an offset that would be illegal for DS/DQ-FORM instructions. Can I check this into the trunk? 2020-01-09 Michael Meissner * gcc.target/powerpc/prefix-ds-dq.c: New test to verify that we generate the prefix load/store instructions for traditional instructions with an offset that doesn't match DS/DQ requirements. Index: gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c === --- gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c (revision 280080) +++ gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c (working copy) @@ -0,0 +1,156 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests whether we generate a prefixed load/store operation for addresses that + don't meet DS/DQ offset constraints. */ + +unsigned long +load_uc_offset1 (unsigned char *p) +{ + return p[1]; /* should generate LBZ. */ +} + +long +load_sc_offset1 (signed char *p) +{ + return p[1]; /* should generate LBZ + EXTSB. */ +} + +unsigned long +load_us_offset1 (unsigned char *p) +{ + return *(unsigned short *)(p + 1); /* should generate LHZ. */ +} + +long +load_ss_offset1 (unsigned char *p) +{ + return *(short *)(p + 1);/* should generate LHA. */ +} + +unsigned long +load_ui_offset1 (unsigned char *p) +{ + return *(unsigned int *)(p + 1); /* should generate LWZ. */ +} + +long +load_si_offset1 (unsigned char *p) +{ + return *(int *)(p + 1); /* should generate PLWA. */ +} + +unsigned long +load_ul_offset1 (unsigned char *p) +{ + return *(unsigned long *)(p + 1);/* should generate PLD. */ +} + +long +load_sl_offset1 (unsigned char *p) +{ + return *(long *)(p + 1); /* should generate PLD. */ +} + +float +load_float_offset1 (unsigned char *p) +{ + return *(float *)(p + 1);/* should generate LFS. 
*/ +} + +double +load_double_offset1 (unsigned char *p) +{ + return *(double *)(p + 1); /* should generate LFD. */ +} + +__float128 +load_float128_offset1 (unsigned char *p) +{ + return *(__float128 *)(p + 1); /* should generate PLXV. */ +} + +void +store_uc_offset1 (unsigned char uc, unsigned char *p) +{ + p[1] = uc; /* should generate STB. */ +} + +void +store_sc_offset1 (signed char sc, signed char *p) +{ + p[1] = sc; /* should generate STB. */ +} + +void +store_us_offset1 (unsigned short us, unsigned char *p) +{ + *(unsigned short *)(p + 1) = us; /* should generate STH. */ +} + +void +store_ss_offset1 (signed short ss, unsigned char *p) +{ + *(signed short *)(p + 1) = ss; /* should generate STH. */ +} + +void +store_ui_offset1 (unsigned int ui, unsigned char *p) +{ + *(unsigned int *)(p + 1) = ui; /* should generate STW. */ +} + +void +store_si_offset1 (signed int si, unsigned char *p) +{ + *(signed int *)(p + 1) = si; /* should generate STW. */ +} + +void +store_ul_offset1 (unsigned long ul, unsigned char *p) +{ + *(unsigned long *)(p + 1) = ul; /* should generate PSTD. */ +} + +void +store_sl_offset1 (signed long sl, unsigned char *p) +{ + *(signed long *)(p + 1) = sl; /* should generate PSTD. */ +} + +void +store_float_offset1 (float f, unsigned char *p) +{ + *(float *)(p + 1) = f; /* should generate STFS. */ +} + +void +store_double_offset1 (double d, unsigned char *p) +{ + *(double *)(p + 1) = d; /* should generate STFD. */ +} + +void +store_float128_offset1 (__float128 f128, unsigned char *p) +{ + *(__float128 *)(p + 1) = f128; /* should generate PSTXV.
*/ +} + +/* { dg-final { scan-assembler-times {\mextsb\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlbz\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mlfd\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlfs\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlha\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlhz\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlwz\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mpld\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mplwa\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mplxv\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstxv\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mstb\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstfd\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mstfs\M} 1 } } */ +/* { dg-final { scan-assembler-times {\msth\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstw\M} 2 } } */ -- M
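The expected instructions in prefix-ds-dq.c follow from the Power ISA displacement forms: D-form (lbz, lhz, lha, lwz, lfs, lfd, and the matching stores) accepts any 16-bit signed offset, DS-form (ld, std, lwa) additionally requires the low 2 bits to be zero, and DQ-form (lxv, stxv) requires the low 4 bits to be zero. Prefixed forms take a 34-bit signed offset with no alignment requirement. The sketch below models only those encoding rules, assuming the usual ISA 3.0/3.1 definitions; the helper names are hypothetical and it is not GCC's address-validation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum disp_form { FORM_D, FORM_DS, FORM_DQ };

static bool fits_signed(int64_t v, unsigned n)
{
    assert(n >= 1 && n <= 63);
    int64_t lo = -((int64_t)1 << (n - 1));
    int64_t hi = ((int64_t)1 << (n - 1)) - 1;
    return v >= lo && v <= hi;
}

/* Can a traditional (non-prefixed) instruction of this form encode disp? */
static bool traditional_ok(enum disp_form form, int64_t disp)
{
    if (!fits_signed(disp, 16))
        return false;
    switch (form) {
    case FORM_D:  return true;
    case FORM_DS: return (disp & 3) == 0;   /* low 2 bits must be zero */
    case FORM_DQ: return (disp & 15) == 0;  /* low 4 bits must be zero */
    }
    return false;
}

/* Prefixed loads/stores: 34-bit signed displacement, any alignment. */
static bool prefixed_ok(int64_t disp)
{
    return fits_signed(disp, 34);
}
```

For the offset of 1 used throughout the test, only the D-form instructions stay unprefixed; ld/lwa and lxv must become pld/plwa and plxv.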
[PATCH] V12 patch #9 of 14, Add test to validate we don't generate an illegal prefixed instruction
This patch is the same as: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01499.html It adds a new test to make sure that, if we are using a prefixed load or store instruction, the compiler does not try to use the load with update or store with update version of the instruction, since there are no prefixed versions of those instructions. Can I check this into the trunk? 2020-01-09 Michael Meissner * gcc.target/powerpc/prefix-no-premodify.c: Make sure we do not generate the non-existent PLWZU instruction if -mcpu=future. Index: gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c === --- gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c (revision 280082) +++ gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c (working copy) @@ -0,0 +1,50 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Make sure that we don't generate a prefixed form of the load and store with + update instructions (i.e. instead of generating LWZU we have to generate + PLWZ plus a PADDI). */ + +#ifndef SIZE +#define SIZE 5 +#endif + +struct foo { + unsigned int field; + char pad[SIZE]; +}; + +struct foo *inc_load (struct foo *p, unsigned int *q) +{ + *q = (++p)->field; /* PLWZ, PADDI, STW. */ + return p; +} + +struct foo *dec_load (struct foo *p, unsigned int *q) +{ + *q = (--p)->field; /* PLWZ, PADDI, STW. */ + return p; +} + +struct foo *inc_store (struct foo *p, unsigned int *q) +{ + (++p)->field = *q; /* LWZ, PADDI, PSTW. */ + return p; +} + +struct foo *dec_store (struct foo *p, unsigned int *q) +{ + (--p)->field = *q; /* LWZ, PADDI, PSTW.
*/ + return p; +} + +/* { dg-final { scan-assembler-times {\mlwz\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstw\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpaddi\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mplwz\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstw\M} 2 } } */ +/* { dg-final { scan-assembler-not {\mplwzu\M} } } */ +/* { dg-final { scan-assembler-not {\mpstwu\M} } } */ +/* { dg-final { scan-assembler-not {\maddis\M} } } */ +/* { dg-final { scan-assembler-not {\maddi\M} } } */ -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
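Because there is no prefixed form of the update instructions, a load such as lwzu RT,D(RA) with a displacement that needs a prefix has to be split into a prefixed load followed by a separate add of the same displacement to the base. The host-side model below, with hypothetical helper names and a byte array standing in for memory and an integer offset standing in for RA, sketches why the two sequences are equivalent.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* lwzu RT,D(RA): EA = RA + D; RT = word at EA; RA = EA. */
static uint32_t load_with_update(const uint8_t *mem, int64_t *base, int64_t d)
{
    int64_t ea = *base + d;
    uint32_t rt;
    assert(ea >= 0);
    memcpy(&rt, mem + ea, sizeof rt);
    *base = ea;
    return rt;
}

/* Replacement when D needs a prefix: plwz RT,D(RA), then paddi RA,RA,D. */
static uint32_t prefixed_load_then_add(const uint8_t *mem, int64_t *base, int64_t d)
{
    uint32_t rt;
    assert(*base + d >= 0);
    memcpy(&rt, mem + *base + d, sizeof rt);  /* plwz */
    *base += d;                               /* paddi */
    return rt;
}
```

Both leave the same value in RT and the same updated base, which is what the test's "PLWZ, PADDI" comments describe.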
[PATCH] V12 patch #10 of 14, Add tests for generating prefixed load/store instructions with large numeric offsets
This patch is the same as: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01500.html This patch adds one test per type validating that we generate the appropriate prefixed instructions to load/store the type when the offset is large. Can I check this into the trunk? 2020-01-09 Michael Meissner * gcc.target/powerpc/prefix-large.h: New set of tests to test prefixed addressing on 'future' system with large numeric offsets for various types. * gcc.target/powerpc/prefix-large-dd.c: New test for prefixed loads/stores with large offsets for the _Decimal64 type. * gcc.target/powerpc/prefix-large-df.c: New test for prefixed loads/stores with large offsets for the double type. * gcc.target/powerpc/prefix-large-di.c: New test for prefixed loads/stores with large offsets for the long type. * gcc.target/powerpc/prefix-large-hi.c: New test for prefixed loads/stores with large offsets for the short type. * gcc.target/powerpc/prefix-large-kf.c: New test for prefixed loads/stores with large offsets for the __float128 type. * gcc.target/powerpc/prefix-large-qi.c: New test for prefixed loads/stores with large offsets for the signed char type. * gcc.target/powerpc/prefix-large-sd.c: New test for prefixed loads/stores with large offsets for the _Decimal32 type. * gcc.target/powerpc/prefix-large-sf.c: New test for prefixed loads/stores with large offsets for the float type. * gcc.target/powerpc/prefix-large-si.c: New test for prefixed loads/stores with large offsets for the int type. * gcc.target/powerpc/prefix-large-udi.c: New test for prefixed loads/stores with large offsets for the unsigned long type. * gcc.target/powerpc/prefix-large-uhi.c: New test for prefixed loads/stores with large offsets for the unsigned short type. * gcc.target/powerpc/prefix-large-uqi.c: New test for prefixed loads/stores with large offsets for the unsigned char type. * gcc.target/powerpc/prefix-large-usi.c: New test for prefixed loads/stores with large offsets for the unsigned int type.
* gcc.target/powerpc/prefix-large-v2df.c: New test for prefixed loads/stores with large offsets for the vector double type. Index: gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c === --- gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c (revision 280083) +++ gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructions testing whether we can generate a prefixed + load/store instruction that has a 34-bit offset for _Decimal64 objects. */ + +#define TYPE _Decimal64 + +#include "prefix-large.h" + +/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */ Index: gcc/testsuite/gcc.target/powerpc/prefix-large-df.c === --- gcc/testsuite/gcc.target/powerpc/prefix-large-df.c (revision 280083) +++ gcc/testsuite/gcc.target/powerpc/prefix-large-df.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructions testing whether we can generate a prefixed + load/store instruction that has a 34-bit offset for double objects. 
*/ + +#define TYPE double + +#include "prefix-large.h" + +/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */ Index: gcc/testsuite/gcc.target/powerpc/prefix-large-di.c === --- gcc/testsuite/gcc.target/powerpc/prefix-large-di.c (revision 280083) +++ gcc/testsuite/gcc.target/powerpc/prefix-large-di.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructions testing whether we can generate a prefixed + load/store instruction that has a 34-bit offset for long objects. */ + +#define TYPE long + +#include "prefix-large.h" + +/* { dg-final { scan-assembler-times {\mpld\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */ Index: gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c === --- gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c (revision 280083) +++ gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile } */
[PATCH] V12 patch #11 of 14, Add tests for using PC-relative instructions with -mcpu=future
This patch is the same as: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01501.html This patch adds a set of tests for each type to verify that the appropriate PC-relative instructions are generated when -mcpu=future is used. Can I check this patch into the trunk? 2020-01-09 Michael Meissner * gcc.target/powerpc/prefix-pcrel.h: New set of tests to test prefixed addressing on 'future' system with PC-relative addresses for various types. * gcc.target/powerpc/prefix-pcrel-dd.c: New test for prefixed loads/stores with PC-relative addresses for the _Decimal64 type. * gcc.target/powerpc/prefix-pcrel-df.c: New test for prefixed loads/stores with PC-relative addresses for the double type. * gcc.target/powerpc/prefix-pcrel-di.c: New test for prefixed loads/stores with PC-relative addresses for the long type. * gcc.target/powerpc/prefix-pcrel-hi.c: New test for prefixed loads/stores with PC-relative addresses for the short type. * gcc.target/powerpc/prefix-pcrel-kf.c: New test for prefixed loads/stores with PC-relative addresses for the __float128 type. * gcc.target/powerpc/prefix-pcrel-qi.c: New test for prefixed loads/stores with PC-relative addresses for the signed char type. * gcc.target/powerpc/prefix-pcrel-sd.c: New test for prefixed loads/stores with PC-relative addresses for the _Decimal32 type. * gcc.target/powerpc/prefix-pcrel-sf.c: New test for prefixed loads/stores with PC-relative addresses for the float type. * gcc.target/powerpc/prefix-pcrel-si.c: New test for prefixed loads/stores with PC-relative addresses for the int type. * gcc.target/powerpc/prefix-pcrel-udi.c: New test for prefixed loads/stores with PC-relative addresses for the unsigned long type. * gcc.target/powerpc/prefix-pcrel-uhi.c: New test for prefixed loads/stores with PC-relative addresses for the unsigned short type. * gcc.target/powerpc/prefix-pcrel-uqi.c: New test for prefixed loads/stores with PC-relative addresses for the unsigned char type. 
* gcc.target/powerpc/prefix-pcrel-usi.c: New test for prefixed loads/stores with PC-relative addresses for the unsigned int type. * gcc.target/powerpc/prefix-pcrel-v2df.c: New test for prefixed loads/stores with PC-relative addresses for the vector double type. Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c === --- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c (revision 280086) +++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c (working copy) @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructions testing whether pc-relative prefixed + instructions are generated for the _Decimal64 type. */ + +#define TYPE _Decimal64 + +#include "prefix-pcrel.h" + +/* { dg-final { scan-assembler-times {[@]pcrel} 4 } } */ +/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */ Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c === --- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c (revision 280086) +++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c (working copy) @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructions testing whether pc-relative prefixed + instructions are generated for the double type. 
*/ + +#define TYPE double + +#include "prefix-pcrel.h" + +/* { dg-final { scan-assembler-times {[@]pcrel} 4 } } */ +/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */ Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c === --- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c (revision 280086) +++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c (working copy) @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructions testing whether pc-relative prefixed + instructions are generated for the long type. */ + +#define TYPE long + +#include "prefix-pcrel.h" + +/* { dg-final { scan-assembler-times {[@]pcrel} 4 } } */ +/* { dg-final { scan-assembler-times {\mpld\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */ Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c ==
[PATCH] V12 patch #12 of 14, Add test for -fstack-protector-strong with large stack sizes and -mcpu=future
This patch is the same as: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01503.html This patch adds a new test to verify that -fstack-protector-strong generates the correct code when a large stack is used and the compiler option -mcpu=future is also used. Can I check this into the trunk? This is a bug that we discovered when we attempted to build glibc using the -mcpu=future option. 2020-01-09 Michael Meissner * gcc.target/powerpc/prefix-stack-protect.c: New test to make sure -fstack-protector-strong works with prefixed addressing. Index: gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c === --- gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c (revision 280088) +++ gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c (working copy) @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future -fstack-protector-strong" } */ + +/* Test that we can handle large stack frames with -fstack-protector-strong and + prefixed addressing. This was originally discovered in trying to build + glibc with -mcpu=future, and vfwprintf.c failed because it used + -fstack-protector-strong. */ + +extern long foo (char *); + +long +bar (void) +{ + char buffer[0x2]; + return foo (buffer) + 1; +} + +/* { dg-final { scan-assembler {\mpld\M} } } */ +/* { dg-final { scan-assembler {\mpstd\M} } } */ -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
[PATCH] V12 patch #13 of 14, Add tests for vec_extract with PC-relative addresses
This patch is the same as: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01504.html This patch adds some tests to validate that the work in patches V12 #1-4 generates the correct code when vec_extract is used on a vector with a PC-relative address and -mcpu=future is used. Can I check this into the trunk? 2020-01-09 Michael Meissner * gcc.target/powerpc/vec-extract-pcrel-si.c: New test for vec_extract from a PC-relative address. * gcc.target/powerpc/vec-extract-pcrel-di.c: New test for vec_extract from a PC-relative address. * gcc.target/powerpc/vec-extract-pcrel-sf.c: New test for vec_extract from a PC-relative address. * gcc.target/powerpc/vec-extract-pcrel-df.c: New test for vec_extract from a PC-relative address. Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-df.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-df.c (revision 280090) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-df.c (working copy) @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we can support vec_extract on V2DF vectors with a PC-relative + address.
*/ + +#include + +#ifndef TYPE +#define TYPE double +#endif + +static vector TYPE v; +vector TYPE *p = &v; + +TYPE +get0 (void) +{ + return vec_extract (v, 0); +} + +TYPE +get1 (void) +{ + return vec_extract (v, 1); +} + +TYPE +getn (unsigned long n) +{ + return vec_extract (v, n); +} + +/* { dg-final { scan-assembler-times {[@]pcrel} 3 } } */ +/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpla\M} 1 } } */ Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-di.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-di.c (revision 280090) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-di.c (working copy) @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we can support vec_extract on V2DI vectors with a PC-relative + address. */ + +#include + +#ifndef TYPE +#define TYPE unsigned long +#endif + +static vector TYPE v; +vector TYPE *p = &v; + +TYPE +get0 (void) +{ + return vec_extract (v, 0); +} + +TYPE +get1 (void) +{ + return vec_extract (v, 1); +} + +TYPE +getn (unsigned long n) +{ + return vec_extract (v, n); +} + +/* { dg-final { scan-assembler-times {[@]pcrel} 3 } } */ +/* { dg-final { scan-assembler-times {\mpld\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpla\M} 1 } } */ Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-sf.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-sf.c (revision 280090) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-sf.c (working copy) @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we can support vec_extract on V4SF vectors with a PC-relative + address. 
*/ + +#include + +#ifndef TYPE +#define TYPE float +#endif + +static vector TYPE v; +vector TYPE *p = &v; + +TYPE +get0 (void) +{ + return vec_extract (v, 0); +} + +TYPE +get1 (void) +{ + return vec_extract (v, 1); +} + +TYPE +getn (unsigned long n) +{ + return vec_extract (v, n); +} + +/* { dg-final { scan-assembler-times {[@]pcrel} 3 } } */ +/* { dg-final { scan-assembler-times {\mplfs\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpla\M} 1 } } */ Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-si.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-si.c (revision 280090) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-si.c (working copy) @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we can support vec_extract on V4SI vectors with a PC-relative + address. */ + +#include + +#ifndef TYPE +#define TYPE unsigned int +#endif + +static vector TYPE v; +vector TYPE *p = &v; + +TYPE +get0 (void) +{ + return vec_extract (v, 0); +} + +TYPE +get1 (void) +{ + return vec_extract (v, 1); +} + +TYPE +getn (unsigned long n) +{ + return vec_extract (v, n); +} + +/* { dg-final { scan-assembler-times {[@]pcrel} 3 } } */ +/* { dg-final { scan-assembler-times {\mplwz\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpla\M} 1 } } */ -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
[PATCH] V12 patch #14 of 14, Add tests for generating prefixed instructions when using vec_extract with large offsets with -mcpu=future
While this patch is similar in spirit to V11 #15, I lost that patch, and I re-implemented the check. Can I check this test into the trunk? 2020-01-09 Michael Meissner * gcc.target/powerpc/vec-extract-large-si.c: New test for vec_extract from a vector unsigned int in memory with a large offset. * gcc.target/powerpc/vec-extract-large-di.c: New test for vec_extract from a vector long in memory with a large offset. * gcc.target/powerpc/vec-extract-large-sf.c: New test for vec_extract from a vector float in memory with a large offset. * gcc.target/powerpc/vec-extract-large-df.c: New test for vec_extract from a vector double in memory with a large offset. Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-df.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-large-df.c (revision 280092) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-df.c (working copy) @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we can support vec_extract on V2DF vectors with a large numeric + offset address. 
*/ + +#include + +#ifndef TYPE +#define TYPE double +#endif + +#ifndef OFFSET +#define OFFSET 0x12345 +#endif + +TYPE +get0 (vector TYPE *p) +{ + return vec_extract (p[OFFSET], 0); +} + +TYPE +get1 (vector TYPE *p) +{ + return vec_extract (p[OFFSET], 1); +} + +TYPE +getn (vector TYPE *p, unsigned long n) +{ + return vec_extract (p[OFFSET], n); +} + +/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpaddi\M} 1 } } */ Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-di.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-large-di.c (revision 280092) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-di.c (working copy) @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we can support vec_extract on V2DI vectors with a large numeric + offset address. */ + +#include + +#ifndef TYPE +#define TYPE unsigned long long +#endif + +#ifndef OFFSET +#define OFFSET 0x12345 +#endif + +TYPE +get0 (vector TYPE *p) +{ + return vec_extract (p[OFFSET], 0); +} + +TYPE +get1 (vector TYPE *p) +{ + return vec_extract (p[OFFSET], 1); +} + +TYPE +getn (vector TYPE *p, unsigned long n) +{ + return vec_extract (p[OFFSET], n); +} + +/* { dg-final { scan-assembler-times {\mpld\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpaddi\M} 1 } } */ Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-sf.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-large-sf.c (revision 280092) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-sf.c (working copy) @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we can support vec_extract on V4SF vectors with a large numeric + offset address. 
*/ + +#include + +#ifndef TYPE +#define TYPE float +#endif + +#ifndef OFFSET +#define OFFSET 0x12345 +#endif + +TYPE +get0 (vector TYPE *p) +{ + return vec_extract (p[OFFSET], 0); +} + +TYPE +get1 (vector TYPE *p) +{ + return vec_extract (p[OFFSET], 1); +} + +TYPE +getn (vector TYPE *p, unsigned long n) +{ + return vec_extract (p[OFFSET], n); +} + +/* { dg-final { scan-assembler-times {\mplfs\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpaddi\M} 1 } } */ Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-si.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-large-si.c (revision 280092) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-si.c (working copy) @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we can support vec_extract on V4SI vectors with a large numeric + offset address. */ + +#include + +#ifndef TYPE +#define TYPE unsigned int +#endif + +#ifndef OFFSET +#define OFFSET 0x12345 +#endif + +TYPE +get0 (vector TYPE *p) +{ + return vec_extract (p[OFFSET], 0); +} + +TYPE +get1 (vector TYPE *p) +{ + return vec_extract (p[OFFSET], 1); +} + +TYPE +getn (vector TYPE *p, unsigned long n) +{ + return vec_extract (p[OFFSET], n); +} + +/* { dg-final { scan-assembler-times {\mplwz\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpaddi\M} 1 } } */ -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
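In the vec-extract-large tests, p[OFFSET] with OFFSET = 0x12345 and 16-byte vectors puts the vector 0x12345 * 16 = 0x123450 bytes past p. That is far outside a 16-bit signed displacement but well inside the 34-bit range of prefixed loads. A small sketch of the arithmetic follows, assuming element k of a vector sits k * elt_bytes past the start of the vector in memory; `elt_disp` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VEC_BYTES 16  /* size of one 128-bit vector */

static bool fits_signed(int64_t v, unsigned n)
{
    assert(n >= 1 && n <= 63);
    int64_t lo = -((int64_t)1 << (n - 1));
    int64_t hi = ((int64_t)1 << (n - 1)) - 1;
    return v >= lo && v <= hi;
}

/* Byte displacement of element k of p[index] (layout assumption above). */
static int64_t elt_disp(int64_t index, int64_t k, int64_t elt_bytes)
{
    return index * VEC_BYTES + k * elt_bytes;
}
```

For getn the element number is only known at run time, which is presumably why the tests expect one PADDI to form the vector's address, while get0 and get1 can each fold a constant displacement into a single prefixed load.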
Re: [PATCH] V12 patch #5 of 14, Make -mpcrel default for -mcpu=future on little endian Linux 64-bit systems
On Fri, Jan 31, 2020 at 07:12:53PM -0600, Segher Boessenkool wrote: > Hi! > > On Thu, Jan 09, 2020 at 07:40:08PM -0500, Michael Meissner wrote: > > * config/rs6000/linux64.h (PREFIXED_ADDR_SUPPORTED_BY_OS): Set to > > 1 to enable prefixed addressing if -mcpu=future. > > (PCREL_SUPPORTED_BY_OS): Set to 1 to enable PC-relative addressing > > if -mcpu=future. > > * config/rs6000/rs6000-cpus.h (ISA_FUTURE_MASKS_SERVER): Do not > > enable -mprefixed-addr or -mpcrel by default. > > I understand why this is needed for pcrel (or useful at least), but why > for prefixed addressing in general as well? What OS support is needed > for that? > > Put another way, is this just carefulness, or do you run into actual > problems without it? Just caution. I can just do the PCREL. > > +/* Enable support for pc-relative and numeric prefixed addressing on the > > + 'future' system. */ > > +#undef PREFIXED_ADDR_SUPPORTED_BY_OS > > +#define PREFIXED_ADDR_SUPPORTED_BY_OS 1 > > + > > +#undef PCREL_SUPPORTED_BY_OS > > +#define PCREL_SUPPORTED_BY_OS 1 > > "Numeric prefixed addressing"? What's that? Just "and other prefixed > addressing", maybe? Using a prefixed address with a large offset, or using a small offset because the traditional instruction is a DS/DQ instruction and the bottom 2/4 bits are non-zero. > (Is it useful to have those two separate at all, btw? Now, that is while > we are still developing the code, but also in the future?) > > > +/* Addressing related flags on a future processor. These are options that > > need > > + to be cleared if the target OS is not capable of supporting prefixed > > + addressing at all (such as 32-bit mode or if the object file format is > > not > > + ELF v2). */ > > Ah. If we are missing the needed relocations (or other as/ld support). > So it is not about OS really, missing toolchain support instead? It also plays into the dynamic loader of the system. If the dynamic loader doesn't support the new relocations, you can't do PCREL. 
> > > + /* Only ELFv2 currently supports prefixed/pcrel addressing. */ > > + else if (rs6000_current_abi != ABI_ELFv2) > > + { > > + if (TARGET_PCREL && explicit_pcrel) > > + error ("%qs requires %qs", "-mpcrel", "-mabi=elfv2"); > > + > > + else if (TARGET_PREFIXED_ADDR && explicit_prefixed) > > + error ("%qs requires %qs", "-mprefixed-addr", "-mabi=elfv2"); > > It would be good if the error messages also said "currently" somehow (it > is not an actual limitation, it's just a matter of code). "Is only > supported with -mabi=elfv2", perhaps? -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
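The quoted hunk distinguishes options the user asked for explicitly from ones that were only turned on by default: only an explicit -mpcrel or -mprefixed-addr under a non-ELFv2 ABI produces an error. The sketch below is a hypothetical model of that decision; the struct and function names are invented, and what happens to default-enabled flags is an assumption, since the hunk is cut off before that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the option state, not the real rs6000 structures. */
struct prefix_opts {
    bool pcrel;              /* -mpcrel currently in effect */
    bool prefixed;           /* -mprefixed-addr currently in effect */
    bool explicit_pcrel;     /* user wrote -mpcrel */
    bool explicit_prefixed;  /* user wrote -mprefixed-addr */
};

/* Returns an error message, or NULL if the options are acceptable. */
static const char *check_prefixed_abi(struct prefix_opts *o, bool abi_is_elfv2)
{
    assert(o != NULL);
    if (abi_is_elfv2)
        return NULL;

    const char *err = NULL;
    if (o->pcrel && o->explicit_pcrel)
        err = "-mpcrel requires -mabi=elfv2";
    else if (o->prefixed && o->explicit_prefixed)
        err = "-mprefixed-addr requires -mabi=elfv2";

    /* Assumption: flags that were only on by default are quietly dropped. */
    o->pcrel = false;
    o->prefixed = false;
    return err;
}
```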
Re: [PATCH] V12 patch #2 of 14, Refactor rs6000_adjust_vec_address & rs6000_split_vec_extract_var
On Fri, Jan 31, 2020 at 11:30:22AM -0600, Segher Boessenkool wrote: > But why is that the correct thing to do? Garbage in, garbage out is > perfectly fine? Or do we have (e.g.) builtins that specify this masking? > If so, please say that here. It has been this way since I added these for power7 or power8, so I'm not changing the semantics here. Quoting from the LE ABI: VEC_EXTRACT (ARG1, ARG2) This function uses modular arithmetic on ARG2 to determine the element number. For example, if ARG2 is out of range, the compiler uses ARG2 modulo the number of elements in the vector to determine the element position. So if we were to remove the ANDing, we would have to change the ABI. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
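The ABI text quoted in the reply says the element number is ARG2 modulo the number of elements. Since vector element counts are powers of two, that modulo reduces to an AND with (count - 1), which is the masking under discussion. A plain-C sketch of that rule on a 4 x int array standing in for a vector (hypothetical helpers, not the builtin itself):

```c
#include <assert.h>
#include <stddef.h>

/* Element number = arg2 modulo nelts; nelts is a power of two. */
static size_t vec_extract_index(unsigned long arg2, size_t nelts)
{
    assert(nelts != 0 && (nelts & (nelts - 1)) == 0);
    return (size_t)(arg2 & (nelts - 1));
}

/* Stand-in for vec_extract on a vector of four ints. */
static int extract_v4si(const int v[4], unsigned long n)
{
    return v[vec_extract_index(n, 4)];
}
```

Out-of-range indexes wrap around instead of reading past the vector, so removing the AND would change the documented behavior.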
Re: [PATCH] V12 patch #3 of 14, Improve address validation in rs6000_adjust_vec_address
On Fri, Jan 31, 2020 at 05:43:20PM -0600, Segher Boessenkool wrote: > Hi! > > On Thu, Jan 09, 2020 at 07:27:58PM -0500, Michael Meissner wrote: > > * config/rs6000/rs6000.c (reg_to_non_prefixed): Add forward > > reference. > > FWIW, it is better to just reorder the code, in most cases. > > > (hard_reg_and_mode_to_addr_mask): Delete, no longer used. > > Just "Delete.". Changelogs say what, not why; you have the commit > message for that. > > > + new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx); > > So this depends on op0 not being r0 here. Do we guarantee that somehow? > It isn't obvious, so add an assert for this please? (Or do I miss > something obvious? :-) ) That particular code is inside if CONST_INT_P (op1). Therefore, op0 cannot be r0, but I can add an assertion. > > +/* If the address isn't valid, move the address into the temporary base > > + register. Some reasons it could not be valid include: > > + The address offset overflowed the 16 or 34 bit offset size; > > + We need to use a DS-FORM load, and the bottom 2 bits are non-zero; > > + We need to use a DQ-FORM load, and the bottom 2 bits are non-zero; > > + Only X_FORM loads can be done, and the address is D_FORM. */ > > 4 bits for DQ-form? > > Okay for trunk with those tweaks. Thanks! > > > Segher -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
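Segher's request for an assertion rests on a Power ISA rule: for D-form addressing the effective address is (RA|0) + D, so an RA field of 0 selects the constant 0 rather than the contents of r0. A PLUS of r0 and an offset would therefore silently address just the offset. A minimal host-side model, using a hypothetical helper `dform_ea`:

```c
#include <assert.h>
#include <stdint.h>

/* EA = (RA|0) + D: RA field 0 means the literal 0, not r0's contents. */
static uint64_t dform_ea(const uint64_t gpr[32], unsigned ra, int64_t disp)
{
    assert(ra < 32);
    uint64_t base = (ra == 0) ? 0 : gpr[ra];
    return base + (uint64_t)disp;  /* 64-bit wraparound, as in hardware */
}
```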
[PATCH] Fix PR 93568 on PowerPC (vector extract failures)
When I submitted my recent patches, in updating one of the patches, I made a thinko that resulted in a lot of failures on big endian systems (but not as many on the little endian systems). I have done bootstraps on both big endian and little endian systems. Can I check in this patch? On a big endian power8 system, the following tests now pass: gcc.target/powerpc/pr87532-mc.c gcc.target/powerpc/pr89765-mc.c gcc.target/powerpc/vec-extract-3.c gcc.target/powerpc/vec-extract-5.c gcc.target/powerpc/vec-extract-6.c gcc.target/powerpc/vec-extract-7.c gcc.target/powerpc/vec-extract-8.c gcc.target/powerpc/vec-extract-9.c gcc.target/powerpc/vec-extract-v16qi-df.c gcc.target/powerpc/vec-extract-v16qi.c gcc.target/powerpc/vec-extract-v16qiu-df.c gcc.target/powerpc/vec-extract-v16qiu.c gcc.target/powerpc/vec-extract-v2df.c gcc.target/powerpc/vec-extract-v2di.c gcc.target/powerpc/vec-extract-v4sf.c gcc.target/powerpc/vec-extract-v4si-df.c gcc.target/powerpc/vec-extract-v4si.c gcc.target/powerpc/vec-extract-v4siu-df.c gcc.target/powerpc/vec-extract-v4siu.c gcc.target/powerpc/vec-extract-v8hi-df.c gcc.target/powerpc/vec-extract-v8hi.c gcc.target/powerpc/vec-extract-v8hiu-df.c gcc.target/powerpc/vec-extract-v8hiu.c gcc.target/powerpc/vsx-builtin-10b.c gcc.target/powerpc/vsx-builtin-11b.c gcc.target/powerpc/vsx-builtin-12b.c gcc.target/powerpc/vsx-builtin-14b.c gcc.target/powerpc/vsx-builtin-15b.c gcc.target/powerpc/vsx-builtin-16b.c gcc.target/powerpc/vsx-builtin-17b.c gcc.target/powerpc/vsx-builtin-18b.c gcc.target/powerpc/vsx-builtin-19b.c gcc.target/powerpc/vsx-builtin-9b.c On a little endian power8 system, the following tests now pass: gcc.target/powerpc/pr87532-mc.c gcc.target/powerpc/pr89765-mc.c gcc.target/powerpc/vec-extract-v2di.c gcc.target/powerpc/vsx-builtin-12b.c gcc.target/powerpc/vsx-builtin-19b.c 2020-02-05 Michael Meissner PR target/93568 * config/rs6000/rs6000.c (get_vector_offset): Fix --- /tmp/a8cqkr_rs6000.c2020-02-05 14:55:36.255021903 -0600 +++ 
gcc/config/rs6000/rs6000.c 2020-02-05 13:27:00.393877012 -0600 @@ -6744,8 +6744,7 @@ get_vector_offset (rtx mem, rtx element, /* All insns should use the 'Q' constraint (address is a single register) if the element number is not a constant. */ - rtx addr = XEXP (mem, 0); - gcc_assert (satisfies_constraint_Q (addr)); + gcc_assert (satisfies_constraint_Q (mem)); /* Mask the element to make sure the element number is between 0 and the maximum number of elements - 1 so that we don't generate an address -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
[PATCH], PR target/93569, Fix PowerPC vsx-builtin-15d.c test case
When I applied my previous patches for vec_extract, I switched to using reg_to_non_prefixed to validate the vector extract address. It uncovered a bug that reg_to_non_prefixed allowed D-FORM (reg+offset) addresses to load up Altivec registers on power7 and power8. However, those systems only supported X-FORM (reg+reg) addressing. Power9 added support for DS-FORM and DQ-FORM addressing to the Altivec registers. This patch fixes this so that the vsx-builtin-15d.c test case now passes. Can I check this into the master branch? I have done bootstrap builds and make check on both a little endian Power8 system and a big endian Power8 system. There were no regressions. On the big endian system, just vsx-builtin-15d.c now passes. On the little endian system, vsx-builtin-15d.c now passes along with some Fortran tests. 2020-02-05 Michael Meissner PR target/93569 * config/rs6000/rs6000.c (reg_to_non_prefixed): Before ISA 3.0 we only had X-FORM (reg+reg) addressing in the traditional Altivec registers. --- /tmp/eAu61F_rs6000.c2020-02-05 18:08:48.698992017 -0500 +++ gcc/config/rs6000/rs6000.c 2020-02-05 17:23:55.733650185 -0500 @@ -24943,9 +24943,13 @@ reg_to_non_prefixed (rtx reg, machine_mo } /* Altivec registers use DS-mode for scalars, and DQ-mode for vectors, IEEE - 128-bit floating point, and 128-bit integers. */ + 128-bit floating point, and 128-bit integers. Before power9, only indexed + addressing was available. */ else if (ALTIVEC_REGNO_P (r)) { + if (!TARGET_P9_VECTOR) + return NON_PREFIXED_X; + if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode)) return NON_PREFIXED_DS; -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH], PR target/93569, Fix PowerPC vsx-builtin-15d.c test case
On Thu, Feb 06, 2020 at 09:49:18AM -0600, Segher Boessenkool wrote: > Hi! > > On Thu, Feb 06, 2020 at 08:29:41AM -0500, Michael Meissner wrote: > > --- /tmp/eAu61F_rs6000.c2020-02-05 18:08:48.698992017 -0500 > > +++ gcc/config/rs6000/rs6000.c 2020-02-05 17:23:55.733650185 -0500 > > @@ -24943,9 +24943,13 @@ reg_to_non_prefixed (rtx reg, machine_mo > > } > > > >/* Altivec registers use DS-mode for scalars, and DQ-mode for vectors, > > IEEE > > - 128-bit floating point, and 128-bit integers. */ > > + 128-bit floating point, and 128-bit integers. Before power9, only > > indexed > > + addressing was available. */ > >else if (ALTIVEC_REGNO_P (r)) > > { > > + if (!TARGET_P9_VECTOR) > > + return NON_PREFIXED_X; > > + > >if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode)) > > return NON_PREFIXED_DS; > > That looks fine, but is this complete? What about the other VSRs? Like > right before this: > > if (FP_REGNO_P (r)) > { > if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode)) > return NON_PREFIXED_D; > > else if (size < 8) > return NON_PREFIXED_X; > > else if (TARGET_VSX && size >= 16 >&& (VECTOR_MODE_P (mode) >|| FLOAT128_VECTOR_P (mode) >|| mode == TImode || mode == CTImode)) > return NON_PREFIXED_DQ; > > else > return NON_PREFIXED_DEFAULT; > } > > If we are dealing with a SF or DF (or whatever else in a "legacy" FPR), > that is fine, but what about vectors in those regs? It says we can use > DQ-mode here, but that is only true from p9 onward, no? Good point. I'll submit a revised patch once the bootstrap and make check finishes. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
[PATCH] PR target/93569 [version 2], Fix PowerPC vsx-builtin-15d.c test case
This patch addresses the concern that Segher raised in the original submission of the patch to fix PR target/93569. In addition to checking for D*-form addresses in the traditional Altivec registers, this patch also checks for D*-form addresses for vectors in the traditional floating point registers. Neither of these address forms was allowed before ISA 3.0 (power9). I have done bootstraps on both little and big endian Linux 64-bit systems, and there were no regressions for this change. Can I check this patch into the master branch? https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00387.html 2020-02-06 Michael Meissner PR target/93569 * config/rs6000/rs6000.c (reg_to_non_prefixed): Before ISA 3.0 we only had X-FORM (reg+reg) addressing for vectors. Also before ISA 3.0, we only had X-FORM addressing for scalars in the traditional Altivec registers. --- /tmp/VQDg8p_rs6000.c 2020-02-06 11:55:27.509363545 -0500 +++ gcc/config/rs6000/rs6000.c 2020-02-06 11:54:28.461531334 -0500 @@ -24923,7 +24923,8 @@ reg_to_non_prefixed (rtx reg, machine_mo unsigned size = GET_MODE_SIZE (mode); /* FPR registers use D-mode for scalars, and DQ-mode for vectors, IEEE - 128-bit floating point, and 128-bit integers. */ + 128-bit floating point, and 128-bit integers. Before power9, only indexed + addressing was available for vectors. */ if (FP_REGNO_P (r)) { if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode)) @@ -24936,16 +24937,20 @@ reg_to_non_prefixed (rtx reg, machine_mo && (VECTOR_MODE_P (mode) || FLOAT128_VECTOR_P (mode) || mode == TImode || mode == CTImode)) - return NON_PREFIXED_DQ; + return (TARGET_P9_VECTOR) ? NON_PREFIXED_DQ : NON_PREFIXED_X; else return NON_PREFIXED_DEFAULT; } /* Altivec registers use DS-mode for scalars, and DQ-mode for vectors, IEEE - 128-bit floating point, and 128-bit integers. */ + 128-bit floating point, and 128-bit integers. Before power9, only indexed + addressing was available.
*/ else if (ALTIVEC_REGNO_P (r)) { + if (!TARGET_P9_VECTOR) + return NON_PREFIXED_X; + if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode)) return NON_PREFIXED_DS; -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
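The decision logic in the version 2 patch can be restated as a small C model. This is a simplified sketch, not the real `reg_to_non_prefixed`: the `struct access` fields stand in for the mode tests in the patch, and because the quoted diff is cut off after the Altivec DS-form case, the remaining Altivec cases (DQ-form for 16-byte vector-like modes, X-form otherwise) are an assumption based on the comment in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the register classes and
   NON_PREFIXED_* results used by reg_to_non_prefixed.  */
enum reg_kind { REG_FPR, REG_ALTIVEC };
enum np_form { NP_D, NP_DS, NP_DQ, NP_X, NP_DEFAULT };

struct access
{
  enum reg_kind reg;
  unsigned size;        /* GET_MODE_SIZE of the mode.  */
  bool sf_or_8byte;     /* SFmode, size == 8, or FLOAT128_2REG_P.  */
  bool vector_like;     /* Vector, FLOAT128_VECTOR_P, TImode or CTImode.  */
};

enum np_form
model_reg_to_non_prefixed (struct access a, bool have_vsx, bool have_p9_vector)
{
  assert (a.size != 0);

  if (a.reg == REG_FPR)
    {
      if (a.sf_or_8byte)
        return NP_D;
      if (a.size < 8)
        return NP_X;
      if (have_vsx && a.size >= 16 && a.vector_like)
        /* Before power9, vectors in FPRs only had X-form addressing.  */
        return have_p9_vector ? NP_DQ : NP_X;
      return NP_DEFAULT;
    }

  /* Altivec registers: before power9, only indexed (X-form) addressing.  */
  if (!have_p9_vector)
    return NP_X;
  if (a.sf_or_8byte)
    return NP_DS;

  /* Assumption: the quoted diff stops here.  Following the comment in the
     patch, 16-byte vector-like modes are taken to use DQ-form and all
     remaining cases X-form.  */
  return (a.size >= 16 && a.vector_like) ? NP_DQ : NP_X;
}
```

The key property of the fix is visible in the first two test cases: the same 16-byte vector access maps to X-form on power8 and DQ-form on power9, in both register classes.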
[PATCH], Rename and document PowerPC -mprefixed-addr to -mprefixed
This patch renames the PowerPC internal switch -mprefixed-addr to be -mprefixed. Last week, Bill, Segher, and I were talking, and we came to the conclusion that we needed to make the prefixed addressing option more public. This is particularly true when you consider that only 64-bit little endian Linux will have support for these modes. Other OSes, ABIs, etc. may or may not support all of the new addressing modes in the 'future' computer. We also decided that we preferred the simpler '-mprefixed' option over '-mprefixed-addr'. If you use -mno-prefixed, you get the current power9 addressing modes, and the compiler will not generate prefixed loads or stores. If you use -mprefixed -mno-pcrel, the compiler will generate prefixed loads and stores utilizing 34-bit offset addressing with numeric offsets that don't need relocation. It will not generate PC-relative loads and stores. If you use -mpcrel, you must be using the 64-bit ELF v2 ABI, and the code model must be medium. If you use -mpcrel, the compiler will generate PC-relative loads and stores to access items, rather than the current TOC based loads and stores. If you use -mpcrel, it implies -mprefixed. If you use -mno-prefixed, you cannot use -mpcrel. With the exception of making the switch a public switch, and documenting it, this patch is just a simple mechanical conversion, converting TARGET_PREFIXED_ADDR to TARGET_PREFIXED, etc. Because -mprefixed-addr was just an internal and undocumented switch, I have not provided an alias between -mprefixed and -mprefixed-addr (though I can do that if desired). I have tested these patches on both little endian and big endian Linux 64-bit systems, and there were no regressions. Can I check these patches into the master GCC branch for GCC 10? 2020-02-10 Michael Meissner * config/rs6000/predicates.md (cint34_operand): Rename the -mprefixed-addr option to be -mprefixed.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Rename the -mprefixed-addr option to be -mprefixed. (OTHER_FUTURE_MASKS): Likewise. (POWERPC_MASKS): Likewise. * config/rs6000/rs6000.c (rs6000_option_override_internal): Rename the -mprefixed-addr option to be -mprefixed. Change error messages to refer to -mprefixed. (num_insns_constant_gpr): Rename the -mprefixed-addr option to be -mprefixed. (rs6000_legitimate_offset_address_p): Likewise. (rs6000_mode_dependent_address): Likewise. (rs6000_opt_masks): Change the spelling of "-mprefixed-addr" to be "-mprefixed" for target attributes and pragmas. (address_to_insn_form): Rename the -mprefixed-addr option to be -mprefixed. (rs6000_adjust_insn_length): Likewise. * config/rs6000/rs6000.h (FINAL_PRESCAN_INSN): Rename the -mprefixed-addr option to be -mprefixed. (ASM_OUTPUT_OPCODE): Likewise. * config/rs6000/rs6000.md (prefixed insn attribute): Rename the -mprefixed-addr option to be -mprefixed. * config/rs6000/rs6000.opt (-mprefixed): Rename the -mprefixed-addr option to be -mprefixed. Change the option from being undocumented to being documented. * doc/invoke.texi (RS/6000 and PowerPC Options): Docment the -mprefixed option. Update the -mpcrel documentation to mention -mprefixed. --- /tmp/N41Ptv_predicates.md 2020-02-07 17:56:52.590487419 -0500 +++ gcc/config/rs6000/predicates.md 2020-02-07 17:34:02.891610645 -0500 @@ -306,7 +306,7 @@ (define_predicate "const_0_to_15_operand (define_predicate "cint34_operand" (match_code "const_int") { - if (!TARGET_PREFIXED_ADDR) + if (!TARGET_PREFIXED) return 0; return SIGNED_INTEGER_34BIT_P (INTVAL (op)); --- /tmp/aS8nV8_rs6000-cpus.def 2020-02-07 17:56:52.599487550 -0500 +++ gcc/config/rs6000/rs6000-cpus.def 2020-02-07 17:34:02.894610688 -0500 @@ -79,11 +79,11 @@ is fully functional. */ #define ISA_FUTURE_MASKS_SERVER(ISA_3_0_MASKS_SERVER \ | OPTION_MASK_FUTURE \ -| OPTION_MASK_PREFIXED_ADDR) +| OPTION_MASK_PREFIXED) /* Flags that need to be turned off if -mno-future.
*/ #define OTHER_FUTURE_MASKS (OPTION_MASK_PCREL \ -| OPTION_MASK_PREFIXED_ADDR) +| OPTION_MASK_PREFIXED) /* Flags that need to be turned off if -mno-power9-vector. */ #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW\ @@ -143,7 +143,7 @@ | OPTION_MASK_POWERPC64\ | OPTION_MASK_PPC_GFXOPT \ | OPTION_MASK_PP
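The option interactions described in this message can be summarized as a C sketch. It is illustrative only: `resolve_prefix_opts`, the tri-state `enum opt_state`, and the `PFX_*` bits are hypothetical; the `default_*` parameters stand in for whatever the selected -mcpu enables; and quietly dropping a default -mpcrel on an unsupported ABI or code model is an assumption rather than something the message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state for an option the user may or may not give.  */
enum opt_state { OPT_UNSET, OPT_ON, OPT_OFF };

#define PFX_PREFIXED 1  /* prefixed loads/stores enabled */
#define PFX_PCREL    2  /* PC-relative addressing enabled */

/* Return a mask of PFX_* bits, or -1 for a combination that should be
   rejected.  */
int
resolve_prefix_opts (enum opt_state prefixed, enum opt_state pcrel,
                     bool default_prefixed, bool default_pcrel,
                     bool elfv2_64bit, bool medium_model)
{
  /* If you use -mno-prefixed, you cannot use -mpcrel.  */
  if (prefixed == OPT_OFF && pcrel == OPT_ON)
    return -1;

  bool want_prefixed = prefixed == OPT_ON
                       || (prefixed == OPT_UNSET && default_prefixed);
  bool want_pcrel = pcrel == OPT_ON
                    || (pcrel == OPT_UNSET && default_pcrel);

  /* -mno-prefixed implies -mno-pcrel.  */
  if (prefixed == OPT_OFF)
    want_pcrel = false;

  if (want_pcrel)
    {
      /* -mpcrel requires the 64-bit ELF v2 ABI and the medium code model.  */
      if (!elfv2_64bit || !medium_model)
        {
          if (pcrel == OPT_ON)
            return -1;
          want_pcrel = false;   /* Assumption: a default is dropped quietly.  */
        }
      else
        want_prefixed = true;   /* -mpcrel implies -mprefixed.  */
    }

  assert (!want_pcrel || want_prefixed);
  return (want_prefixed ? PFX_PREFIXED : 0) | (want_pcrel ? PFX_PCREL : 0);
}
```

The final assertion captures the invariant the message relies on: PC-relative addressing is never enabled without prefixed instructions.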
Re: [PATCH], Rename and document PowerPC -mprefixed-addr to -mprefixed
On Mon, Feb 10, 2020 at 09:24:07PM -0600, Segher Boessenkool wrote: > Hi! > > On Mon, Feb 10, 2020 at 01:45:42PM -0500, Michael Meissner wrote: > > This patch renames the PowerPC internal switch -mprefixed-addr to be > > -mprefixed. > > > If you use -mpcrel, you must be using the 64-bit ELF v2 ABI, and the code > > model > > must be medium. > > Currently, anyway. > > > If you use -mpcrel, the compiler will generate PC-relative > > loads and stores to access items, rather than the current TOC based loads > > and > > stores. > > Where that is the best thing to do. Is that always now? :-) > > > If you use -mpcrel, it implies -mprefixed. If you use -mno-prefixed, you > > cannot use -mpcrel. > > -mno-prefixed should imply -mno-pcrel; does it? Yes. -mno-prefixed-addr also implied -mno-pcrel. > > * doc/invoke.texi (RS/6000 and PowerPC Options): Docment the > > (typo) Thanks. > > --- /tmp/1ySv8k_invoke.texi 2020-02-07 17:56:52.700489015 -0500 > > +++ gcc/doc/invoke.texi 2020-02-07 17:34:02.925611138 -0500 > > @@ -22327,7 +22328,6 @@ faster on processors with 32-bit busses > > aligns structures containing the above types differently than > > most published application binary interface specifications for the m68k. > > > > -@item -mpcrel > > @opindex mpcrel > > Use the pc-relative addressing mode of the 68000 directly, instead of > > using a global offset table. At present, this option implies > > @option{-fpic}, > > This isn't a correct change. Yeah, evidently I put the PowerPC stuff in the m68k -mpcrel area. I'll fix it. > Okay for trunk modulo the m68k change. Thanks! > > > Segher -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Re: [committed] testsuite: Fix up gcc.target/powerpc/pr93122.c test
On Wed, Feb 12, 2020 at 11:27:01PM +0100, Jakub Jelinek wrote: > On Mon, Feb 10, 2020 at 01:45:42PM -0500, Michael Meissner wrote: > > This patch renames the PowerPC internal switch -mprefixed-addr to be > > -mprefixed. > > --- gcc/config/rs6000/rs6000.opt > +++ gcc/config/rs6000/rs6000.opt > @@ -570,8 +570,8 @@ mfuture > Target Report Mask(FUTURE) Var(rs6000_isa_flags) > Use instructions for a future architecture. > > -mprefixed-addr > -Target Undocumented Mask(PREFIXED_ADDR) Var(rs6000_isa_flags) > +mprefixed > +Target Report Mask(PREFIXED) Var(rs6000_isa_flags) > Generate (do not generate) prefixed memory instructions. > > mpcrel > > This change broke the gcc.target/powerpc/pr93122.c test, so it now > FAIL: gcc.target/powerpc/pr93122.c (test for excess errors) > Excess errors: > xgcc: error: unrecognized command-line option '-mprefixed-addr'; did you mean > '-mprefixed'? > > Fixed thusly, bootstrapped/regtested on powerpc64le-linux, committed to > trunk as obvious. Thanks. I don't think that test was in the trunk when I did the bootstrap for the -mprefixed-addr to -mprefixed option. I was about to send a similar patch. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
PowerPC V10 Patches for -mcpu=future
This set of patches is an attempt to address the issues raised in the previous sets of patches: the V7 patches were for important functionality, the V8 patches were for tests, and the V9 patches were for the PCREL_OPT support. As I write this, there are 12 patches. There will be more patches later to address the remaining test suite patches. I need to look at the comments for PCREL_OPT in detail to see what the strategy should be for those patches. Patches V10 #1-3 are the remaining issues from V7 #1-3 to add PADDI and PLI support for large constants. In theory, now that the earlier reformatting has been checked in, these should be simple. Patches V10 #4-7 break up patch V7 #6 (vector extract) into 4 separate patches. Patch V10 #8 is patch V7 #7 (turn on -mpcrel by default on 64-bit Linux targets for -mcpu=future), changing the names of the enabling macros. Patch V10 #9 is a redone version of patch V7 #5. This patch adds new effective target options for PowerPC. I have changed this patch to look at the code generated by the compiler to see if prefixed addressing or PC-relative addressing is used for -mcpu=future. This patch needs patch V10 #8 installed to enable the prefixed addressing and PC-relative tests. In patch V10 #9, I did not modify the existing test (check_effective_target_powerpc_future_ok). As we discussed, this test should really test whether a non-prefixed instruction is generated to allow for targets that might support -mcpu=future but not enable prefixed addressing. However, at present the only instructions being submitted are prefixed instructions, so this will have to wait until we get further down the road with 'future' instructions. Patch V10 #10 is a modification of patch V8 #1. I renamed the files from paddi-?.c to prefixed-*.c so that there isn't a false match due to the .ident directive.
Patch V10 #11 is a slight reworking of patch V8 #2 (testing whether we generate a prefixed instruction when the offset would be invalid for DS and DQ instruction formats). Patch V10 #12 is a slight reworking of patch V8 #3 (making sure we don't try to generate the non-existent PLWZU and PSTWU pre-modify instructions). There are 3 other patches from V8 that I will address at a later date. Patch V8 #4 contains the tests for using prefixed instructions for each of the types when a large numeric offset is used. Patch V8 #5 contains the tests for using PC-relative load/store instructions for each of the types to reference static values. Patch V8 #6 is the test to make sure the -fstack-protector support works when the stack frame is large and -mcpu=future is used. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
[PATCH] V10 patch #1, Use PLI to load up large DImode constants if -mcpu=future
This patch adds an alternative to use PLI to load up large DImode constants if -mcpu=future is used. It is a slight reworking of patch V7 #1 after reformatting the movdi_internal64 insn. I have done bootstraps and make check on a power8 little endian system and there were no regressions. Can I check this patch in? Patch V7 #1: https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01301.html 2019-12-09 Michael Meissner * config/rs6000/rs6000.c (num_insns_constant_gpr): Return 1 if the constant can be loaded with PLI if -mcpu=future. * config/rs6000/rs6000.md (movdi_internal64): Add alternative to use PLI to load up 34-bit constants if -mcpu=future. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 279141) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -5541,6 +5541,10 @@ num_insns_constant_gpr (HOST_WIDE_INT va && (value >> 31 == -1 || value >> 31 == 0)) return 1; + /* PADDI can support up to 34 bit signed integers. */ + else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (value)) +return 1; + else if (TARGET_POWERPC64) { HOST_WIDE_INT low = ((value & 0x) ^ 0x8000) - 0x8000; Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 279141) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -8828,7 +8828,7 @@ (define_split }) ;;GPR store GPR loadGPR move -;;GPR li GPR lis GPR # +;;GPR li GPR lis GPR pli GPR # ;;FPR store FPR loadFPR move ;;AVX store AVX store AVX loadAVX loadVSX move ;;P9 0P9 -1 AVX 0/-1VSX 0 VSX -1 @@ -8838,7 +8838,7 @@ (define_split (define_insn "*movdi_internal64" [(set (match_operand:DI 0 "nonimmediate_operand" "=YZ,r, r, - r, r, r, + r, r, r, r, m, ^d, ^d, wY, Z, $v, $v, ^wa, wa, wa, v, wa, wa, @@ -8847,7 +8847,7 @@ (define_insn "*movdi_internal64" ?r, ?wa") (match_operand:DI 1 "input_operand" "r, YZ, r, - I, L, nF, + I, L, eI, nF, ^d, m, ^d, ^v, $v, wY, Z, ^wa, Oj, wM, OjwM, Oj, wM, @@ -8863,6 +8863,7 @@ (define_insn "*movdi_internal64" mr %0,%1 li %0,%1 lis %0,%v1 + li %0,%1 # stfd%U0%X0 %1,%0
lfd%U1%X1 %0,%1 @@ -8886,7 +8887,7 @@ (define_insn "*movdi_internal64" mtvsrd %x0,%1" [(set_attr "type" "store, load, *, - *, *, *, + *, *, *, *, fpstore,fpload, fpsimple, fpstore,fpstore,fpload, fpload, veclogical, vecsimple, vecsimple, vecsimple, veclogical, veclogical, @@ -8896,7 +8897,7 @@ (define_insn "*movdi_internal64" (set_attr "size" "64") (set_attr "length" "*, *, *, - *, *, 20, + *, *, *, 20, *, *, *, *, *, *, *, *, *, *, *, *, *, @@ -8905,7 +8906,7 @@ (define_insn "*movdi_internal64" *, *") (set_attr "isa" "*, *, *, - *, *, *, + *, *, fut,*, *, *, *, p9v,p7v,p9v,p7v,*, p9v,p9v,p7v,*, *, -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
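To show what the `num_insns_constant_gpr` change buys, here is a simplified C model of how many instructions a DImode constant load takes with and without PLI. It is a sketch, not the GCC function: `model_insns_for_constant` is hypothetical, and constants that need the general 64-bit sequence are reported with a conservative upper bound of 5 instead of an exact count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool
signed_bits_p (int64_t v, unsigned bits)
{
  int64_t lim = (int64_t) 1 << (bits - 1);
  return v >= -lim && v < lim;
}

/* Simplified, hypothetical model of the instruction count for loading
   VALUE into a GPR.  HAVE_PREFIXED corresponds to -mcpu=future.  */
int
model_insns_for_constant (int64_t value, bool have_prefixed)
{
  if (signed_bits_p (value, 16))
    return 1;                    /* li */
  if ((value & 0xffff) == 0 && signed_bits_p (value, 32))
    return 1;                    /* lis */
  if (have_prefixed && signed_bits_p (value, 34))
    return 1;                    /* pli (prefixed li) */
  if (signed_bits_p (value, 32))
    return 2;                    /* lis + ori */
  return 5;                      /* upper bound for a general 64-bit value */
}
```

For instance, 0x12345678 drops from two instructions to one once PLI is available, which is why the new `eI` alternative is added next to the existing `I` and `L` ones.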
[PATCH] V10 patch #2, use PLI to load up large SImode constants if -mcpu=future
This patch adds an alternative to use PLI to load up large SImode constants if -mcpu=future is used. It is a slight reworking of patch V7 #2 after reformatting the movsi_internal1 insn. I have done bootstraps and make check on a power8 little endian system and there were no regressions. Can I check this patch in once patch V10 #1 is checked in? Patch V7 #2: https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01302.html 2019-12-09 Michael Meissner * config/rs6000/rs6000.md (movsi_internal1): Add alternative to use PLI to load up 34-bit constants if -mcpu=future. Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 279143) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -6892,7 +6892,7 @@ (define_split ;;MR LA ;;LWZ LFIWZX LXSIWZX ;;STW STFIWX STXSIWX -;;LI LIS # +;;LI LIS PLI # ;;XXLOR XXSPLTIB 0 XXSPLTIB -1 VSPLTISW ;;XXLXOR 0XXLORC -1 P9 const ;;MTVSRWZ MFVSRWZ @@ -6903,7 +6903,7 @@ (define_insn "*movsi_internal1" "=r, r, r, d, v, m, Z, Z, - r, r, r, + r, r, r, r, wa, wa, wa, v, wa, v, v, wa, r, @@ -6912,7 +6912,7 @@ (define_insn "*movsi_internal1" "r, U, m, Z, Z, r, d, v, - I, L, n, + I, L, eI, n, wa, O, wM, wB, O, wM, wS, r, wa, @@ -6930,6 +6930,7 @@ (define_insn "*movsi_internal1" stxsiwx %x1,%y0 li %0,%1 lis %0,%v1 + li %0,%1 # xxlor %x0,%x1,%x1 xxspltib %x0,0 @@ -6947,7 +6948,7 @@ (define_insn "*movsi_internal1" "*, *, load, fpload, fpload, store, fpstore,fpstore, - *, *, *, + *, *, *, *, veclogical, vecsimple, vecsimple, vecsimple, veclogical, veclogical, vecsimple, mffgpr, mftgpr, @@ -6956,7 +6957,7 @@ (define_insn "*movsi_internal1" "*, *, *, *, *, *, *, *, - *, *, 8, + *, *, *, 8, *, *, *, *, *, *, 8, *, *, @@ -6965,7 +6966,7 @@ (define_insn "*movsi_internal1" "*, *, *, p8v,p8v, *, p8v,p8v, - *, *, *, + *, *, fut,*, p8v,p9v,p9v,p8v, p9v,p8v,p9v, p8v,p8v, @@ -7120,8 +7121,7 @@ (define_insn "*movsi_from_df" (define_split [(set (match_operand:SI 0 "gpc_reg_operand") (match_operand:SI 1 "const_int_operand"))] - "(unsigned HOST_WIDE_INT)
(INTVAL (operands[1]) + 0x8000) >= 0x1 - && (INTVAL (operands[1]) & 0xffff) != 0" + "num_insns_constant (operands[1], SImode) > 1" [(set (match_dup 0) (match_dup 2)) (set (match_dup 0) -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
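One consequence of switching the SImode split condition to `num_insns_constant`: every 32-bit value also fits in a signed 34-bit immediate, so, given the `num_insns_constant_gpr` change in patch V10 #1, no SImode constant should need splitting when PLI is available. The C sketch below models the old and new conditions. The function names are hypothetical, and the old test is written as "neither an li value nor an lis value", which is how the replaced condition reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the old test: a 32-bit constant needs splitting when it is
   neither an li (signed 16-bit) nor an lis (low 16 bits zero) value.  */
bool
old_simode_needs_split (int32_t v)
{
  bool li_ok = v >= -32768 && v <= 32767;
  bool lis_ok = (v & 0xffff) == 0;
  return !li_ok && !lis_ok;
}

/* Model of the new test, num_insns_constant (...) > 1.  Every 32-bit value
   fits in a signed 34-bit immediate, so with PLI nothing is split.  */
bool
new_simode_needs_split (int32_t v, bool have_pli)
{
  if (have_pli)
    return false;
  return old_simode_needs_split (v);
}
```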
[PATCH] V10 patch #3, Use PADDI to add large constants if -mcpu=future is used
This patch adds an alternative to use PADDI to add large SImode and DImode constants if -mcpu=future is used. It is a slight reworking of patch V7 #3. I have done bootstraps and make check on a power8 little endian system and there were no regressions. Can I check this patch in? 2019-12-09 Michael Meissner * config/rs6000/predicates.md (add_operand): Allow eI constants. * config/rs6000/rs6000.md (add3): Add alternative to generate PADDI for 34-bit constants if -mcpu=future. Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 279141) +++ gcc/config/rs6000/predicates.md (working copy) @@ -839,7 +839,8 @@ (define_special_predicate "indexed_addre (define_predicate "add_operand" (if_then_else (match_code "const_int") (match_test "satisfies_constraint_I (op) -|| satisfies_constraint_L (op)") +|| satisfies_constraint_L (op) +|| satisfies_constraint_eI (op)") (match_operand 0 "gpc_reg_operand"))) ;; Return 1 if the operand is either a non-special register, or 0, or -1. Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 279144) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -1761,15 +1761,17 @@ (define_expand "add3" }) (define_insn "*add3" - [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r") - (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b") - (match_operand:GPR 2 "add_operand" "r,I,L")))] + [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r,r") + (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b,b") + (match_operand:GPR 2 "add_operand" "r,I,L,eI")))] "" "@ add %0,%1,%2 addi %0,%1,%2 - addis %0,%1,%v2" - [(set_attr "type" "add")]) + addis %0,%1,%v2 + addi %0,%1,%2" + [(set_attr "type" "add") + (set_attr "isa" "*,*,*,fut")]) (define_insn "*addsi3_high" [(set (match_operand:SI 0 "gpc_reg_operand" "=b") -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
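The constant alternatives of the add insn can be modeled as follows: I is a signed 16-bit value (addi), L is a signed 16-bit value shifted left 16 bits (addis), and eI is a signed 34-bit value that only matches when prefixed instructions are enabled (the `cint34_operand` predicate returns 0 otherwise). This is a hypothetical C sketch of the selection, not GCC code; `ADD_REG` simply means the constant is not accepted by `add_operand`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum add_kind { ADD_REG, ADD_ADDI, ADD_ADDIS, ADD_PADDI };

static bool
fits_signed (int64_t v, unsigned bits)
{
  int64_t lim = (int64_t) 1 << (bits - 1);
  return v >= -lim && v < lim;
}

/* Which alternative would an add of constant C match?  */
enum add_kind
model_add_alternative (int64_t c, bool have_prefixed)
{
  if (fits_signed (c, 16))
    return ADD_ADDI;                          /* I  */
  if ((c & 0xffff) == 0 && fits_signed (c, 32))
    return ADD_ADDIS;                         /* L  */
  if (have_prefixed && fits_signed (c, 34))
    return ADD_PADDI;                         /* eI */
  return ADD_REG;
}
```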
[PATCH] V10 patch #4, Add new prefixed/non-prefixed memory constraints
Add new constraints to match whether a memory operand is not prefixed (em constraint) or prefixed (ep constraint). This is one of 4 parts aimed at reworking the vector extract code in patch V7 #6. This patch just adds the new constraints, but these constraints will not be used until the next patch. Originally I had just one constraint (em) that matched non-prefixed memory operands. But in order to use it, I needed to make sure the combiner did not combine vector extracts that use a variable offset with a PC-relative memory location. I.e.: #include static vector double vd; double get (unsigned int n) { return vec_extract (vd, n); } In addition, as I contemplate the bigger issue about the insn length attribute, I suspect we may need to have an ep attribute as well as em. Patch V7 #6: https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01306.html I have bootstrapped the compiler on a little endian power8 system and ran make check, and there were no regressions. Can I check this patch in? 2019-12-10 Michael Meissner * config/rs6000/constraints.md (em constraint): New constraint for non-prefixed memory operands. (ep constraint): New constraint for prefixed memory operands. * config/rs6000/predicates.md (non_prefixed_memory): New predicate for non-prefixed memory operands. * doc/md.texi (PowerPC constraints): Document em and ep constraints. Index: gcc/config/rs6000/constraints.md === --- gcc/config/rs6000/constraints.md(revision 279182) +++ gcc/config/rs6000/constraints.md(working copy) @@ -202,6 +202,16 @@ (define_constraint "H" ;; Memory constraints +(define_memory_constraint "em" + "A memory operand that does not contain a prefixed address." + (and (match_code "mem") + (match_operand 0 "non_prefixed_memory"))) + +(define_memory_constraint "ep" + "A memory operand that does contain a prefixed address."
+ (and (match_code "mem") + (match_operand 0 "prefixed_memory"))) + (define_memory_constraint "es" "A ``stable'' memory operand; that is, one which does not include any automodification of the base register. Unlike @samp{m}, this constraint Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 279151) +++ gcc/config/rs6000/predicates.md (working copy) @@ -1846,3 +1846,17 @@ (define_predicate "prefixed_memory" { return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT); }) + +;; Return true if the operand is a valid memory address that does not use a +;; prefixed address. +(define_predicate "non_prefixed_memory" + (match_code "mem") +{ + enum insn_form iform += address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT); + + return (iform != INSN_FORM_BAD + && iform != INSN_FORM_PREFIXED_NUMERIC + && iform != INSN_FORM_PCREL_LOCAL + && iform != INSN_FORM_PCREL_EXTERNAL); +}) Index: gcc/doc/md.texi === --- gcc/doc/md.texi (revision 279182) +++ gcc/doc/md.texi (working copy) @@ -3373,6 +3373,12 @@ asm ("st %1,%0" : "=m<>" (mem) : "r" (va is not. +@item em +A memory operand that does not contain a prefixed address. + +@item ep +A memory operand that does contain a prefixed address. + @item es A ``stable'' memory operand; that is, one which does not include any automodification of the base register. This used to be useful when -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
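Having both `em` and `ep` matters because they are not simple complements: an address that `address_to_insn_form` classifies as bad matches neither. A hypothetical C model of the two predicates follows. `MODEL_FORM_OTHER_VALID` stands in for all the non-prefixed forms, and treating `ep` as "numeric prefixed or PC-relative" is an assumption about what `address_is_prefixed` accepts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the address classifications named in the
   predicate.  MODEL_FORM_OTHER_VALID stands for every valid non-prefixed
   form.  */
enum model_form
{
  MODEL_FORM_BAD,
  MODEL_FORM_OTHER_VALID,
  MODEL_FORM_PREFIXED_NUMERIC,
  MODEL_FORM_PCREL_LOCAL,
  MODEL_FORM_PCREL_EXTERNAL
};

/* Mirrors non_prefixed_memory: valid, and not a prefixed form.  */
bool
model_em (enum model_form f)
{
  return (f != MODEL_FORM_BAD
          && f != MODEL_FORM_PREFIXED_NUMERIC
          && f != MODEL_FORM_PCREL_LOCAL
          && f != MODEL_FORM_PCREL_EXTERNAL);
}

/* Assumed behavior of prefixed_memory: true only for the prefixed forms.  */
bool
model_ep (enum model_form f)
{
  return (f == MODEL_FORM_PREFIXED_NUMERIC
          || f == MODEL_FORM_PCREL_LOCAL
          || f == MODEL_FORM_PCREL_EXTERNAL);
}
```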
[PATCH] V10 patch #5, Fix codegen bug with vector extracts using a variable offset & PC-relative address
This patch fixes a bug with vector extracts using a PC-relative address and a variable offset when using -mcpu=future. Consider the code: #include static vector double vd; vector double *p = &vd; double get (unsigned int n) { return vec_extract (vd, n); } If you compile this code with -O2 -mcpu=future -mpcrel you get: get: pla 9,.LANCHOR0@pcrel lfdx 1,9,9 blr This is because there is only one base register temporary, and the current code tries to first create the offset and then use the same temporary to hold the address of the PC-relative value. After combine the insn is: (insn 14 9 15 2 (parallel [ (set (reg/i:DF 33 1) (unspec:DF [ (mem/c:V2DF (symbol_ref:DI ("*.LANCHOR0") [flags 0x182]) [1 vd+0 S16 A128]) (reg:DI 123 [ n ]) ] UNSPEC_VSX_EXTRACT)) (clobber (scratch:DI)) (clobber (scratch:V2DI)) ]) "foo.c":9:1 1314 {vsx_extract_v2df_var} Split2 changes this to: (insn 20 8 21 2 (set (reg:DI 3 3 [orig:123 n ] [123]) (and:DI (reg:DI 3 3 [orig:123 n ] [123]) (const_int 1 [0x1]))) "foo.c":9:1 193 {anddi3_mask} (nil)) (insn 21 20 22 2 (set (reg:DI 9 9 [126]) (ashift:DI (reg:DI 3 3 [orig:123 n ] [123]) (const_int 3 [0x3]))) "foo.c":9:1 256 {ashldi3} (nil)) (insn 22 21 23 2 (set (reg:DI 9 9 [126]) (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])) "foo.c":9:1 680 {*pcrel_local_addr} (nil)) (insn 23 22 15 2 (set (reg/i:DF 33 1) (mem/c:DF (plus:DI (reg:DI 9 9 [126]) (reg:DI 9 9 [126])) [1 S8 A8])) "foo.c":9:1 512 {*movdf_hardfloat64} (nil)) I.e., GPR r9 is first set to the offset << 3, and then the offset is wiped out by setting r9 to the address of the PC-relative structure. This patch changes all of the variable extract insns and the function in rs6000.c that processes them to have a second base register temporary only if we have prefixed addresses. The code generated then becomes: get: extsw 3,3 pla 10,.LANCHOR0@pcrel rldicl 3,3,0,63 sldi 9,3,3 lfdx 1,10,9 I use the em and ep constraints to keep the alternatives separate.
Using em prevents the register allocator from skipping the alternative with ep in it because it has an extra scratch register. I have bootstrapped the compiler on a little endian power8 system and ran make check without regression. Can I check this in once patch V10 #4 is checked in? 2019-12-10 Michael Meissner * config/rs6000/rs6000-protos.h (rs6000_split_vec_extract_var): Update calling signature. * config/rs6000/rs6000.c (rs6000_split_vec_extract_var): Add additional tmp base register argument. If the memory is prefixed, put the address into the new tmp base register. * config/rs6000/vsx.md (vsx_extract__var, VSX_D iterator): Add new temporary for loading up the address of prefixed memory operands. (vsx_extract_v4sf_var): Add new temporary for loading up the address of prefixed memory operands. (vsx_extract__var, VSX_EXTRACT_I iterator): Add new temporary for loading up the address of prefixed memory operands. (vsx_extract__mode_var): Add new temporary for loading up the address of prefixed memory operands. 
Index: gcc/config/rs6000/rs6000-protos.h === --- gcc/config/rs6000/rs6000-protos.h (revision 279182) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -59,7 +59,7 @@ extern void rs6000_expand_float128_conve extern void rs6000_expand_vector_init (rtx, rtx); extern void rs6000_expand_vector_set (rtx, rtx, int); extern void rs6000_expand_vector_extract (rtx, rtx, rtx); -extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx); +extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx, rtx); extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode); extern void altivec_expand_vec_perm_le (rtx op[4]); extern void rs6000_expand_extract_even (rtx, rtx, rtx); Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 279182) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -6861,7 +6861,7 @@ rs6000_adjust_vec_address (rtx scalar_re void rs6000_split_vec_extract_var (rtx dest, rtx src, rtx element, rtx tmp_gpr, - rtx tmp_altivec) + rtx tmp_altivec, rtx tmp_prefixed) { machine_mode mode = GET_MODE (src); machine_mode scalar_mode = GET_MODE_INNER (GET_MODE (src)); @@ -6878,6 +6878,16 @@ rs6000_split_vec_ext
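The failure mode described above can be modelled in a few lines of plain C. This is a hypothetical sketch, not GCC code: `element_byte_offset` mirrors the mask-and-shift in the split RTL (`(n & 1) << 3` for V2DF), `address_one_temp` reproduces the clobbered-temporary sequence that ends in `lfdx 1,9,9`, and `address_two_temps` corresponds to the fixed sequence that ends in `lfdx 1,10,9`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the variable vector-extract address computation.
   N_ELEMS must be a power of two; ELEM_SHIFT is log2 of the element size
   in bytes (3 for double).  */
static uint64_t
element_byte_offset (uint64_t n, unsigned n_elems, unsigned elem_shift)
{
  assert (n_elems != 0 && (n_elems & (n_elems - 1)) == 0);
  /* Mask the element number (rldicl 3,3,0,63 for V2DF), then scale it
     to a byte offset (sldi 9,3,3).  */
  return (n & (n_elems - 1)) << elem_shift;
}

/* Buggy sequence: a single temporary holds the offset and is then
   overwritten with the base address, so the result is base + base.  */
static uint64_t
address_one_temp (uint64_t base, uint64_t n)
{
  uint64_t tmp;
  tmp = element_byte_offset (n, 2, 3);  /* offset << 3 into r9 */
  tmp = base;                           /* pla into r9 wipes out the offset */
  return tmp + tmp;                     /* lfdx 1,9,9 */
}

/* Fixed sequence: a second base temporary keeps the offset alive.  */
static uint64_t
address_two_temps (uint64_t base, uint64_t n)
{
  uint64_t tmp_offset = element_byte_offset (n, 2, 3);  /* r9 */
  uint64_t tmp_base = base;                              /* pla into r10 */
  return tmp_base + tmp_offset;                          /* lfdx 1,10,9 */
}
```

With a base of 0x1000 and n = 1, the two-temporary version yields 0x1008, while the single-temporary version yields 0x2000 (base + base), which is the kind of wrong address the bug produced.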
[PATCH] V10 patch #6, Use prefixed load/stores for vector extract with large offsets
This patch optimizes vector extracts where the vector is pointed to by an address with an offset larger than 16 bits, folding the add into the final address. I.e.

#include
double get (vector double *p, unsigned int h)
{
  return vec_extract (p[5], 1);
}

I have bootstrapped this patch on a little endian power8 system and ran make check with no regressions. Can I check this patch in?

2019-12-10 Michael Meissner

	* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support for the offset being 34 bits when -mcpu=future is used.

Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 279199) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -6766,9 +6766,17 @@ rs6000_adjust_vec_address (rtx scalar_re HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset); rtx offset_rtx = GEN_INT (offset); - if (IN_RANGE (offset, -32768, 32767) + /* 16-bit offset. */ + if (SIGNED_16BIT_OFFSET_P (offset) && (scalar_size < 8 || (offset & 0x3) == 0)) new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx); + + /* 34-bit offset if we have prefixed addresses. */ + else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (offset)) + new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx); + + /* Offset overflowed, move offset to the temporary (which will likely +be split), and do X-FORM addressing. */ else { emit_move_insn (base_tmp, offset_rtx); @@ -6799,6 +6807,12 @@ rs6000_adjust_vec_address (rtx scalar_re emit_insn (insn); } + /* Make sure we don't overwrite the temporary if the element being +extracted is variable, and we've put the offset into base_tmp +previously. */ + else if (rtx_equal_p (base_tmp, element_offset)) + emit_insn (gen_add2_insn (base_tmp, op1)); + else { emit_move_insn (base_tmp, op1); -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
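The three-way choice this patch adds to rs6000_adjust_vec_address can be modelled in C. This is a sketch under stated assumptions: `classify_offset` and `enum offset_kind` are hypothetical names, and the alignment rule for 8-byte scalars is copied from the condition in the diff (`scalar_size < 8 || (offset & 0x3) == 0`). An offset that fits 16 bits stays a normal displacement, one that fits 34 bits uses a prefixed displacement when prefixed addressing is available, and anything else is moved into the base temporary for X-form addressing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of a combined constant offset.  */
enum offset_kind
{
  OFFSET_D_FORM,      /* fits the 16-bit displacement of a normal load */
  OFFSET_PREFIXED,    /* needs the 34-bit displacement of a prefixed load */
  OFFSET_IN_REGISTER  /* must be moved into a temporary (X-form) */
};

/* True if VALUE is representable as a signed BITS-bit integer.  */
static bool
fits_signed_bits (long long value, unsigned bits)
{
  long long limit = 1LL << (bits - 1);
  return value >= -limit && value < limit;
}

static enum offset_kind
classify_offset (long long offset, unsigned scalar_size, bool have_prefixed)
{
  /* Same test as the patch: 8-byte scalars also need a displacement that
     is a multiple of 4 (DS-form loads such as LD).  */
  if (fits_signed_bits (offset, 16)
      && (scalar_size < 8 || (offset & 0x3) == 0))
    return OFFSET_D_FORM;

  if (have_prefixed && fits_signed_bits (offset, 34))
    return OFFSET_PREFIXED;

  return OFFSET_IN_REGISTER;
}
```

Note that a small but misaligned offset for an 8-byte scalar (for example 6) also falls through to the prefixed case when prefixed addressing is available, because prefixed loads have no low-bit restriction on the displacement.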
[PATCH] V10 patch #7, Improve vector_extract code of a PC-relative address with a constant offset for -mcpu=future
This patch improves the code generated for vec_extract when the vector is addressed with a PC-relative address and the element number is constant. I.e.

#include
static vector double vd[10];
vector double *p = &vd[0];

double get (void)
{
  return vec_extract (vd[4], 1);
}

I have bootstrapped this code on a little endian power8 system and ran make check, and there were no regressions. Can I check this into the trunk?

2019-12-10 Michael Meissner

	* config/rs6000/rs6000.c (rs6000_reg_to_addr_mask): New helper function.
	(rs6000_adjust_vec_address): Add support for folding a constant offset of a vector extract of a vector accessed with PC-relative addressing into the offset of the load.

Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 279200) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -6698,6 +6698,30 @@ rs6000_expand_vector_extract (rtx target } } +/* Helper function to return an address mask based on a physical register. */ + +static addr_mask_type +rs6000_reg_to_addr_mask (rtx reg, machine_mode mode) +{ + unsigned int r = reg_or_subregno (reg); + addr_mask_type addr_mask; + + gcc_assert (HARD_REGISTER_NUM_P (r)); + if (INT_REGNO_P (r)) +addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_GPR]; + + else if (FP_REGNO_P (r)) +addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_FPR]; + + else if (ALTIVEC_REGNO_P (r)) +addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_VMX]; + + else +gcc_unreachable (); + + return addr_mask; +} + /* Adjust a memory address (MEM) of a vector type to point to a scalar field within the vector (ELEMENT) with a mode (SCALAR_MODE). Use a base register temporary (BASE_TMP) to fixup the address. Return the new memory address @@ -6823,8 +6847,57 @@ rs6000_adjust_vec_address (rtx scalar_re } } + /* For references to local static variables, try to fold a constant offset + into the address.
*/ + else if (pcrel_local_address (addr, Pmode) && CONST_INT_P (element_offset)) +{ + if (GET_CODE (addr) == CONST) + addr = XEXP (addr, 0); + + if (GET_CODE (addr) == PLUS) + { + rtx op0 = XEXP (addr, 0); + rtx op1 = XEXP (addr, 1); + if (CONST_INT_P (op1)) + { + HOST_WIDE_INT offset + = INTVAL (XEXP (addr, 1)) + INTVAL (element_offset); + + if (offset == 0) + new_addr = op0; + + else if (SIGNED_34BIT_OFFSET_P (offset)) + { + rtx plus = gen_rtx_PLUS (Pmode, op0, GEN_INT (offset)); + new_addr = gen_rtx_CONST (Pmode, plus); + } + + else + { + emit_move_insn (base_tmp, addr); + new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset); + } + } + else + { + emit_move_insn (base_tmp, addr); + new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset); + } + } + + else + { + rtx plus = gen_rtx_PLUS (Pmode, addr, element_offset); + new_addr = gen_rtx_CONST (Pmode, plus); + } +} + else { + /* Make sure we don't overwrite the temporary if the vector extract +offset was variable. */ + gcc_assert (!rtx_equal_p (base_tmp, element_offset)); + emit_move_insn (base_tmp, addr); new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset); } @@ -6834,21 +6907,8 @@ rs6000_adjust_vec_address (rtx scalar_re if (GET_CODE (new_addr) == PLUS) { rtx op1 = XEXP (new_addr, 1); - addr_mask_type addr_mask; - unsigned int scalar_regno = reg_or_subregno (scalar_reg); - - gcc_assert (HARD_REGISTER_NUM_P (scalar_regno)); - if (INT_REGNO_P (scalar_regno)) - addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_GPR]; - - else if (FP_REGNO_P (scalar_regno)) - addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_FPR]; - - else if (ALTIVEC_REGNO_P (scalar_regno)) - addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_VMX]; - - else - gcc_unreachable (); + addr_mask_type addr_mask + = rs6000_reg_to_addr_mask (scalar_reg, scalar_mode); if (REG_P (op1) || SUBREG_P (op1)) valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0; @@ -6856,9 +6916,21 @@ rs6000_adjust_vec_address (rtx scalar_re valid_addr_p = 
(addr_mask & RELOAD_REG_OFFSET) != 0; } + /* An address that is a single register is always valid for either indexed or + offsettable loads. */ else if (REG_P (new_addr) || SUBREG_P (new_addr)) valid_addr_p = true; + /* If we have a PC-relative address, check if offsetable loads are + allowed. */ + else if (pcrel_local
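The constant-folding branch for PC-relative addresses in the diff above can be sketched as follows. This is a simplified model, not the GCC implementation: the address is treated as a symbol plus a constant (`sym_offset`), the names `fold_pcrel_offset` and `struct folded_addr` are hypothetical, and the diff's separate case of a bare symbol (which builds the CONST without a range check) is modelled here as `sym_offset == 0`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of folding: either everything stays in the
   PC-relative displacement, or the original symbol+offset is loaded into
   the base temporary and the element offset is added as a register
   offset.  */
struct folded_addr
{
  long long sym_offset;  /* constant attached to the symbol */
  long long reg_offset;  /* offset added to the base temporary */
  bool via_base_tmp;     /* true if the address goes through BASE_TMP */
};

/* Prefixed PC-relative instructions carry a signed 34-bit displacement.  */
static bool
fits_signed_34 (long long value)
{
  return value >= -(1LL << 33) && value < (1LL << 33);
}

static struct folded_addr
fold_pcrel_offset (long long sym_offset, long long element_offset)
{
  struct folded_addr result;
  long long offset = sym_offset + element_offset;

  if (fits_signed_34 (offset))
    {
      /* Fold into symbol+offset; an offset of 0 means the bare symbol.  */
      result.sym_offset = offset;
      result.reg_offset = 0;
      result.via_base_tmp = false;
    }
  else
    {
      /* Load the original symbol+offset into BASE_TMP and add the
         element offset separately.  */
      result.sym_offset = sym_offset;
      result.reg_offset = element_offset;
      result.via_base_tmp = true;
    }
  return result;
}
```

For instance, a symbol offset of 64 (the fifth V2DF element of an array) plus an element offset of 8 folds to a single displacement of 72, so no base temporary is needed.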
[PATCH] V10 patch #8, Enable -mpcrel and -mprefixed-addr for -mcpu=future on 64-bit little endian Linux systems
This patch enables -mpcrel and -mprefixed-addr when -mcpu=future is used on a 64-bit little endian Linux system, but it does not enable those options on other systems. It is a slight reworking of patch V7 #7 that takes into account the comments you made. In particular, I changed the macros used by the target tm.h file to be:

	PREFIXED_ADDR_SUPPORTED_BY_OS
	PCREL_SUPPORTED_BY_OS

Patch V7 #7: https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01307.html

I have bootstrapped the compiler on a little endian power8 system, and ran make check with no regressions. I also tested the code with -mpcrel and -mprefixed-addr not turned on for 64-bit little endian Linux, inspected the generated code, and saw that the appropriate code was generated.

In terms of your comment:

| ... and I don't understand this code. If you use -mpcrel but you do not
| have the medium model, you _do_ get prefixed but you do _not_ get pcrel?
| And this all quietly?

You do not get this quietly. You will get an error if you use the -mpcrel and -mcmodel=large options together.

2019-12-10 Michael Meissner

	* config/rs6000/linux64.h (PREFIXED_ADDR_SUPPORTED_BY_OS): Set to 1 to enable prefixed addressing if -mcpu=future.
	(PCREL_SUPPORTED_BY_OS): Set to 1 to enable PC-relative addressing if -mcpu=future.
	* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Do not enable -mprefixed-addr or -mpcrel by default.
	(ADDRESSING_FUTURE_MASKS): New macro.
	(OTHER_FUTURE_MASKS): Use ADDRESSING_FUTURE_MASKS.
	* config/rs6000/rs6000.c (PREFIXED_ADDR_SUPPORTED_BY_OS): Disable prefixed addressing unless the target OS tm.h says we should enable it.
	(PCREL_SUPPORTED_BY_OS): Disable PC-relative addressing unless the target OS tm.h says we should enable it.
	(rs6000_debug_reg_global): Print whether prefixed addressing and PC-relative addressing are enabled by default if -mcpu=future.
	(rs6000_option_override_internal): Move setting prefixed addressing and PC-relative addressing after the sub-target option handling is done. Only enable prefixed addressing or PC-relative addressing on -mcpu=future systems if the target OS says to enable it. Disallow prefixed addressing on 32-bit systems or if the target object file is not ELF v2.

Index: gcc/config/rs6000/linux64.h === --- gcc/config/rs6000/linux64.h (revision 279141) +++ gcc/config/rs6000/linux64.h (working copy) @@ -640,3 +640,11 @@ extern int dot_symbols; enabling the __float128 keyword. */ #undef TARGET_FLOAT128_ENABLE_TYPE #define TARGET_FLOAT128_ENABLE_TYPE 1 + +/* Enable support for pc-relative and numeric prefixed addressing on the + 'future' system. */ +#undef PREFIXED_ADDR_SUPPORTED_BY_OS +#define PREFIXED_ADDR_SUPPORTED_BY_OS 1 + +#undef PCREL_SUPPORTED_BY_OS +#define PCREL_SUPPORTED_BY_OS 1 Index: gcc/config/rs6000/rs6000-cpus.def === --- gcc/config/rs6000/rs6000-cpus.def (revision 279141) +++ gcc/config/rs6000/rs6000-cpus.def (working copy) @@ -75,15 +75,22 @@ | OPTION_MASK_P8_VECTOR\ | OPTION_MASK_P9_VECTOR) -/* Support for a future processor's features. Do not enable -mpcrel until it - is fully functional. */ +/* Support for a future processor's features. The prefixed and pc-relative + addressing bits are not added here. Instead, they are added if the target + OS tm.h says that it supports the addressing modes by default when + -mcpu=future is used. */ #define ISA_FUTURE_MASKS_SERVER(ISA_3_0_MASKS_SERVER \ -| OPTION_MASK_FUTURE \ +| OPTION_MASK_FUTURE) + +/* Addressing related flags on a future processor. These are options that need + to be cleared if the target OS is not capable of supporting prefixed + addressing at all (such as 32-bit mode or if the object file format is not + ELF v2). */ +#define ADDRESSING_FUTURE_MASKS(OPTION_MASK_PCREL \ | OPTION_MASK_PREFIXED_ADDR) /* Flags that need to be turned off if -mno-future. */ -#define OTHER_FUTURE_MASKS (OPTION_MASK_PCREL \ -| OPTION_MASK_PREFIXED_ADDR) +#define OTHER_FUTURE_MASKS ADDRESSING_FUTURE_MASKS /* Flags that need to be turned off if -mno-power9-vector.
*/ #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW\ Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 279202) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -98,6 +98,16 @@ #endif #endif +/
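According to the ChangeLog, rs6000.c supplies defaults that keep both addressing modes off unless the OS header (such as linux64.h) turns them on, and prefixed addressing is refused for 32-bit code or non-ELFv2 objects. A minimal sketch of that logic, assuming the usual `#ifndef` default pattern: the mask names and the `future_default_addressing` helper are hypothetical, and making PC-relative addressing depend on prefixed addressing is an assumption (PC-relative instructions such as PLA and PLD are themselves prefixed).

```c
#include <assert.h>
#include <stdbool.h>

/* Defaults in the style the ChangeLog describes: an OS header may define
   these to 1 before this point; otherwise the modes stay disabled.  */
#ifndef PREFIXED_ADDR_SUPPORTED_BY_OS
#define PREFIXED_ADDR_SUPPORTED_BY_OS 0
#endif

#ifndef PCREL_SUPPORTED_BY_OS
#define PCREL_SUPPORTED_BY_OS 0
#endif

/* Hypothetical option bits standing in for OPTION_MASK_PREFIXED_ADDR and
   OPTION_MASK_PCREL.  */
#define MASK_PREFIXED_ADDR 0x1u
#define MASK_PCREL         0x2u

/* Hypothetical helper: the addressing bits -mcpu=future would turn on by
   default under the rules in the ChangeLog.  */
static unsigned
future_default_addressing (bool is_64bit, bool is_elfv2,
                           bool os_prefixed, bool os_pcrel)
{
  unsigned mask = 0;

  /* Never enable prefixed addressing for 32-bit code or for object
     formats other than ELF v2.  */
  if (!is_64bit || !is_elfv2)
    return 0;

  if (os_prefixed)
    mask |= MASK_PREFIXED_ADDR;

  /* Assumption: PC-relative addressing also requires prefixed
     addressing.  */
  if (os_pcrel && (mask & MASK_PREFIXED_ADDR))
    mask |= MASK_PCREL;

  return mask;
}
```

Called as `future_default_addressing (is_64bit, is_elfv2, PREFIXED_ADDR_SUPPORTED_BY_OS, PCREL_SUPPORTED_BY_OS)`, a target whose OS header does not define the macros gets neither mode by default.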
[PATCH] V10 patch #9, Add new effective targets for the testsuite
Patch V10 #9 is a redone version of patch V7 #5. This patch adds new effective target options for PowerPC. I have changed this patch to look at the code generated by the compiler to see if prefixed addressing or PC-relative addressing is used for -mcpu=future. This patch needs patch V10 #8 installed to enable the prefixed addressing and PC-relative tests.

In patch V10 #9, I did not modify the existing test (check_effective_target_powerpc_future_ok). As we discussed, this test should really test whether a non-prefixed instruction is generated, to allow for targets that might support -mcpu=future but not enable prefixed addressing. However, at present the only instructions being submitted are prefixed instructions, so this will have to wait until we get further down the road with 'future' instructions.

I have bootstrapped a little endian power8 compiler and ran make check with no regressions. In addition, with this patch installed, the new tests run as expected. Can I check this in? (It needs patch V10 #8 to be installed to enable the tests.)

2019-12-11 Michael Meissner

	* lib/target-supports.exp (check_effective_target_powerpc_pcrel): New target for PowerPC -mcpu=future support.
	(check_effective_target_powerpc_prefixed_addr): New target for PowerPC -mcpu=future support.
Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp (revision 279141) +++ gcc/testsuite/lib/target-supports.exp (working copy) @@ -2161,6 +2161,23 @@ proc check_p9modulo_hw_available { } { }] } +# Return 1 if the target generates PC-relative instructions automatically +proc check_effective_target_powerpc_pcrel { } { +return [check_no_messages_and_pattern powerpc_pcrel \ + {\mpld\M.*[@]pcrel} assembly { + static long s; + long *p = &s; + long foo (void) { return s; } + } {-O2 -mcpu=future}] +} + +# Return 1 if the target generates prefixed instructions automatically +proc check_effective_target_powerpc_prefixed_addr { } { +return [check_no_messages_and_pattern powerpc_prefixed_addr \ + {\mpld\M} assembly { + long foo (long *p) { return p[0x12345]; } + } {-O2 -mcpu=future}] +} # Return 1 if the target supports executing FUTURE instructions, 0 otherwise. # Cache the result. It is assumed that if a simulator does not support the
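To see what the `powerpc_pcrel` check accepts, its Tcl pattern `{\mpld\M.*[@]pcrel}` can be approximated in C: `\m` and `\M` anchor the start and end of a word, and, assuming the pattern is applied to the whole assembly output with Tcl's default `regexp` semantics (where `.` also matches newlines), `@pcrel` may appear anywhere after the `pld`. The helper names below are hypothetical. The `powerpc_prefixed_addr` check relies on simple arithmetic: `p[0x12345]` is 0x12345 * 8 = 596520 bytes from `p`, which does not fit a signed 16-bit displacement, so a prefixed `pld` is expected.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Tcl word characters: alphanumerics and underscore.  */
static bool
is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* True if WORD starts at POS within TEXT and is a whole word there.  */
static bool
word_at (const char *text, const char *pos, const char *word)
{
  size_t len = strlen (word);
  if (strncmp (pos, word, len) != 0)
    return false;
  if (pos > text && is_word_char (pos[-1]))
    return false;
  return !is_word_char (pos[len]);
}

/* Rough C equivalent of {\mpld\M.*[@]pcrel}: a whole-word "pld" that is
   followed anywhere later in the text by "@pcrel".  */
static bool
matches_pcrel_pld (const char *asm_text)
{
  for (const char *p = asm_text; (p = strstr (p, "pld")) != NULL; p++)
    if (word_at (asm_text, p, "pld") && strstr (p + 3, "@pcrel") != NULL)
      return true;
  return false;
}
```

A plain `ld` or a `pld` with a register base does not satisfy the check, which is what lets the effective target distinguish compilers that generate PC-relative code automatically.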
[PATCH] V10 patch #10, Add PADDI/PLI tests for -mcpu=future
Patch V10 #10 is a modification of patch V8 #1. I renamed the files from paddi-?.c to prefix-*.c so that there isn't a false match due to the .ident directive. These tests pass when I do a make check. Once patch V10 #9 is checked in, can I commit this patch?

2019-12-11 Michael Meissner

	* gcc.target/powerpc/prefix-add.c: New test for -mcpu=future generating PADDI for large constant adds.
	* gcc.target/powerpc/prefix-di-constant.c: New test for -mcpu=future generating PLI to load up large DImode constants.
	* gcc.target/powerpc/prefix-si-constant.c: New test for -mcpu=future generating PLI to load up large SImode constants.

Index: gcc/testsuite/gcc.target/powerpc/prefix-add.c === --- gcc/testsuite/gcc.target/powerpc/prefix-add.c (revision 279252) +++ gcc/testsuite/gcc.target/powerpc/prefix-add.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test that PADDI is generated to add a large constant. */ +unsigned long +add (unsigned long a) +{ + return a + 0x12345678UL; +} + +/* { dg-final { scan-assembler {\mpaddi\M} } } */ Index: gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c === --- gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c (revision 279252) +++ gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test that PLI (PADDI) is generated to load a large constant.
*/ +unsigned long +large (void) +{ + return 0x12345678UL; +} + +/* { dg-final { scan-assembler {\mpli\M} } } */ Index: gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c === --- gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c (revision 279252) +++ gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test that PLI (PADDI) is generated to load a large constant for SImode. */ +void +large_si (unsigned int *p) +{ + *p = 0x12345U; +} + +/* { dg-final { scan-assembler {\mpli\M} } } */
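The constants in these tests are chosen so that no single non-prefixed instruction can load them. A hypothetical sketch of the single-instruction choice, loosely modelled on the checks in num_insns_constant_gpr (the enum and function names are illustrative): LI takes a signed 16-bit immediate, LIS a signed 16-bit immediate shifted left by 16, and PLI (an extended mnemonic for PADDI with a zero base register) a signed 34-bit immediate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of how a GPR constant could be loaded.  */
enum load_insn
{
  LOAD_LI,     /* li: signed 16-bit immediate */
  LOAD_LIS,    /* lis: signed 16-bit immediate shifted left by 16 */
  LOAD_PLI,    /* pli: signed 34-bit immediate (prefixed) */
  LOAD_MULTI   /* needs more than one instruction */
};

static enum load_insn
single_insn_load (long long value, bool have_prefixed)
{
  if (value >= -32768 && value <= 32767)
    return LOAD_LI;

  /* Low halfword zero and representable as a sign-extended 32-bit
     value.  */
  if ((value & 0xffff) == 0
      && value >= -(1LL << 31) && value < (1LL << 31))
    return LOAD_LIS;

  if (have_prefixed && value >= -(1LL << 33) && value < (1LL << 33))
    return LOAD_PLI;

  return LOAD_MULTI;
}
```

Both 0x12345678 and 0x12345 are outside the LI range and have a nonzero low halfword, so they are not LIS values either; without prefixed instructions they need two instructions (LIS followed by ORI), while with them a single PLI suffices.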
[PATCH] V10 patch #11, Add test for generating prefixed load/store when the offset is not valid for DS/DQ instructions
Patch V10 #11 is a slight reworking of patch V8 #2 (testing whether we generate a prefixed instruction when the offset would be invalid for DS and DQ instruction formats). This test passes when I run make check. Can I check this in when patch V10 #9 is checked in? 2019-12-11 Michael Meissner * gcc.target/powerpc/prefix-ds-dq.c: New test to verify that we generate the prefix load/store instructions for traditional instructions with an offset that doesn't match DS/DQ requirements. Index: gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c === --- gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c (revision 279256) +++ gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c (working copy) @@ -0,0 +1,156 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests whether we generate a prefixed load/store operation for addresses that + don't meet DS/DQ offset constraints. */ + +unsigned long +load_uc_offset1 (unsigned char *p) +{ + return p[1]; /* should generate LBZ. */ +} + +long +load_sc_offset1 (signed char *p) +{ + return p[1]; /* should generate LBZ + EXTSB. */ +} + +unsigned long +load_us_offset1 (unsigned char *p) +{ + return *(unsigned short *)(p + 1); /* should generate LHZ. */ +} + +long +load_ss_offset1 (unsigned char *p) +{ + return *(short *)(p + 1);/* should generate LHA. */ +} + +unsigned long +load_ui_offset1 (unsigned char *p) +{ + return *(unsigned int *)(p + 1); /* should generate LWZ. */ +} + +long +load_si_offset1 (unsigned char *p) +{ + return *(int *)(p + 1); /* should generate PLWA. */ +} + +unsigned long +load_ul_offset1 (unsigned char *p) +{ + return *(unsigned long *)(p + 1);/* should generate PLD. */ +} + +long +load_sl_offset1 (unsigned char *p) +{ + return *(long *)(p + 1); /* should generate PLD. */ +} + +float +load_float_offset1 (unsigned char *p) +{ + return *(float *)(p + 1);/* should generate LFS. 
*/ +} + +double +load_double_offset1 (unsigned char *p) +{ + return *(double *)(p + 1); /* should generate LFD. */ +} + +__float128 +load_float128_offset1 (unsigned char *p) +{ + return *(__float128 *)(p + 1); /* should generate PLXV. */ +} + +void +store_uc_offset1 (unsigned char uc, unsigned char *p) +{ + p[1] = uc; /* should generate STB. */ +} + +void +store_sc_offset1 (signed char sc, signed char *p) +{ + p[1] = sc; /* should generate STB. */ +} + +void +store_us_offset1 (unsigned short us, unsigned char *p) +{ + *(unsigned short *)(p + 1) = us; /* should generate STH. */ +} + +void +store_ss_offset1 (signed short ss, unsigned char *p) +{ + *(signed short *)(p + 1) = ss; /* should generate STH. */ +} + +void +store_ui_offset1 (unsigned int ui, unsigned char *p) +{ + *(unsigned int *)(p + 1) = ui; /* should generate STW. */ +} + +void +store_si_offset1 (signed int si, unsigned char *p) +{ + *(signed int *)(p + 1) = si; /* should generate STW. */ +} + +void +store_ul_offset1 (unsigned long ul, unsigned char *p) +{ + *(unsigned long *)(p + 1) = ul; /* should generate PSTD. */ +} + +void +store_sl_offset1 (signed long sl, unsigned char *p) +{ + *(signed long *)(p + 1) = sl;/* should generate PSTD. */ +} + +void +store_float_offset1 (float f, unsigned char *p) +{ + *(float *)(p + 1) = f; /* should generate STFS. */ +} + +void +store_double_offset1 (double d, unsigned char *p) +{ + *(double *)(p + 1) = d; /* should generate STFD. */ +} + +void +store_float128_offset1 (__float128 f128, unsigned char *p) +{ + *(__float128 *)(p + 1) = f128; /* should generate PSTXV.
*/ +} + +/* { dg-final { scan-assembler-times {\mextsb\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlbz\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mlfd\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlfs\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlha\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlhz\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlwz\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mpld\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mplwa\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mplxv\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstxv\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mstb\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstfd\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mstfs\M} 1 } } */ +/* { dg-final { scan-assembler-times {\msth\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstw\M} 2 } } */
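The expected instruction counts above follow from the displacement rules of the non-prefixed forms: D-form loads and stores accept any signed 16-bit displacement, DS-form ones (LD, STD, LWA) require a multiple of 4, and DQ-form ones (LXV, STXV) require a multiple of 16. A hypothetical C sketch of that check (the enum and function names are illustrative, and treating the __float128 accesses as DQ-form LXV/STXV is an assumption consistent with the PLXV/PSTXV comments in the test):

```c
#include <assert.h>
#include <stdbool.h>

/* Displacement formats of non-prefixed loads and stores.  */
enum disp_form
{
  FORM_D,   /* any 16-bit displacement (LBZ, LHZ, LWZ, LFS, LFD, ...) */
  FORM_DS,  /* 16-bit displacement, multiple of 4 (LD, STD, LWA) */
  FORM_DQ   /* 16-bit displacement, multiple of 16 (LXV, STXV) */
};

/* True if OFFSET cannot be encoded by the non-prefixed form FORM, so a
   prefixed instruction is needed instead.  */
static bool
needs_prefixed (long long offset, enum disp_form form)
{
  if (offset < -32768 || offset > 32767)
    return true;

  switch (form)
    {
    case FORM_DS:
      return (offset & 3) != 0;
    case FORM_DQ:
      return (offset & 15) != 0;
    default:
      return false;
    }
}
```

With the offset of 1 used throughout the test, the D-form accesses stay non-prefixed (LWZ, LFD, STW, ...), while the DS-form and DQ-form accesses become PLWA, PLD, PSTD, PLXV, and PSTXV.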
[PATCH] V10 patch #12, Test to make sure we don't generate prefixed pre-modify load/stores for -mcpu=future
Patch V10 #12 is a slight reworking of patch V8 #3 (making sure we don't try to generate the non-existent PLWZU and PSTWU pre-modify instructions). This test passes when I run make check. Can I check this in when patch V10 #9 is installed? 2019-12-11 Michael Meissner * gcc.target/powerpc/prefix-no-premodify.c: Make sure we do not generate the non-existent PLWZU instruction if -mcpu=future. Index: gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c === --- gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c (revision 279259) +++ gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c (working copy) @@ -0,0 +1,50 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Make sure that we don't generate a prefixed form of the load and store with + update instructions (i.e. instead of generating LWZU we have to generate + PLWZ plus a PADDI). */ + +#ifndef SIZE +#define SIZE 5 +#endif + +struct foo { + unsigned int field; + char pad[SIZE]; +}; + +struct foo *inc_load (struct foo *p, unsigned int *q) +{ + *q = (++p)->field; /* PLWZ, PADDI, STW. */ + return p; +} + +struct foo *dec_load (struct foo *p, unsigned int *q) +{ + *q = (--p)->field; /* PLWZ, PADDI, STW. */ + return p; +} + +struct foo *inc_store (struct foo *p, unsigned int *q) +{ + (++p)->field = *q; /* LWZ, PADDI, PSTW. */ + return p; +} + +struct foo *dec_store (struct foo *p, unsigned int *q) +{ + (--p)->field = *q; /* LWZ, PADDI, PSTW.
*/ + return p; +} + +/* { dg-final { scan-assembler-times {\mlwz\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstw\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpaddi\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mplwz\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstw\M} 2 } } */ +/* { dg-final { scan-assembler-not {\mplwzu\M} } } */ +/* { dg-final { scan-assembler-not {\mpstwu\M} } } */ +/* { dg-final { scan-assembler-not {\maddis\M} } } */ +/* { dg-final { scan-assembler-not {\maddi\M} } } */
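An update-form load such as LWZU both loads from the effective address and writes that address back into the base register. Since there is no prefixed update form, the compiler must split the operation into a prefixed load and a separate add, as the test comments describe. A hypothetical C model of the two (memory is a byte array and the base register holds a byte offset into it; the function names are illustrative) shows that they compute the same value and final base.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of LWZU (load word and zero with update):
   EA = RA + D; RT = word at EA; RA = EA.  */
static uint32_t
lwzu (const unsigned char *mem, uint64_t *ra, int64_t d)
{
  uint64_t ea = *ra + (uint64_t) d;
  uint32_t rt;
  memcpy (&rt, mem + ea, sizeof rt);
  *ra = ea;
  return rt;
}

/* The replacement sequence: a plain (possibly prefixed) load at RA + D,
   followed by a separate add that updates RA.  */
static uint32_t
plwz_then_paddi (const unsigned char *mem, uint64_t *ra, int64_t d)
{
  uint32_t rt;
  memcpy (&rt, mem + *ra + (uint64_t) d, sizeof rt);  /* plwz rt,d(ra) */
  *ra += (uint64_t) d;                                /* paddi ra,ra,d */
  return rt;
}
```

The split costs one extra instruction per access, which is why the test counts four PADDI instructions for the four pre-increment and pre-decrement functions.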
Re: [PATCH] V10 patch #4, Add new prefixed/non-prefixed memory constraints
On Tue, Dec 17, 2019 at 11:15:29AM -0600, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Dec 11, 2019 at 07:29:05PM -0500, Michael Meissner wrote:
> > +(define_memory_constraint "em"
> > +  "A memory operand that does not contain a prefixed address."
> > +  (and (match_code "mem")
> > +       (match_operand 0 "non_prefixed_memory")))
> > +
> > +(define_memory_constraint "ep"
> > +  "A memory operand that does contains a prefixed address."
> > +  (and (match_code "mem")
> > +       (match_operand 0 "prefixed_memory")))
>
> "does contain".  Or maybe just say "with a non-prefixed address" and
> "with a prefixed address"?

Ok.

> > +;; Return true if the operand is a valid memory address that does not use a
> > +;; prefixed address.
> > +(define_predicate "non_prefixed_memory"
> > +  (match_code "mem")
> > +{
> > +  enum insn_form iform
> > +    = address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> > +
> > +  return (iform != INSN_FORM_BAD
> > +          && iform != INSN_FORM_PREFIXED_NUMERIC
> > +          && iform != INSN_FORM_PCREL_LOCAL
> > +          && iform != INSN_FORM_PCREL_EXTERNAL);
> > +})
>
> Why can this not use just !address_is_prefixed?  Why is an
> INSN_FORM_PCREL_EXTERNAL address neither prefixed nor non-prefixed?  What
> does "BAD" mean, really?  Should that ever happen, should that not ICE?

You can't just use !address_is_prefixed, because it would also accept things that are not valid memory addresses. So we could just do:

{
  /* If the operand is not a valid memory operand even if it is not
     prefixed, do not return true.  */
  if (!memory_operand (op, mode))
    return false;

  return !address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
}

It is important that the predicate not return true if the operand is NOT a valid memory address. If you allow invalid memory addresses, the register allocator will create things like:

(mem:MODE (plus:DI (reg:DI x) (plus:DI (reg:DI y) (const_int z))))

Or some such -- I forget the exact sequence it created. A later pass would then choke on the bad insn.
INSN_FORM_BAD just means that the operand is not valid as a memory address.

> It is very confusing if any valid memory is neither "prefixed_memory" nor
> "non_prefixed_memory"!

The point was to make sure the memory is valid. Once it is a valid memory address, then just a simple !address_is_prefixed will work.

> > --- gcc/doc/md.texi (revision 279182)
> > +++ gcc/doc/md.texi (working copy)
> > @@ -3373,6 +3373,12 @@ asm ("st %1,%0" : "=m<>" (mem) : "r" (va
> >
> >  is not.
> >
> > +@item em
> > +A memory operand that does not contain a prefixed address.
> > +
> > +@item ep
> > +A memory operand that does contains a prefixed address.
>
> Same comments as above.

Ok.
Re: [PATCH] V10 patch #4, Add new prefixed/non-prefixed memory constraints
On Tue, Dec 17, 2019 at 05:35:24PM -0600, Segher Boessenkool wrote:
> On Tue, Dec 17, 2019 at 05:29:44PM -0500, Michael Meissner wrote:
> > On Tue, Dec 17, 2019 at 11:15:29AM -0600, Segher Boessenkool wrote:
> > > > +;; Return true if the operand is a valid memory address that does not use a
> > > > +;; prefixed address.
> > > > +(define_predicate "non_prefixed_memory"
> > > > +  (match_code "mem")
> > > > +{
> > > > +  enum insn_form iform
> > > > +    = address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> > > > +
> > > > +  return (iform != INSN_FORM_BAD
> > > > +          && iform != INSN_FORM_PREFIXED_NUMERIC
> > > > +          && iform != INSN_FORM_PCREL_LOCAL
> > > > +          && iform != INSN_FORM_PCREL_EXTERNAL);
> > > > +})
> > >
> > > Why can this not use just !address_is_prefixed?  Why is an
> > > INSN_FORM_PCREL_EXTERNAL address neither prefixed nor non-prefixed?  What
> > > does "BAD" mean, really?  Should that ever happen, should that not ICE?
> >
> > You can't just invert !address_is_prefixed, because it would all things that
> > may not be valid memory addresses.
>
> Yes, so test that *explicitly*, in the "prefixed_memory" predicate as
> well please.  Make the two predicates as much the same as possible.
>
> And what is with the INSN_FORM_PCREL_EXTERNAL?

INSN_FORM_PCREL_EXTERNAL says that the operand is a reference to an external symbol. It cannot appear in actual memory insns in normal usage, but it needs to be handled in several places:

1) pcrel_extern_addr needs to be able to load an external address into a GPR register.

2) The prefixed insn attribute (and prefixed_paddi_p, which it calls) needs to recognize pcrel_extern_addr and note that it is prefixed.

3) The PCREL_OPT support will need to support it. If you do the PCREL_OPT support via combine and flow control passes, you will need to be able to handle external references as addresses.

The function address_is_prefixed specifically does not return true for external symbols, because you can't use them in a normal context.

In the context of the patch (vector extract), it needs to decide whether the address is prefixed or not, in order to decide whether it needs a second base register temporary.
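The question in this thread is whether the two predicates partition the valid addresses. A hypothetical C model makes the gap visible: `non_prefixed_form_p` copies the condition from the posted non_prefixed_memory predicate, while `prefixed_form_p` is an assumed shape for prefixed_memory that accepts only the forms address_is_prefixed reports as prefixed. The enum is a reduced stand-in (INSN_FORM_D represents all the ordinary non-prefixed forms). Under these assumptions INSN_FORM_PCREL_EXTERNAL satisfies neither predicate, which is the case Segher asked about.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced stand-in for the insn forms named in the thread; INSN_FORM_D
   represents all ordinary non-prefixed forms (D, DS, DQ, X, ...).  */
enum insn_form
{
  INSN_FORM_BAD,
  INSN_FORM_D,
  INSN_FORM_PREFIXED_NUMERIC,
  INSN_FORM_PCREL_LOCAL,
  INSN_FORM_PCREL_EXTERNAL
};

/* Condition copied from the posted non_prefixed_memory predicate.  */
static bool
non_prefixed_form_p (enum insn_form iform)
{
  return (iform != INSN_FORM_BAD
          && iform != INSN_FORM_PREFIXED_NUMERIC
          && iform != INSN_FORM_PCREL_LOCAL
          && iform != INSN_FORM_PCREL_EXTERNAL);
}

/* Assumed shape of prefixed_memory: only the forms that
   address_is_prefixed reports as prefixed (external symbols excluded).  */
static bool
prefixed_form_p (enum insn_form iform)
{
  return (iform == INSN_FORM_PREFIXED_NUMERIC
          || iform == INSN_FORM_PCREL_LOCAL);
}
```

Checking each form against both predicates shows that every form except BAD and PCREL_EXTERNAL lands in exactly one of the two.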
Re: [PATCH] V10 patch #5, Fix codegen bug with vector extracts using a variable offset & PC-relative address
On Tue, Dec 17, 2019 at 12:02:46PM -0600, Segher Boessenkool wrote:
> >  ;; Variable V2DI/V2DF extract
> >  (define_insn_and_split "vsx_extract__var"
> > -  [(set (match_operand: 0 "gpc_reg_operand" "=v,wa,r")
> > -        (unspec: [(match_operand:VSX_D 1 "input_operand" "v,m,m")
> > -                  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
> > -                 UNSPEC_VSX_EXTRACT))
> > -   (clobber (match_scratch:DI 3 "=r,&b,&b"))
> > -   (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
> > +  [(set (match_operand: 0 "gpc_reg_operand" "=v,wa,r,wa,r")
> > +        (unspec:
> > +         [(match_operand:VSX_D 1 "input_operand" "v,em,em,ep,ep")
> > +          (match_operand:DI 2 "gpc_reg_operand" "r,r,r,r,r")]
> > +         UNSPEC_VSX_EXTRACT))
> > +   (clobber (match_scratch:DI 3 "=r,&b,&b,&b,&b"))
> > +   (clobber (match_scratch:V2DI 4 "=&v,X,X,X,X"))
> > +   (clobber (match_scratch:DI 5 "=X,X,X,&b,&b"))]
> >    "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
> >    "#"
> >    "&& reload_completed"
> >    [(const_int 0)]
> >  {
> >    rs6000_split_vec_extract_var (operands[0], operands[1], operands[2],
> > -                               operands[3], operands[4]);
> > +                               operands[3], operands[4], operands[5]);
>
> This writes to operands[2], which does not match its constraint.
>
> Same in the other splitters.

Right. Good catch.
[PATCH] PowerPC, Rename SIGNED_BIT_OFFSET_P to SIGNED_INTEGER_BIT_P
In the patch:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01201.html

Segher Boessenkool asked me to submit a patch to rename the macros used to see if a number is a valid signed 16 or 34-bit value:

> Please follow up with a patch to not call random numbers "OFFSET".

This patch does this, renaming:

	SIGNED_34BIT_OFFSET_P -> SIGNED_INTEGER_34BIT_P
	SIGNED_16BIT_OFFSET_P -> SIGNED_INTEGER_16BIT_P

I did not change the secondary macros (SIGNED_34BIT_OFFSET_EXTRA_P and SIGNED_16BIT_OFFSET_EXTRA_P), since those are used exclusively for offset calculations. But I can if you prefer it that way.

I also converted a use in num_insns_constant_gpr to use the macro (it had been in previous patches, but I dropped it in the last patch just to get the minimal change in).

I've bootstrapped compilers with these patches and there were no regressions in the test suite. Can I check this into the trunk?

Some of the remaining patches in the V10 series will need to be modified as well. I will submit those patches (after I rework the vector extract stuff) in a new series.

2019-12-17 Michael Meissner

	* config/rs6000/predicates.md (cint34_operand): Use SIGNED_INTEGER_34BIT_P macro.
	* config/rs6000/rs6000.c (num_insns_constant_gpr): Use the SIGNED_INTEGER_16BIT_P and SIGNED_INTEGER_34BIT_P macros.
	(address_to_insn_form): Use the SIGNED_INTEGER_16BIT_P and SIGNED_INTEGER_34BIT_P macros.
	* config/rs6000/rs6000.h (SIGNED_INTEGER_NBIT_P): New macro.
	(SIGNED_INTEGER_16BIT_P): Rename SIGNED_16BIT_OFFSET_P to be SIGNED_INTEGER_16BIT_P.
	(SIGNED_INTEGER_34BIT_P): Rename SIGNED_34BIT_OFFSET_P to be SIGNED_INTEGER_34BIT_P.

Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 279478) +++ gcc/config/rs6000/predicates.md (working copy) @@ -309,7 +309,7 @@ (define_predicate "cint34_operand" if (!TARGET_PREFIXED_ADDR) return 0; - return SIGNED_34BIT_OFFSET_P (INTVAL (op)); + return SIGNED_INTEGER_34BIT_P (INTVAL (op)); }) ;; Return 1 if op is a register that is not special.
Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 279478) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -5557,7 +5557,7 @@ static int num_insns_constant_gpr (HOST_WIDE_INT value) { /* signed constant loadable with addi */ - if (((unsigned HOST_WIDE_INT) value + 0x8000) < 0x1) + if (SIGNED_INTEGER_16BIT_P (value)) return 1; /* constant loadable with addis */ @@ -5566,7 +5566,7 @@ num_insns_constant_gpr (HOST_WIDE_INT va return 1; /* PADDI can support up to 34 bit signed integers. */ - else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (value)) + else if (TARGET_PREFIXED_ADDR && SIGNED_INTEGER_34BIT_P (value)) return 1; else if (TARGET_POWERPC64) @@ -24770,7 +24770,7 @@ address_to_insn_form (rtx addr, return INSN_FORM_BAD; HOST_WIDE_INT offset = INTVAL (op1); - if (!SIGNED_34BIT_OFFSET_P (offset)) + if (!SIGNED_INTEGER_34BIT_P (offset)) return INSN_FORM_BAD; /* Check for local and external PC-relative addresses. Labels are always @@ -24789,7 +24789,7 @@ address_to_insn_form (rtx addr, return INSN_FORM_BAD; /* Large offsets must be prefixed. */ - if (!SIGNED_16BIT_OFFSET_P (offset)) + if (!SIGNED_INTEGER_16BIT_P (offset)) { if (TARGET_PREFIXED_ADDR) return INSN_FORM_PREFIXED_NUMERIC; Index: gcc/config/rs6000/rs6000.h === --- gcc/config/rs6000/rs6000.h (revision 279478) +++ gcc/config/rs6000/rs6000.h (working copy) @@ -2529,18 +2529,16 @@ typedef struct GTY(()) machine_function #pragma GCC poison TARGET_FLOAT128 OPTION_MASK_FLOAT128 MASK_FLOAT128 #endif -/* Whether a given VALUE is a valid 16 or 34-bit signed offset. */ -#define SIGNED_16BIT_OFFSET_P(VALUE) \ +/* Whether a given VALUE is a valid 16 or 34-bit signed integer. 
*/ +#define SIGNED_INTEGER_NBIT_P(VALUE, N) \ IN_RANGE ((VALUE), \ - -(HOST_WIDE_INT_1 << 15), \ - (HOST_WIDE_INT_1 << 15) - 1) + -(HOST_WIDE_INT_1 << ((N)-1)), \ + (HOST_WIDE_INT_1 << ((N)-1)) - 1) -#define SIGNED_34BIT_OFFSET_P(VALUE) \ - IN_RANGE ((VALUE), \ - -(HOST_WIDE_INT_1 << 33), \ - (HOST_WIDE_INT_1 << 33) - 1) +#define SIGNED_INTEGER_16BIT_P(VALUE) SIGNED_INTEGER_NBIT_P (VALUE,
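A minimal standalone sketch of what the renamed macros test. It assumes simplified stand-ins for GCC's HOST_WIDE_INT, HOST_WIDE_INT_1 and IN_RANGE (GCC's real IN_RANGE is written with unsigned arithmetic), and it assumes the 16- and 34-bit wrappers are SIGNED_INTEGER_NBIT_P with N = 16 and N = 34, since the hunk above is cut off. The helper old_addi_check is a hypothetical copy of the open-coded test num_insns_constant_gpr used before the patch, included to show that the old and new checks agree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for GCC's host integer type and range macro
   (assumptions for this sketch, not GCC's actual definitions).  */
typedef int64_t host_wide_int;
#define HOST_WIDE_INT_1 ((host_wide_int) 1)
#define IN_RANGE(VALUE, LOWER, UPPER) \
  ((VALUE) >= (LOWER) && (VALUE) <= (UPPER))

/* Is VALUE representable as an N-bit two's complement integer?  */
#define SIGNED_INTEGER_NBIT_P(VALUE, N) \
  IN_RANGE ((VALUE), \
            -(HOST_WIDE_INT_1 << ((N) - 1)), \
            (HOST_WIDE_INT_1 << ((N) - 1)) - 1)

/* Assumed wrappers for the D-form (16-bit) and prefixed (34-bit) ranges.  */
#define SIGNED_INTEGER_16BIT_P(VALUE) SIGNED_INTEGER_NBIT_P (VALUE, 16)
#define SIGNED_INTEGER_34BIT_P(VALUE) SIGNED_INTEGER_NBIT_P (VALUE, 34)

/* Hypothetical copy of the pre-patch "loadable with addi" test: biasing
   by 0x8000 maps [-32768, 32767] onto [0, 65535] in unsigned arithmetic.  */
static bool
old_addi_check (host_wide_int value)
{
  return ((uint64_t) value + 0x8000) < 0x10000;
}
```

The unsigned-bias trick and the explicit range check accept exactly the same values, which is why replacing the open-coded test with SIGNED_INTEGER_16BIT_P is a pure cleanup.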
PowerPC -mcpu=future patches, V11
This set of patches reworks the vector extract code from the V10 patches.

If you recall, in V10 you pointed out that, for vector extract, the existing code overwrote an input argument. That is fixed in these patches.

In V10, I added two new constraints (ep and em) to categorize whether a memory operand is prefixed or not prefixed, and we had some discussion about how to write the predicates.

However, yesterday I realized that in the case that needed the new constraints (vector extract with a variable element number, where the vector is in memory and we are optimizing the load to fetch only the element being extracted), all we want is the address of the vector in a base register. This is because, to access an element whose number is variable, we eventually need to do an X-FORM load, with the vector address in one register and the byte offset in another. Instead of adding new alternatives and new scratch registers, I could simplify the code and use the 'Q' constraint, which says to use a single register as the address. The register allocator will do the necessary work to load the address into a register.

I did notice that the documentation for 'Q' was wrong, so one of the patches updates the documentation.

In addition, after I committed the first 3 patches from V10 that added PADDI and PLI support for -mcpu=future, Segher asked me to do a patch to rename two of the macros. That patch is now checked in, and some of these patches include changes due to the macro renaming.

After the vector extract patch rework, I included the remaining patch to the compiler (make -mpcrel the default on Linux 64-bit for -mcpu=future).

I included the tests after the -mpcrel default change. In addition to the tests in V10, I added some new tests for the vector extract code.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
[PATCH] V11 patch #1 of 15, Fix bug in vec_extract
This patch fixes the bug pointed out in the V10 patch review: the code modified an input argument to vector extract with a variable element number.

I also added two gcc_asserts to the vector extract address code to signal an internal error if the temporary base register is used for two different purposes. This shows up if you have a vector whose address is PC-relative and whose element number is variable. Later patches will fix the case I know of that generates the bad code, but it is still important to make sure the same situation doesn't arise in the future. With this patch applied, the compiler will signal an internal error instead of silently generating bad code.

FWIW, I did build all of Spec 2017 and Spec 2006 with this patch applied (but not the later patches in the series), and we did not get an assertion failure.

I have bootstrapped the compiler and there were no regression test failures on a little endian Power8 system.

2019-12-20 Michael Meissner

        * config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add
        assertion to make sure that we don't load an address into a
        temporary that is already used.
        (rs6000_split_vec_extract_var): Do not overwrite the element when
        masking it.  Use the base register temporary instead.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c (revision 279549)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -6757,6 +6757,8 @@ rs6000_adjust_vec_address (rtx scalar_re
 
   else
     {
+      /* If we are called from rs6000_split_vec_extract_var, base_tmp may
+         be the same as element.  */
       if (TARGET_POWERPC64)
         emit_insn (gen_ashldi3 (base_tmp, element, GEN_INT (byte_shift)));
       else
@@ -6825,6 +6827,11 @@ rs6000_adjust_vec_address (rtx scalar_re
 
           else
             {
+              /* Make sure base_tmp is not the same as element_offset.  This
+                 can happen if the element number is variable and the address
+                 is not a simple address.  Otherwise we lose the offset, and
+                 double the address.  */
+              gcc_assert (!reg_mentioned_p (base_tmp, element_offset));
               emit_move_insn (base_tmp, op1);
               emit_insn (gen_add2_insn (base_tmp, element_offset));
             }
@@ -6835,6 +6842,10 @@ rs6000_adjust_vec_address (rtx scalar_re
 
   else
     {
+      /* Make sure base_tmp is not the same as element_offset.  This can happen
+         if the element number is variable and the address is not a simple
+         address.  Otherwise we lose the offset, and double the address.  */
+      gcc_assert (!reg_mentioned_p (base_tmp, element_offset));
       emit_move_insn (base_tmp, addr);
       new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
     }
@@ -6902,9 +6913,10 @@ rs6000_split_vec_extract_var (rtx dest,
       int num_elements = GET_MODE_NUNITS (mode);
       rtx num_ele_m1 = GEN_INT (num_elements - 1);
 
-      emit_insn (gen_anddi3 (element, element, num_ele_m1));
+      /* Make sure the element number is in bounds.  */
       gcc_assert (REG_P (tmp_gpr));
-      emit_move_insn (dest, rs6000_adjust_vec_address (dest, src, element,
+      emit_insn (gen_anddi3 (tmp_gpr, element, num_ele_m1));
+      emit_move_insn (dest, rs6000_adjust_vec_address (dest, src, tmp_gpr,
                                                        tmp_gpr, scalar_mode));
       return;
     }
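The fix can be pictured with a toy register-file model in C. This is a hypothetical sketch, not GCC code, and every name in it is made up: vec_extract_var_addr masks the element number into a scratch register instead of masking it in place, then scales it into a byte offset; form_address plays the role of the new gcc_assert by refusing to load the vector address into the register that already holds the byte offset.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file standing in for hard registers (hypothetical).  */
enum { NUM_REGS = 4 };

/* Model of the fixed split: mask the element number into the scratch
   register TMP_REG (leaving ELEMENT_REG untouched), shift it into a byte
   offset, and return BASE + offset, i.e. an X-form style base + index
   address.  Before the fix, the mask was applied to ELEMENT_REG itself.  */
static uint64_t
vec_extract_var_addr (int64_t regs[NUM_REGS], int element_reg, int tmp_reg,
                      uint64_t base, unsigned num_elements,
                      unsigned byte_shift)
{
  regs[tmp_reg] = regs[element_reg] & (int64_t) (num_elements - 1);
  regs[tmp_reg] <<= byte_shift;
  return base + (uint64_t) regs[tmp_reg];
}

/* Model of loading a vector address into the base temporary BASE_TMP and
   adding the byte offset held in OFFSET_REG.  If BASE_TMP were the same
   register as OFFSET_REG, the load would destroy the offset and the sum
   would double the address; the assert stands in for the gcc_assert.  */
static uint64_t
form_address (int64_t regs[NUM_REGS], int base_tmp, int offset_reg,
              uint64_t vector_addr)
{
  assert (base_tmp != offset_reg);
  regs[base_tmp] = (int64_t) vector_addr;
  return (uint64_t) (regs[base_tmp] + regs[offset_reg]);
}
```

For a four-element vector of 4-byte elements (byte_shift 2), element number 5 in register 0 is masked to 1 in register 1, giving byte offset 4, while register 0 still holds 5 afterwards.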
[PATCH] V11 patch #2 of 15, Use prefixed load for vector extract with large offset
This patch uses large offsets for -mcpu=future when we optimize a vector extract from memory and the memory address had previously been a prefixed address with a large offset.

The current code would load the constant into a temporary and then do an indexed load. Successive passes would eventually optimize that back into the form we want (a base register plus a large offset), but it is better to generate the optimal code sooner.

I have bootstrapped this change on a little endian power8 system and there were no regressions. Can I check this into the trunk?

2019-12-20 Michael Meissner

        * config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support
        for the offset being 34-bits when -mcpu=future is used.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c (revision 279553)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -6792,9 +6792,17 @@ rs6000_adjust_vec_address (rtx scalar_re
           HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset);
           rtx offset_rtx = GEN_INT (offset);
 
-          if (IN_RANGE (offset, -32768, 32767)
+          /* 16-bit offset.  */
+          if (SIGNED_INTEGER_16BIT_P (offset)
               && (scalar_size < 8 || (offset & 0x3) == 0))
             new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+          /* 34-bit offset if we have prefixed addresses.  */
+          else if (TARGET_PREFIXED_ADDR && SIGNED_INTEGER_34BIT_P (offset))
+            new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+          /* Offset overflowed, move offset to the temporary (which will likely
+             be split), and do X-FORM addressing.  */
           else
             {
               emit_move_insn (base_tmp, offset_rtx);
@@ -6825,6 +6833,12 @@ rs6000_adjust_vec_address (rtx scalar_re
               emit_insn (insn);
             }
 
+          /* Make sure we don't overwrite the temporary if the element being
+             extracted is variable, and we've put the offset into base_tmp
+             previously.  */
+          else if (rtx_equal_p (base_tmp, element_offset))
+            emit_insn (gen_add2_insn (base_tmp, op1));
+
           else
             {
               /* Make sure base_tmp is not the same as element_offset.  This
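A hypothetical sketch of the offset classification this patch adds to rs6000_adjust_vec_address, written as plain C so it can be exercised outside the compiler. The enum and function names are made up for illustration. The conditions mirror the patch: a 16-bit offset stays D-form (the patch additionally requires a multiple of 4 when the scalar is 8 bytes, the DS-form rule), a 34-bit offset becomes a prefixed address when prefixed addressing is enabled, and anything else falls back to loading the offset into a temporary for X-form addressing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the three outcomes.  */
enum addr_form { FORM_D16, FORM_PREFIXED34, FORM_INDEXED };

/* True if VALUE fits in an N-bit signed field.  */
static bool
signed_nbit_p (int64_t value, unsigned n)
{
  const int64_t limit = (int64_t) 1 << (n - 1);
  return value >= -limit && value <= limit - 1;
}

static enum addr_form
classify_offset (int64_t offset, unsigned scalar_size, bool prefixed_addr)
{
  /* 16-bit offset; 8-byte scalars also need a multiple of 4 (DS-form).  */
  if (signed_nbit_p (offset, 16)
      && (scalar_size < 8 || ((uint64_t) offset & 0x3) == 0))
    return FORM_D16;

  /* 34-bit offset if prefixed addressing is enabled.  */
  if (prefixed_addr && signed_nbit_p (offset, 34))
    return FORM_PREFIXED34;

  /* Otherwise put the offset in a temporary and use X-form addressing.  */
  return FORM_INDEXED;
}
```

For example, an 8-byte element at offset 6 is not DS-form compatible, so it needs an indexed load without prefixed addressing but can use a single prefixed load with it.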
[PATCH] V11 patch #3 of 15, Use 'Q' constraint for variable vector extract from memory
As I mentioned in the introduction, for the case where we are optimizing the extract of a variable element from a vector in memory, the current code takes a regular address and the temporary that holds the byte offset, and tries to generate a new address. In particular, it failed when the vector had a PC-relative address: there were not enough temporary registers, so it reused the temporary holding the byte offset to hold the address.

Initially in doing these patches, I reworked the constraints for prefixed and non-prefixed memory so we could identify when we needed a second temporary. Then I realized that eventually we will want to generate an X-FORM (register + register) address, and it was simpler to use the 'Q' constraint and have the register allocator put the address into a register.

I have verified that the bug is indeed fixed (patch #15 will include the new tests for this). I have also bootstrapped the compiler on a little endian power8 machine and there were no regressions in the test suite. Can I check this patch into the trunk?

2019-12-20 Michael Meissner

        * config/rs6000/vsx.md (vsx_extract__var, VSX_D iterator): Use
        'Q' for memory constraints because we need to do an X-FORM load
        with the variable index.
        (vsx_extract_v4sf_var): Use 'Q' for memory constraints because we
        need to do an X-FORM load with the variable index.
        (vsx_extract__var, VSX_EXTRACT_I iterator): Use 'Q' for memory
        constraints because we need to do an X-FORM load with the variable
        index.
        (vsx_extract__mode_var): Use 'Q' for memory constraints because
        we need to do an X-FORM load with the variable index.

Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md (revision 279597)
+++ gcc/config/rs6000/vsx.md (working copy)
@@ -3245,10 +3245,11 @@ (define_insn "vsx_vslo_"
   "vslo %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-;; Variable V2DI/V2DF extract
+;; Variable V2DI/V2DF extract.  Use 'Q' for the memory because we will
+;; ultimately have to convert the address into base + index.
 (define_insn_and_split "vsx_extract__var"
   [(set (match_operand: 0 "gpc_reg_operand" "=v,wa,r")
-        (unspec: [(match_operand:VSX_D 1 "input_operand" "v,m,m")
+        (unspec: [(match_operand:VSX_D 1 "input_operand" "v,Q,Q")
                   (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
                  UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,&b,&b"))
@@ -3318,7 +3319,7 @@ (define_insn_and_split "*vsx_extract_v4s
 ;; Variable V4SF extract
 (define_insn_and_split "vsx_extract_v4sf_var"
   [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,wa,?r")
-        (unspec:SF [(match_operand:V4SF 1 "input_operand" "v,m,m")
+        (unspec:SF [(match_operand:V4SF 1 "input_operand" "v,Q,Q")
                     (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
                    UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,&b,&b"))
@@ -3681,7 +3682,7 @@ (define_insn_and_split "*vsx_extract__var"
   [(set (match_operand: 0 "gpc_reg_operand" "=r,r,r")
         (unspec:
-         [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+         [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q")
           (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
          UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,r,&b"))
@@ -3701,7 +3702,7 @@ (define_insn_and_split "*vsx_extract_
          0 "gpc_reg_operand" "=r,r,r")
         (zero_extend:
          (unspec:
-          [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+          [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q")
            (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
           UNSPEC_VSX_EXTRACT)))
    (clobber (match_scratch:DI 3 "=r,r,&b"))
[PATCH] V11 patch #4 of 15, Update 'Q' constraint documentation.
In doing V11 patch #3, I noticed that the documentation for the 'Q' constraint was misleading. This patch updates the documentation. Can I check this patch into the trunk?

2019-12-20 Michael Meissner

        * config/rs6000/constraints.md (Q constraint): Update
        documentation.
        * doc/md.texi (PowerPC constraints): Update 'Q' constraint
        documentation.

Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md (revision 279547)
+++ gcc/config/rs6000/constraints.md (working copy)
@@ -211,8 +211,7 @@ several times, or that might not access
        (match_test "GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) != RTX_AUTOINC")))
 
 (define_memory_constraint "Q"
-  "Memory operand that is an offset from a register (it is usually better
-to use @samp{m} or @samp{es} in @code{asm} statements)"
+  "A memory operand whose address uses a single register with no offset."
   (and (match_code "mem")
        (match_test "REG_P (XEXP (op, 0))")))
 
Index: gcc/doc/md.texi
===
--- gcc/doc/md.texi (revision 279547)
+++ gcc/doc/md.texi (working copy)
@@ -3381,8 +3381,7 @@ allowed when @samp{<} or @samp{>} is use
 as @samp{m} without @samp{<} and @samp{>}.
 
 @item Q
-Memory operand that is an offset from a register (it is usually better
-to use @samp{m} or @samp{es} in @code{asm} statements)
+A memory operand whose address uses a single register with no offset.
 
 @item Z
 Memory operand that is an indexed or indirect from a register (it is
[PATCH] V11 patch #5 of 15, Optimize vec_extract of a vector in memory with a PC-relative address
This patch recognizes when we are optimizing a vector extract with a constant element number, where the vector is in memory and the vector's address is PC-relative. In that case it folds the element offset directly into the PC-relative address, instead of loading the address into a temporary register and then doing an indirect load.

I have bootstrapped a compiler on a little endian power8 machine and ran the testsuite with no regressions. Can I check this into the trunk?

2019-12-20 Michael Meissner

        * config/rs6000/rs6000.c (rs6000_reg_to_addr_mask): New helper
        function to identify the address mask of a hard register.
        (rs6000_adjust_vec_address): If we have a PC-relative address and
        a constant vector element number, fold the element number into
        the PC-relative address.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c (revision 279597)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -6722,6 +6722,30 @@ rs6000_expand_vector_extract (rtx target
     }
 }
 
+/* Helper function to return an address mask based on a physical register.  */
+
+static addr_mask_type
+rs6000_reg_to_addr_mask (rtx reg, machine_mode mode)
+{
+  unsigned int r = reg_or_subregno (reg);
+  addr_mask_type addr_mask;
+
+  gcc_assert (HARD_REGISTER_NUM_P (r));
+  if (INT_REGNO_P (r))
+    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_GPR];
+
+  else if (FP_REGNO_P (r))
+    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_FPR];
+
+  else if (ALTIVEC_REGNO_P (r))
+    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_VMX];
+
+  else
+    gcc_unreachable ();
+
+  return addr_mask;
+}
+
 /* Adjust a memory address (MEM) of a vector type to point to a scalar field
    within the vector (ELEMENT) with a mode (SCALAR_MODE).  Use a base register
    temporary (BASE_TMP) to fixup the address.  Return the new memory address
@@ -6854,6 +6878,51 @@ rs6000_adjust_vec_address (rtx scalar_re
         }
     }
 
+  /* For references to local static variables, try to fold a constant offset
+     into the address.  */
+  else if (pcrel_local_address (addr, Pmode) && CONST_INT_P (element_offset))
+    {
+      if (GET_CODE (addr) == CONST)
+        addr = XEXP (addr, 0);
+
+      if (GET_CODE (addr) == PLUS)
+        {
+          rtx op0 = XEXP (addr, 0);
+          rtx op1 = XEXP (addr, 1);
+          if (CONST_INT_P (op1))
+            {
+              HOST_WIDE_INT offset
+                = INTVAL (XEXP (addr, 1)) + INTVAL (element_offset);
+
+              if (offset == 0)
+                new_addr = op0;
+
+              else if (SIGNED_INTEGER_34BIT_P (offset))
+                {
+                  rtx plus = gen_rtx_PLUS (Pmode, op0, GEN_INT (offset));
+                  new_addr = gen_rtx_CONST (Pmode, plus);
+                }
+
+              else
+                {
+                  emit_move_insn (base_tmp, addr);
+                  new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
+                }
+            }
+          else
+            {
+              emit_move_insn (base_tmp, addr);
+              new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
+            }
+        }
+
+      else
+        {
+          rtx plus = gen_rtx_PLUS (Pmode, addr, element_offset);
+          new_addr = gen_rtx_CONST (Pmode, plus);
+        }
+    }
+
   else
     {
       /* Make sure base_tmp is not the same as element_offset.  This can happen
@@ -6869,21 +6938,8 @@ rs6000_adjust_vec_address (rtx scalar_re
   if (GET_CODE (new_addr) == PLUS)
     {
       rtx op1 = XEXP (new_addr, 1);
-      addr_mask_type addr_mask;
-      unsigned int scalar_regno = reg_or_subregno (scalar_reg);
-
-      gcc_assert (HARD_REGISTER_NUM_P (scalar_regno));
-      if (INT_REGNO_P (scalar_regno))
-        addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_GPR];
-
-      else if (FP_REGNO_P (scalar_regno))
-        addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_FPR];
-
-      else if (ALTIVEC_REGNO_P (scalar_regno))
-        addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_VMX];
-
-      else
-        gcc_unreachable ();
+      addr_mask_type addr_mask
+        = rs6000_reg_to_addr_mask (scalar_reg, scalar_mode);
 
       if (REG_P (op1) || SUBREG_P (op1))
         valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0;
@@ -6891,9 +6947,21 @@ rs6000_adjust_vec_address (rtx scalar_re
         valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0;
     }
 
+  /* An address that is a single register is always valid for either indexed or
+     offsettable loads.  */
   else if (REG_P (new_addr) || SUBREG_P (new_addr))
     valid_addr_p = true;
 
+  /* If we have a PC-relative address, check if offsettable loads are
+     allowed.  */
+  else if (pcrel_local_address (new_addr, Pmode))
+    {
+      addr_mask_type addr_mask
+        = rs6000_reg_to_addr_mask (scalar_reg, scalar_mode);
+
+      valid_addr_p = (addr_mask & RELOAD_REG_OFFSE
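A hypothetical C model of the new constant-folding path for PC-relative addresses; the names are illustrative, not GCC's. It assumes the address is already a symbol plus a constant SYM_OFFSET. The element offset is folded in when the sum still fits in 34 bits (a sum of 0 corresponds to the bare symbol). Otherwise the symbol address is loaded into the base temporary and the element offset is added to that register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result kinds.  */
enum pcrel_kind
{
  SYMBOL_PLUS_CONST,   /* (const (plus symbol offset)); offset 0 = symbol */
  BASE_TMP_PLUS_CONST  /* base_tmp = symbol + sym_offset; base_tmp + offset */
};

struct folded_addr
{
  enum pcrel_kind kind;
  int64_t offset;
};

/* True if VALUE fits in a signed 34-bit field.  */
static bool
signed_34bit_p (int64_t value)
{
  const int64_t limit = (int64_t) 1 << 33;
  return value >= -limit && value <= limit - 1;
}

/* Fold a constant ELEMENT_OFFSET into a PC-relative address that is
   "symbol + SYM_OFFSET".  */
static struct folded_addr
fold_pcrel_element (int64_t sym_offset, int64_t element_offset)
{
  struct folded_addr r;
  int64_t offset = sym_offset + element_offset;

  if (signed_34bit_p (offset))
    {
      /* The combined offset still fits in a prefixed PC-relative insn.  */
      r.kind = SYMBOL_PLUS_CONST;
      r.offset = offset;
    }
  else
    {
      /* Too large: materialize the address, then add the element offset.  */
      r.kind = BASE_TMP_PLUS_CONST;
      r.offset = element_offset;
    }
  return r;
}
```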
[PATCH] V11 patch #6 of 15, Make -mpcrel the default for -mcpu=future on Linux 64-bit
This is the same as V10 patch #8. Once the vector extract patches are committed, this patch flips the default to use PC-relative addressing on 64-bit Linux systems when the user specifies -mcpu=future.

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00841.html

I have bootstrapped the compiler on a little endian power8 system and ran the testsuite with no regressions. Once the preceding V11 patches have been checked in, can I check this patch into the trunk?

2019-12-20 Michael Meissner

        * config/rs6000/linux64.h (PREFIXED_ADDR_SUPPORTED_BY_OS): Set to
        1 to enable prefixed addressing if -mcpu=future.
        (PCREL_SUPPORTED_BY_OS): Set to 1 to enable PC-relative addressing
        if -mcpu=future.
        * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Do not
        enable -mprefixed-addr or -mpcrel by default.
        (ADDRESSING_FUTURE_MASKS): New macro.
        (OTHER_FUTURE_MASKS): Use ADDRESSING_FUTURE_MASKS.
        * config/rs6000/rs6000.c (PREFIXED_ADDR_SUPPORTED_BY_OS): Disable
        prefixed addressing unless the target OS tm.h says we should
        enable it.
        (PCREL_SUPPORTED_BY_OS): Disable PC-relative addressing unless the
        target OS tm.h says we should enable it.
        (rs6000_debug_reg_global): Print whether prefixed addressing and
        PC-relative addressing are enabled by default if -mcpu=future.
        (rs6000_option_override_internal): Move setting prefixed
        addressing and PC-relative addressing after the sub-target option
        handling is done.  Only enable prefixed addressing or PC-relative
        addressing on -mcpu=future systems if the target OS says to enable
        it.  Disallow prefixed addressing on 32-bit systems or if the
        target object file is not ELF v2.

Index: gcc/config/rs6000/linux64.h
===
--- gcc/config/rs6000/linux64.h (revision 279141)
+++ gcc/config/rs6000/linux64.h (working copy)
@@ -640,3 +640,11 @@ extern int dot_symbols;
    enabling the __float128 keyword.  */
 #undef TARGET_FLOAT128_ENABLE_TYPE
 #define TARGET_FLOAT128_ENABLE_TYPE 1
+
+/* Enable support for pc-relative and numeric prefixed addressing on the
+   'future' system.  */
+#undef PREFIXED_ADDR_SUPPORTED_BY_OS
+#define PREFIXED_ADDR_SUPPORTED_BY_OS 1
+
+#undef PCREL_SUPPORTED_BY_OS
+#define PCREL_SUPPORTED_BY_OS 1
Index: gcc/config/rs6000/rs6000-cpus.def
===
--- gcc/config/rs6000/rs6000-cpus.def (revision 279141)
+++ gcc/config/rs6000/rs6000-cpus.def (working copy)
@@ -75,15 +75,22 @@
                                  | OPTION_MASK_P8_VECTOR \
                                  | OPTION_MASK_P9_VECTOR)
 
-/* Support for a future processor's features.  Do not enable -mpcrel until it
-   is fully functional.  */
+/* Support for a future processor's features.  The prefixed and pc-relative
+   addressing bits are not added here.  Instead, they are added if the target
+   OS tm.h says that it supports the addressing modes by default when
+   -mcpu=future is used.  */
 #define ISA_FUTURE_MASKS_SERVER (ISA_3_0_MASKS_SERVER \
-                                 | OPTION_MASK_FUTURE \
+                                 | OPTION_MASK_FUTURE)
+
+/* Addressing related flags on a future processor.  These are options that need
+   to be cleared if the target OS is not capable of supporting prefixed
+   addressing at all (such as 32-bit mode or if the object file format is not
+   ELF v2).  */
+#define ADDRESSING_FUTURE_MASKS (OPTION_MASK_PCREL \
                                  | OPTION_MASK_PREFIXED_ADDR)
 
 /* Flags that need to be turned off if -mno-future.  */
-#define OTHER_FUTURE_MASKS (OPTION_MASK_PCREL \
-                            | OPTION_MASK_PREFIXED_ADDR)
+#define OTHER_FUTURE_MASKS ADDRESSING_FUTURE_MASKS
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW \
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c (revision 279202)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -98,6 +98,16 @@
 #endif
 #endif
 
+/* Set up the defaults for whether prefixed addressing is used, and if it is
+   used, whether we want to turn on pc-relative support by default.  */
+#ifndef PREFIXED_ADDR_SUPPORTED_BY_OS
+#define PREFIXED_ADDR_SUPPORTED_BY_OS 0
+#endif
+
+#ifndef PCREL_SUPPORTED_BY_OS
+#define PCREL_SUPPORTED_BY_OS 0
+#endif
+
 /* Support targetm.vectorize.builtin_mask_for_load.  */
 GTY(()) tree altivec_builtin_mask_for_load;
 
@@ -2535,6 +2545,14 @@ rs6000_debug_reg_global (void)
   if (TARGET_DIRECT_MOVE_128)
     fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit mfvsrld el
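A rough sketch of how the default flags might combine under the new scheme. It uses made-up bit values (the real OPTION_MASK_* values are generated by the option machinery), ignores options the user sets explicitly on the command line, and assumes, as a simplification, that PC-relative addressing is only turned on together with prefixed addressing. The function name is hypothetical and does not correspond to code in rs6000_option_override_internal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Made-up bit values for illustration only.  */
#define MASK_FUTURE        (UINT32_C (1) << 0)
#define MASK_PREFIXED_ADDR (UINT32_C (1) << 1)
#define MASK_PCREL         (UINT32_C (1) << 2)
#define ADDRESSING_FUTURE_MASKS (MASK_PCREL | MASK_PREFIXED_ADDR)

/* Sketch: -mcpu=future no longer implies the addressing flags; they are
   added only when the OS header asks for them, and cleared again for
   32-bit or non-ELFv2 targets.  */
static uint32_t
apply_addressing_defaults (uint32_t flags, bool prefixed_by_os,
                           bool pcrel_by_os, bool is_64bit, bool is_elfv2)
{
  if (flags & MASK_FUTURE)
    {
      if (prefixed_by_os)
        flags |= MASK_PREFIXED_ADDR;
      /* Simplifying assumption: PC-relative needs prefixed addressing.  */
      if (pcrel_by_os && (flags & MASK_PREFIXED_ADDR))
        flags |= MASK_PCREL;
    }

  if (!is_64bit || !is_elfv2)
    flags &= ~ADDRESSING_FUTURE_MASKS;

  return flags;
}
```

Keeping the addressing bits in a separate mask, as ADDRESSING_FUTURE_MASKS does in the patch, means one expression can both clear them for unsupported targets and serve as OTHER_FUTURE_MASKS for -mno-future.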
[PATCH] V11 patch #7 of 15, Add new target_supports cases for -mcpu=future tests.
This is V10 patch #9. It adds new target-supports tests for the new patches:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00842.html

All of the new tests work with these target-supports checks. Can I check it into the trunk?

2019-12-20 Michael Meissner

        * lib/target-supports.exp (check_effective_target_powerpc_pcrel):
        New target for PowerPC -mcpu=future support.
        (check_effective_target_powerpc_prefixed_addr): New target for
        PowerPC -mcpu=future support.

Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp (revision 279547)
+++ gcc/testsuite/lib/target-supports.exp (working copy)
@@ -2161,6 +2161,23 @@ proc check_p9modulo_hw_available { } {
     }]
 }
 
+# Return 1 if the target generates PC-relative instructions automatically
+proc check_effective_target_powerpc_pcrel { } {
+    return [check_no_messages_and_pattern powerpc_pcrel \
+        {\mpld\M.*[@]pcrel} assembly {
+            static long s;
+            long *p = &s;
+            long foo (void) { return s; }
+        } {-O2 -mcpu=future}]
+}
+
+# Return 1 if the target generates prefixed instructions automatically
+proc check_effective_target_powerpc_prefixed_addr { } {
+    return [check_no_messages_and_pattern powerpc_prefixed_addr \
+        {\mpld\M} assembly {
+            long foo (long *p) { return p[0x12345]; }
+        } {-O2 -mcpu=future}]
+}
 # Return 1 if the target supports executing FUTURE instructions, 0 otherwise.
 # Cache the result.  It is assumed that if a simulator does not support the
[PATCH] V11 patch #8 of 15, Add new tests for using PADDI and PLI with -mcpu=future
This is V10 patch #10. It adds 3 new tests to verify that we generate PADDI/PLI for large constants when -mcpu=future is used.

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00843.html

These tests pass when the preceding patches are applied. Can I check this in?

2019-12-20 Michael Meissner

        * gcc.target/powerpc/prefix-add.c: New test for -mcpu=future
        generating PADDI for large constant adds.
        * gcc.target/powerpc/prefix-di-constant.c: New test for
        -mcpu=future generating PLI to load up large DImode constants.
        * gcc.target/powerpc/prefix-si-constant.c: New test for
        -mcpu=future generating PLI to load up large SImode constants.

Index: gcc/testsuite/gcc.target/powerpc/prefix-add.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-add.c (revision 279252)
+++ gcc/testsuite/gcc.target/powerpc/prefix-add.c (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PADDI is generated to add a large constant.  */
+unsigned long
+add (unsigned long a)
+{
+  return a + 0x12345678UL;
+}
+
+/* { dg-final { scan-assembler {\mpaddi\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c (revision 279252)
+++ gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PLI (PADDI) is generated to load a large constant.  */
+unsigned long
+large (void)
+{
+  return 0x12345678UL;
+}
+
+/* { dg-final { scan-assembler {\mpli\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c (revision 279252)
+++ gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PLI (PADDI) is generated to load a large constant for SImode.  */
+void
+large_si (unsigned int *p)
+{
+  *p = 0x12345U;
+}
+
+/* { dg-final { scan-assembler {\mpli\M} } } */
[PATCH] V11 patch #9 of 15, Add test to validate generating prefixed memory when the offset is invalid for DS/DQ insns
This is V10 patch #11. It adds a new test to validate that, for -mcpu=future, we generate a prefixed load/store if the offset would have been illegal for a non-prefixed DS-form or DQ-form instruction.

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00845.html

This test passes when I run the testsuite. Can I check it in?

2019-12-20 Michael Meissner

        * gcc.target/powerpc/prefix-ds-dq.c: New test to verify that we
        generate the prefixed load/store instructions for traditional
        instructions with an offset that doesn't match DS/DQ
        requirements.

Index: gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c (revision 279256)
+++ gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c (working copy)
@@ -0,0 +1,156 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests whether we generate a prefixed load/store operation for addresses that
+   don't meet DS/DQ offset constraints.  */
+
+unsigned long
+load_uc_offset1 (unsigned char *p)
+{
+  return p[1];                          /* should generate LBZ.  */
+}
+
+long
+load_sc_offset1 (signed char *p)
+{
+  return p[1];                          /* should generate LBZ + EXTSB.  */
+}
+
+unsigned long
+load_us_offset1 (unsigned char *p)
+{
+  return *(unsigned short *)(p + 1);    /* should generate LHZ.  */
+}
+
+long
+load_ss_offset1 (unsigned char *p)
+{
+  return *(short *)(p + 1);             /* should generate LHA.  */
+}
+
+unsigned long
+load_ui_offset1 (unsigned char *p)
+{
+  return *(unsigned int *)(p + 1);      /* should generate LWZ.  */
+}
+
+long
+load_si_offset1 (unsigned char *p)
+{
+  return *(int *)(p + 1);               /* should generate PLWA.  */
+}
+
+unsigned long
+load_ul_offset1 (unsigned char *p)
+{
+  return *(unsigned long *)(p + 1);     /* should generate PLD.  */
+}
+
+long
+load_sl_offset1 (unsigned char *p)
+{
+  return *(long *)(p + 1);              /* should generate PLD.  */
+}
+
+float
+load_float_offset1 (unsigned char *p)
+{
+  return *(float *)(p + 1);             /* should generate LFS.  */
+}
+
+double
+load_double_offset1 (unsigned char *p)
+{
+  return *(double *)(p + 1);            /* should generate LFD.  */
+}
+
+__float128
+load_float128_offset1 (unsigned char *p)
+{
+  return *(__float128 *)(p + 1);        /* should generate PLXV.  */
+}
+
+void
+store_uc_offset1 (unsigned char uc, unsigned char *p)
+{
+  p[1] = uc;                            /* should generate STB.  */
+}
+
+void
+store_sc_offset1 (signed char sc, signed char *p)
+{
+  p[1] = sc;                            /* should generate STB.  */
+}
+
+void
+store_us_offset1 (unsigned short us, unsigned char *p)
+{
+  *(unsigned short *)(p + 1) = us;      /* should generate STH.  */
+}
+
+void
+store_ss_offset1 (signed short ss, unsigned char *p)
+{
+  *(signed short *)(p + 1) = ss;        /* should generate STH.  */
+}
+
+void
+store_ui_offset1 (unsigned int ui, unsigned char *p)
+{
+  *(unsigned int *)(p + 1) = ui;        /* should generate STW.  */
+}
+
+void
+store_si_offset1 (signed int si, unsigned char *p)
+{
+  *(signed int *)(p + 1) = si;          /* should generate STW.  */
+}
+
+void
+store_ul_offset1 (unsigned long ul, unsigned char *p)
+{
+  *(unsigned long *)(p + 1) = ul;       /* should generate PSTD.  */
+}
+
+void
+store_sl_offset1 (signed long sl, unsigned char *p)
+{
+  *(signed long *)(p + 1) = sl;         /* should generate PSTD.  */
+}
+
+void
+store_float_offset1 (float f, unsigned char *p)
+{
+  *(float *)(p + 1) = f;                /* should generate STFS.  */
+}
+
+void
+store_double_offset1 (double d, unsigned char *p)
+{
+  *(double *)(p + 1) = d;               /* should generate STFD.  */
+}
+
+void
+store_float128_offset1 (__float128 f128, unsigned char *p)
+{
+  *(__float128 *)(p + 1) = f128;        /* should generate PSTXV.  */
+}
+
+/* { dg-final { scan-assembler-times {\mextsb\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlbz\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mlfd\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlfs\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlha\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlhz\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlwz\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mpld\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mplwa\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mplxv\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstxv\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstb\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mstfd\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mstfs\M}  1 } } */
+/* { dg-final { scan-assembler-times {\msth\M}   2 } } */
+/* { dg-final { scan-assembler
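The rule this test exercises can be summarized in a small hypothetical helper; the enum and function names below are illustrative only. DS-form instructions (ld, std, lwa) encode a displacement that must be a multiple of 4, DQ-form instructions (lxv, stxv) one that must be a multiple of 16, and D-form instructions (lbz, lwz, lfs, lfd, ...) accept any 16-bit displacement. With an offset of 1, only the DS-form and DQ-form accesses need a prefixed instruction, which matches the PLWA, PLD, PLXV, PSTD and PSTXV expectations in the test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the traditional displacement formats.  */
enum mem_form
{
  D_FORM,   /* any 16-bit displacement, e.g. lbz, lwz, lfs, lfd */
  DS_FORM,  /* displacement must be a multiple of 4, e.g. ld, std, lwa */
  DQ_FORM   /* displacement must be a multiple of 16, e.g. lxv, stxv */
};

/* True if a load/store of FORM at OFFSET from a base register cannot be
   encoded in the traditional instruction, so the prefixed form is needed.  */
static bool
needs_prefixed_form (enum mem_form form, int64_t offset)
{
  if (offset < -32768 || offset > 32767)
    return true;

  switch (form)
    {
    case D_FORM:
      return false;
    case DS_FORM:
      return ((uint64_t) offset & 0x3) != 0;
    case DQ_FORM:
      return ((uint64_t) offset & 0xf) != 0;
    }
  return true;
}
```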
[PATCH] V11 patch #10 of 15, Make sure we don't generate pre-modify prefixed insns with -mcpu=future
This is V10 patch #12. It adds a test to make sure we don't generate a prefixed instruction with PRE_INC, PRE_DEC, or PRE_MODIFY. https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00846.html This test passes when I run it. Can I check this into the trunk? 2019-12-20 Michael Meissner * gcc.target/powerpc/prefix-no-premodify.c: Make sure we do not generate the non-existent PLWZU instruction if -mcpu=future. Index: gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c === --- gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c (revision 279259) +++ gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c (working copy) @@ -0,0 +1,50 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Make sure that we don't generate a prefixed form of the load and store with + update instructions (i.e. instead of generating LWZU we have to generate + PLWZ plus a PADDI). */ + +#ifndef SIZE +#define SIZE 5 +#endif + +struct foo { + unsigned int field; + char pad[SIZE]; +}; + +struct foo *inc_load (struct foo *p, unsigned int *q) +{ + *q = (++p)->field; /* PLWZ, PADDI, STW. */ + return p; +} + +struct foo *dec_load (struct foo *p, unsigned int *q) +{ + *q = (--p)->field; /* PLWZ, PADDI, STW. */ + return p; +} + +struct foo *inc_store (struct foo *p, unsigned int *q) +{ + (++p)->field = *q; /* LWZ, PADDI, PSTW. */ + return p; +} + +struct foo *dec_store (struct foo *p, unsigned int *q) +{ + (--p)->field = *q; /* LWZ, PADDI, PSTW. 
*/ + return p; +} + +/* { dg-final { scan-assembler-times {\mlwz\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstw\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpaddi\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mplwz\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstw\M} 2 } } */ +/* { dg-final { scan-assembler-not {\mplwzu\M}} } */ +/* { dg-final { scan-assembler-not {\mpstwu\M}} } */ +/* { dg-final { scan-assembler-not {\maddis\M}} } */ +/* { dg-final { scan-assembler-not {\maddi\M} } } */ -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
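Because there is no prefixed load-with-update (PLWZU), the pre-increment in inc_load has to be split into an ordinary prefixed load plus a separate pointer update. The plain-C sketch below shows that the split form computes the same result as the original source; the comments follow the instruction order written in the test's own comments (PLWZ, PADDI, STW). The SIZE value and the name inc_load_split are illustrative assumptions, chosen only so that the displacement would not fit in 16 bits.

```c
#include <assert.h>

/* Illustrative padding size (an assumption, not the test's value), large
   enough that sizeof (struct foo) needs more than a 16-bit displacement.  */
#define SIZE 100000

struct foo {
  unsigned int field;
  char pad[SIZE];
};

/* Same source as inc_load in prefix-no-premodify.c.  */
struct foo *inc_load (struct foo *p, unsigned int *q)
{
  *q = (++p)->field;
  return p;
}

/* Hypothetical hand-split equivalent with no combined "load with update".  */
struct foo *inc_load_split (struct foo *p, unsigned int *q)
{
  unsigned int v = p[1].field;  /* load at p + sizeof (struct foo) (PLWZ) */
  struct foo *np = p + 1;       /* separate pointer update (PADDI) */
  *q = v;                       /* store the loaded value (STW) */
  return np;
}
```

Both functions return a pointer to the next element and store that element's field into *q.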
[PATCH] V11 patch #11 of 15, Add new tests for generating prefixed loads/stores on -mcpu=future with large offsets
This is a reworking of the tests I submitted previously in V8 #4. It generates a bunch of loads and stores for various types using large addresses, and verifies that the number of prefixed loads and stores is correct. https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00084.html This patch works when I run the testsuite. Can I check it in? 2019-12-20 Michael Meissner * gcc.target/powerpc/prefix-large.h: New set of tests to test prefixed addressing on 'future' system with large numeric offsets for various types. * gcc.target/powerpc/prefix-large-dd.c: New test for prefixed loads/stores with large offsets for the _Decimal64 type. * gcc.target/powerpc/prefix-large-df.c: New test for prefixed loads/stores with large offsets for the double type. * gcc.target/powerpc/prefix-large-di.c: New test for prefixed loads/stores with large offsets for the long type. * gcc.target/powerpc/prefix-large-hi.c: New test for prefixed loads/stores with large offsets for the short type. * gcc.target/powerpc/prefix-large-kf.c: New test for prefixed loads/stores with large offsets for the __float128 type. * gcc.target/powerpc/prefix-large-qi.c: New test for prefixed loads/stores with large offsets for the signed char type. * gcc.target/powerpc/prefix-large-sd.c: New test for prefixed loads/stores with large offsets for the _Decimal32 type. * gcc.target/powerpc/prefix-large-sf.c: New test for prefixed loads/stores with large offsets for the float type. * gcc.target/powerpc/prefix-large-si.c: New test for prefixed loads/stores with large offsets for the int type. * gcc.target/powerpc/prefix-large-udi.c: New test for prefixed loads/stores with large offsets for the unsigned long type. * gcc.target/powerpc/prefix-large-uhi.c: New test for prefixed loads/stores with large offsets for the unsigned short type. * gcc.target/powerpc/prefix-large-uqi.c: New test for prefixed loads/stores with large offsets for the unsigned char type. 
* gcc.target/powerpc/prefix-large-usi.c: New test for prefixed loads/stores with large offsets for the unsigned int type. * gcc.target/powerpc/prefix-large-v2df.c: New test for prefixed loads/stores with large offsets for the vector double type. Index: gcc/testsuite/gcc.target/powerpc/prefix-large.h === --- gcc/testsuite/gcc.target/powerpc/prefix-large.h (revision 279319) +++ gcc/testsuite/gcc.target/powerpc/prefix-large.h (working copy) @@ -0,0 +1,59 @@ +/* Common tests for prefixed instructions testing whether we can generate a + 34-bit offset using 1 instruction. */ + +typedef signed char schar; +typedef unsigned char uchar; +typedef unsigned short ushort; +typedef unsigned int uint; +typedef unsigned long ulong; +typedef long double ldouble; +typedef vector double v2df; +typedef vector long v2di; +typedef vector float v4sf; +typedef vector int v4si; + +#ifndef TYPE +#define TYPE ulong +#endif + +#ifndef ITYPE +#define ITYPE TYPE +#endif + +#ifndef OTYPE +#define OTYPE TYPE +#endif + +#if !defined(DO_ADD) && !defined(DO_VALUE) && !defined(DO_SET) +#define DO_ADD 1 +#define DO_VALUE 1 +#define DO_SET 1 +#endif + +#ifndef CONSTANT +#define CONSTANT 0x123450UL +#endif + +#if DO_ADD +void +add (TYPE *p, TYPE a) +{ + p[CONSTANT] += a; +} +#endif + +#if DO_VALUE +OTYPE +value (TYPE *p) +{ + return p[CONSTANT]; +} +#endif + +#if DO_SET +void +set (TYPE *p, ITYPE a) +{ + p[CONSTANT] = a; +} +#endif Index: gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c === --- gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c (revision 279319) +++ gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructions testing whether we can generate a prefixed + load/store instruction that has a 34-bit offset for _Decimal64 objects.
*/ + +#define TYPE _Decimal64 + +#include "prefix-large.h" + +/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */ Index: gcc/testsuite/gcc.target/powerpc/prefix-large-df.c === --- gcc/testsuite/gcc.target/powerpc/prefix-large-df.c (revision 279319) +++ gcc/testsuite/gcc.target/powerpc/prefix-large-df.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed
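prefix-large.h indexes p[CONSTANT], with CONSTANT defaulting to 0x123450UL, so each access uses a byte displacement of CONSTANT times the element size. The small sketch below checks that for element sizes from 1 to 16 bytes, that displacement is too large for a 16-bit D-form offset but fits a signed 34-bit prefixed offset. This is why each access should need just one prefixed load or store. The helper functions are re-implementations written for illustration, in the spirit of GCC's SIGNED_INTEGER_16BIT_P and SIGNED_INTEGER_34BIT_P macros, not copies of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed range checks (illustrative re-implementations).  */
static bool fits_signed_16 (int64_t x)
{
  return x >= -(INT64_C (1) << 15) && x < (INT64_C (1) << 15);
}

static bool fits_signed_34 (int64_t x)
{
  return x >= -(INT64_C (1) << 33) && x < (INT64_C (1) << 33);
}

/* Byte displacement of p[CONSTANT] when each element is ELT_SIZE bytes.  */
static int64_t byte_offset (uint64_t constant, uint64_t elt_size)
{
  return (int64_t) (constant * elt_size);
}
```

For the default ulong type, the displacement is 0x123450 * 8 = 0x91A280 (9544320) bytes.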
[PATCH] V11 patch #12 of 15, Add new PC-relative tests for -mcpu=future
This is a reworking of patch V8 #5. It adds a bunch of PC-relative tests for the -mcpu=future target. https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00085.html This test passes when I run it. Can I check it in? 2019-12-20 Michael Meissner * gcc.target/powerpc/prefix-pcrel.h: New set of tests to test prefixed addressing on 'future' system with PC-relative addresses for various types. * gcc.target/powerpc/prefix-pcrel-dd.c: New test for prefixed loads/stores with PC-relative addresses for the _Decimal64 type. * gcc.target/powerpc/prefix-pcrel-df.c: New test for prefixed loads/stores with PC-relative addresses for the double type. * gcc.target/powerpc/prefix-pcrel-di.c: New test for prefixed loads/stores with PC-relative addresses for the long type. * gcc.target/powerpc/prefix-pcrel-hi.c: New test for prefixed loads/stores with PC-relative addresses for the short type. * gcc.target/powerpc/prefix-pcrel-kf.c: New test for prefixed loads/stores with PC-relative addresses for the __float128 type. * gcc.target/powerpc/prefix-pcrel-qi.c: New test for prefixed loads/stores with PC-relative addresses for the signed char type. * gcc.target/powerpc/prefix-pcrel-sd.c: New test for prefixed loads/stores with PC-relative addresses for the _Decimal32 type. * gcc.target/powerpc/prefix-pcrel-sf.c: New test for prefixed loads/stores with PC-relative addresses for the float type. * gcc.target/powerpc/prefix-pcrel-si.c: New test for prefixed loads/stores with PC-relative addresses for the int type. * gcc.target/powerpc/prefix-pcrel-udi.c: New test for prefixed loads/stores with PC-relative addresses for the unsigned long type. * gcc.target/powerpc/prefix-pcrel-uhi.c: New test for prefixed loads/stores with PC-relative addresses for the unsigned short type. * gcc.target/powerpc/prefix-pcrel-uqi.c: New test for prefixed loads/stores with PC-relative addresses for the unsigned char type. 
* gcc.target/powerpc/prefix-pcrel-usi.c: New test for prefixed loads/stores with PC-relative addresses for the unsigned int type. * gcc.target/powerpc/prefix-pcrel-v2df.c: New test for prefixed loads/stores with PC-relative addresses for the vector double type. Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h === --- gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h (revision 279322) +++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h (working copy) @@ -0,0 +1,58 @@ +/* Common tests for prefixed instructions testing whether pc-relative prefixed + instructions are generated for each type. */ + +typedef signed char schar; +typedef unsigned char uchar; +typedef unsigned short ushort; +typedef unsigned int uint; +typedef unsigned long ulong; +typedef long double ldouble; +typedef vector double v2df; +typedef vector long v2di; +typedef vector float v4sf; +typedef vector int v4si; + +#ifndef TYPE +#define TYPE ulong +#endif + +#ifndef ITYPE +#define ITYPE TYPE +#endif + +#ifndef OTYPE +#define OTYPE TYPE +#endif + +static TYPE a; +TYPE *p = &a; + +#if !defined(DO_ADD) && !defined(DO_VALUE) && !defined(DO_SET) +#define DO_ADD 1 +#define DO_VALUE 1 +#define DO_SET 1 +#endif + +#if DO_ADD +void +add (TYPE b) +{ + a += b; +} +#endif + +#if DO_VALUE +OTYPE +value (void) +{ + return (OTYPE)a; +} +#endif + +#if DO_SET +void +set (ITYPE b) +{ + a = (TYPE)b; +} +#endif Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c === --- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c (revision 279322) +++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c (working copy) @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructions testing whether pc-relative prefixed + instructions are generated for the _Decimal64 type.
*/ + +#define TYPE _Decimal64 + +#include "prefix-pcrel.h" + +/* { dg-final { scan-assembler-times {[@]pcrel} 4 } } */ +/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */ Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c === --- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c (revision 279322) +++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c (working copy) @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructio
[PATCH] V11 patch #13 of 15, Add test for -mcpu=future -fstack-protector-strong with large stacks
This is patch V8 #6. It makes sure the stack protector insns work when -mcpu=future and -fstack-protector-strong are used together. We discovered this failure when we attempted to build GLIBC using -mcpu=future. https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00089.html This test now passes when I run it as part of the test suite. Can I check it into the trunk? 2019-12-20 Michael Meissner * gcc.target/powerpc/prefix-stack-protect.c: New test to make sure -fstack-protector-strong works with prefixed addressing. Index: gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c === --- gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c (revision 279324) +++ gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c (working copy) @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future -fstack-protector-strong" } */ + +/* Test that we can handle large stack frames with -fstack-protector-strong and + prefixed addressing. This was originally discovered in trying to build + glibc with -mcpu=future, and vfwprintf.c failed because it used + -fstack-protector-strong. */ + +extern long foo (char *); + +long +bar (void) +{ + char buffer[0x2]; + return foo (buffer) + 1; +} + +/* { dg-final { scan-assembler {\mpld\M} } } */ +/* { dg-final { scan-assembler {\mpstd\M} } } */
[PATCH] V11 patch #14 of 15, Add tests for vec_extract from memory with PC-relative address
These tests are new. These tests check that the vector extract from a vector in memory works correctly for both constant and variable element numbers. These tests pass with all of the previous patches applied. Can I check these patches into the trunk? 2019-12-20 Michael Meissner * gcc.target/powerpc/vec-extract-pcrel-si.c: New test for vec_extract from a PC-relative address. * gcc.target/powerpc/vec-extract-pcrel-di.c: New test for vec_extract from a PC-relative address. * gcc.target/powerpc/vec-extract-pcrel-sf.c: New test for vec_extract from a PC-relative address. * gcc.target/powerpc/vec-extract-pcrel-df.c: New test for vec_extract from a PC-relative address. Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-df.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-df.c (revision 279615) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-df.c (working copy) @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we can support vec_extract on V2DF vectors with a PC-relative + address.
*/ + +#include + +#ifndef TYPE +#define TYPE double +#endif + +static vector TYPE v; +vector TYPE *p = &v; + +TYPE +get0 (void) +{ + return vec_extract (v, 0); +} + +TYPE +get1 (void) +{ + return vec_extract (v, 1); +} + +TYPE +getn (unsigned long n) +{ + return vec_extract (v, n); +} + +/* { dg-final { scan-assembler-times {[@]pcrel} 3 } } */ +/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpla\M} 1 } } */ Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-di.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-di.c (revision 279615) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-di.c (working copy) @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we can support vec_extract on V2DI vectors with a PC-relative + address. */ + +#include + +#ifndef TYPE +#define TYPE unsigned long +#endif + +static vector TYPE v; +vector TYPE *p = &v; + +TYPE +get0 (void) +{ + return vec_extract (v, 0); +} + +TYPE +get1 (void) +{ + return vec_extract (v, 1); +} + +TYPE +getn (unsigned long n) +{ + return vec_extract (v, n); +} + +/* { dg-final { scan-assembler-times {[@]pcrel} 3 } } */ +/* { dg-final { scan-assembler-times {\mpld\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpla\M} 1 } } */ Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-sf.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-sf.c (revision 279615) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-sf.c (working copy) @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we can support vec_extract on V4SF vectors with a PC-relative + address. 
*/ + +#include + +#ifndef TYPE +#define TYPE float +#endif + +static vector TYPE v; +vector TYPE *p = &v; + +TYPE +get0 (void) +{ + return vec_extract (v, 0); +} + +TYPE +get1 (void) +{ + return vec_extract (v, 1); +} + +TYPE +getn (unsigned long n) +{ + return vec_extract (v, n); +} + +/* { dg-final { scan-assembler-times {[@]pcrel} 3 } } */ +/* { dg-final { scan-assembler-times {\mplfs\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpla\M} 1 } } */ Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-si.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-si.c (revision 279615) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-si.c (working copy) @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we can support vec_extract on V4SI vectors with a PC-relative + address. */ + +#include + +#ifndef TYPE +#define TYPE unsigned int +#endif + +static vector TYPE v; +vector TYPE *p = &v; + +TYPE +get0 (void) +{ + return vec_extract (v, 0); +} + +TYPE +get1 (void) +{ + return vec_extract (v, 1); +} + +TYPE +getn (unsigned long n) +{ + return vec_extract (v, n); +} + +/* { dg-final { scan-assembler-times {[@]pcrel} 3 } } */ +/* { dg-final { scan-assembler-times {\mplwz\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpla\M} 1 } } */
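In getn, the element number is only known at run time. Judging from the single PLA each test expects, the address of v is materialized PC-relatively and then indexed by the scaled element number. The plain-C model below sketches that address arithmetic for a V2DF vector. The function name is hypothetical. Reducing the element number modulo the element count is an assumption about how vec_extract treats out-of-range indices, and element i is assumed to sit at byte offset i * sizeof (double) in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of vec_extract (v, n) for a V2DF vector stored at
   VEC_BASE: wrap the element number, scale it by the element size, and
   load from base + index (a register + register, X-form, access).  */
static double extract_v2df_elt (const unsigned char *vec_base, unsigned long n)
{
  const size_t nelts = 2;                               /* elements in V2DF */
  size_t index = (size_t) (n % nelts) * sizeof (double);
  double elt;
  memcpy (&elt, vec_base + index, sizeof elt);          /* load from base + index */
  return elt;
}
```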
[PATCH] V11 patch #15 of 15, Add tests for -mcpu=future vec_extract from memory with a large offset
These are new tests. They verify that when -mcpu=future is used, a vec_extract with a constant element number from a vector in memory whose address contains a large offset generates a prefixed load instruction. Once all of the other V11 patches are checked in, can I check this patch into the trunk? 2019-12-20 Michael Meissner * gcc.target/powerpc/vec-extract-large-si.c: New test for vec_extract from a vector unsigned int in memory with a large offset. * gcc.target/powerpc/vec-extract-large-di.c: New test for vec_extract from a vector long in memory with a large offset. * gcc.target/powerpc/vec-extract-large-sf.c: New test for vec_extract from a vector float in memory with a large offset. * gcc.target/powerpc/vec-extract-large-df.c: New test for vec_extract from a vector double in memory with a large offset. Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-df.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-large-df.c (revision 279691) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-df.c (working copy) @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we generate prefixed loads for vec_extract of a vector double in + memory, and the memory address has a large offset. */ + +#include + +#ifndef TYPE +#define TYPE double +#endif + +#ifndef LARGE +#define LARGE 0x5 +#endif + +TYPE +get0 (vector TYPE *p) +{ + return vec_extract (p[LARGE], 0);/* PLFD. */ +} + +TYPE +get1 (vector TYPE *p) +{ + return vec_extract (p[LARGE], 1);/* PLFD.
*/ +} + +/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */ Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-di.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-large-di.c (revision 279691) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-di.c (working copy) @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we generate prefixed loads for vec_extract of a vector unsigned long + in memory, and the memory address has a large offset. */ + +#include + +#ifndef TYPE +#define TYPE unsigned long +#endif + +#ifndef LARGE +#define LARGE 0x5 +#endif + +TYPE +get0 (vector TYPE *p) +{ + return vec_extract (p[LARGE], 0);/* PLD. */ +} + +TYPE +get1 (vector TYPE *p) +{ + return vec_extract (p[LARGE], 1);/* PLD. */ +} + +/* { dg-final { scan-assembler-times {\mpld\M} 2 } } */ Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-sf.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-large-sf.c (revision 279691) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-sf.c (working copy) @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we generate prefixed loads for vec_extract of a vector float in + memory, and the memory address has a large offset. */ + +#include + +#ifndef TYPE +#define TYPE float +#endif + +#ifndef LARGE +#define LARGE 0x5 +#endif + +TYPE +get0 (vector TYPE *p) +{ + return vec_extract (p[LARGE], 0);/* PLFS. */ +} + +TYPE +get1 (vector TYPE *p) +{ + return vec_extract (p[LARGE], 1);/* PLFS. 
*/ +} + +/* { dg-final { scan-assembler-times {\mplfs\M} 2 } } */ Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-si.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-large-si.c (revision 279691) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-si.c (working copy) @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test if we generate prefixed loads for vec_extract of a vector unsigned int + in memory, and the memory address has a large offset. */ + +#include + +#ifndef TYPE +#define TYPE unsigned int +#endif + +#ifndef LARGE +#define LARGE 0x5 +#endif + +TYPE +get0 (vector TYPE *p) +{ + return vec_extract (p[LARGE], 0);/* PLWZ. */ +} + +TYPE +get1 (vector TYPE *p) +{ + return vec_extract (p[LARGE], 1);/* PLWZ. */ +} + +/* { dg-final { scan-assembler-times {\mplwz\M} 2 } } */
Re: [PATCH] V11 patch #5 of 15, Optimize vec_extract of a vector in memory with a PC-relative address
On Tue, Dec 24, 2019 at 10:24:55AM -0600, Segher Boessenkool wrote: > Hi! > > On Fri, Dec 20, 2019 at 06:55:53PM -0500, Michael Meissner wrote: > > * config/rs6000/rs6000.c (rs6000_reg_to_addr_mask): New helper > > function to identify the address mask of a hard register. > > Do this as a separate patch please. That refactoring is pre-approved. > Please explain in the function comment what an "address mask" is. Or > better yet, don't call it a "mask", it isn't a mask? It is called a mask because everywhere else, rs6000.c uses 'addr_mask' or just 'mask'. It is a mask of valid bits. > Also various of the names here still have "reload" in it, which doesn't > really make much sense. When these functions were written, they were in the context of supporting the secondary reload functions, so 'reload' was in the name. I will make a refactoring patch that uses the current names. If we want to change all of the uses, we can do so in a future patch.
Re: [PATCH] V11 patch #5 of 15, Optimize vec_extract of a vector in memory with a PC-relative address
On Tue, Dec 24, 2019 at 10:24:55AM -0600, Segher Boessenkool wrote: > Hi! > > On Fri, Dec 20, 2019 at 06:55:53PM -0500, Michael Meissner wrote: > > * config/rs6000/rs6000.c (rs6000_reg_to_addr_mask): New helper > > function to identify the address mask of a hard register. > > Do this as a separate patch please. That refactoring is pre-approved. > Please explain in the function comment what an "address mask" is. Or > better yet, don't call it a "mask", it isn't a mask? > > Also various of the names here still have "reload" in it, which doesn't > really make much sense. > > rs6000_mode_to_addressing_flags? And a reg_to for this new one? > Something like that. Note that rs6000_mode_to_addressing_flags also does not fit the usage. The key is to return the address mask of the valid addressing options, which depends on both a hard register and a mode. The mode by itself is not sufficient, since loading SImode into vector registers requires X_FORM addressing, while the same mode in GPR registers can of course use both D_FORM and X_FORM addressing.
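The point of the reply is that the valid addressing forms depend on the register class as well as the mode. This can be sketched as a two-dimensional lookup of flag bits. Every name and table entry below is hypothetical, and only the SImode row is filled in, following the example in the mail: GPRs allow D-form and X-form, while vector registers allow only X-form.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits describing which addressing forms are valid.  */
enum {
  ADDR_VALID   = 1u << 0,   /* the mode can be loaded into this class */
  ADDR_INDEXED = 1u << 1,   /* X-form: register + register */
  ADDR_OFFSET  = 1u << 2    /* D-form: register + displacement */
};

enum reg_class_id { CLASS_GPR, CLASS_VSX, N_CLASSES };
enum mode_id { MODE_SI, N_MODES };

/* Illustrative entries only, taken from the SImode example in the mail.  */
static const unsigned addr_masks[N_MODES][N_CLASSES] = {
  [MODE_SI] = {
    [CLASS_GPR] = ADDR_VALID | ADDR_INDEXED | ADDR_OFFSET,
    [CLASS_VSX] = ADDR_VALID | ADDR_INDEXED,
  },
};

/* The answer needs both the register class and the mode.  */
static unsigned addr_mask_for (enum reg_class_id rc, enum mode_id mode)
{
  return addr_masks[mode][rc];
}

static bool offset_form_ok (enum reg_class_id rc, enum mode_id mode)
{
  return (addr_mask_for (rc, mode) & ADDR_OFFSET) != 0;
}
```

With the same mode, the D-form answer differs by register class, which is why a mode-only lookup does not fit.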
Re: [PATCH, committed] V11 patch #2 of 15, Use prefixed load for vector extract with large offset
On Sun, Dec 22, 2019 at 11:10:09AM -0600, Segher Boessenkool wrote: > The patch is okay for trunk (with the comment moved, and the rtx_equal_p > fixed). Thanks! Here is the patch I committed (subversion id 279937): 2020-01-06 Michael Meissner * config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support for the offset being 34-bits when -mcpu=future is used. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 279910) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -6797,11 +6797,19 @@ rs6000_adjust_vec_address (rtx scalar_re HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset); rtx offset_rtx = GEN_INT (offset); - if (IN_RANGE (offset, -32768, 32767) + /* 16-bit offset. */ + if (SIGNED_INTEGER_16BIT_P (offset) && (scalar_size < 8 || (offset & 0x3) == 0)) new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx); + + /* 34-bit offset if we have prefixed addresses. */ + else if (TARGET_PREFIXED_ADDR && SIGNED_INTEGER_34BIT_P (offset)) + new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx); + else { + /* Offset overflowed, move offset to the temporary (which will +likely be split), and do X-FORM addressing. */ emit_move_insn (base_tmp, offset_rtx); new_addr = gen_rtx_PLUS (Pmode, op0, base_tmp); } @@ -6830,6 +6838,12 @@ rs6000_adjust_vec_address (rtx scalar_re emit_insn (insn); } + /* Make sure we don't overwrite the temporary if the element being +extracted is variable, and we've put the offset into base_tmp +previously. */ + else if (reg_mentioned_p (base_tmp, element_offset)) + emit_insn (gen_add2_insn (base_tmp, op1)); + else { emit_move_insn (base_tmp, op1);
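The committed hunk turns the constant-offset case of rs6000_adjust_vec_address into a three-way choice. The sketch below is a plain-C paraphrase of that branch structure, not GCC code. The enum, the function names, and the have_prefixed flag (standing in for TARGET_PREFIXED_ADDR) are hypothetical. The (offset & 0x3) == 0 requirement for 8-byte scalars presumably reflects DS-form instructions such as LD/STD, whose displacement must be a multiple of 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum addr_form { D_FORM_16, PREFIXED_34, X_FORM_TMP };

/* True if X fits in a signed BITS-bit field.  */
static bool fits_signed (int64_t x, int bits)
{
  int64_t lim = INT64_C (1) << (bits - 1);
  return x >= -lim && x < lim;
}

/* OFFSET is the base displacement plus the element offset, SCALAR_SIZE the
   element size in bytes.  */
static enum addr_form
choose_form (int64_t offset, unsigned scalar_size, bool have_prefixed)
{
  /* 16-bit displacement; 8-byte scalars also need a multiple of 4.  */
  if (fits_signed (offset, 16) && (scalar_size < 8 || (offset & 0x3) == 0))
    return D_FORM_16;

  /* 34-bit displacement when prefixed addressing is available.  */
  if (have_prefixed && fits_signed (offset, 34))
    return PREFIXED_34;

  /* Otherwise move the offset into the temporary and use reg + reg.  */
  return X_FORM_TMP;
}
```

For example, an 8-byte element at offset 18 cannot use a DS-form displacement, so it becomes prefixed when that is available and falls back to X-form otherwise.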
Re: [PATCH, committed] V11 patch #3 of 15, Use 'Q' constraint for variable vector extract from memory
On Sun, Dec 22, 2019 at 11:24:51AM -0600, Segher Boessenkool wrote: > Hi! > > On Fri, Dec 20, 2019 at 06:47:28PM -0500, Michael Meissner wrote: > > Then I realized that eventaully we will want to generate an X-FORM > > (register + > > register) address, and it was just simpler to use the 'Q' constraint, and > > have > > the register allocator put the address into a register. > > Yep, good call. > > > * config/rs6000/vsx.md (vsx_extract__var, VSX_D iterator): > > Use 'Q' for memory constraints because we need to do an X-FORM > > load with the variable index. > > (vsx_extract_v4sf_var): Use 'Q' for memory constraints because we > > need to do an X-FORM load with the variable index. > > This comment is a headscratcher -- but you shouldn't say "why" in > changelogs at all, so that is an easy fix ;-) > > > (vsx_extract__var, VSX_EXTRACT_I iterator):Use 'Q' for > > (missing space) > > > memory constraints because we need to do an X-FORM load with the > > variable index. > > (vsx_extract__mode_var): Use 'Q' for memory > > constraints because we need to do an X-FORM load with the variable > > index. > > (and more) > > > -;; Variable V2DI/V2DF extract > > +;; Variable V2DI/V2DF extract. Use 'Q' for the memory because we will > > +;; ultimately have to convert the address into base + index. > > Maybe just don't write anything at all, since it is hard to explain in a > few words? It is clear that "Q" is not a usual constraint, anyway :-) > > Okay for trunk like that. Thanks! This is the patch I committed (subversion id 279938): 2020-01-06 Michael Meissner * config/rs6000/vsx.md (vsx_extract__var, VSX_D iterator): Use 'Q' for doing vector extract from memory. (vsx_extract_v4sf_var): Use 'Q' for doing vector extract from memory. (vsx_extract__var, VSX_EXTRACT_I iterator): Use 'Q' for doing vector extract from memory. (vsx_extract__mode_var): Use 'Q' for doing vector extract from memory. 
Index: gcc/config/rs6000/vsx.md === --- gcc/config/rs6000/vsx.md (revision 279910) +++ gcc/config/rs6000/vsx.md (working copy) @@ -3248,7 +3248,7 @@ (define_insn "vsx_vslo_" ;; Variable V2DI/V2DF extract (define_insn_and_split "vsx_extract__var" [(set (match_operand: 0 "gpc_reg_operand" "=v,wa,r") - (unspec: [(match_operand:VSX_D 1 "input_operand" "v,m,m") + (unspec: [(match_operand:VSX_D 1 "input_operand" "v,Q,Q") (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] UNSPEC_VSX_EXTRACT)) (clobber (match_scratch:DI 3 "=r,&b,&b")) @@ -3318,7 +3318,7 @@ (define_insn_and_split "*vsx_extract_v4s ;; Variable V4SF extract (define_insn_and_split "vsx_extract_v4sf_var" [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,wa,?r") - (unspec:SF [(match_operand:V4SF 1 "input_operand" "v,m,m") + (unspec:SF [(match_operand:V4SF 1 "input_operand" "v,Q,Q") (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] UNSPEC_VSX_EXTRACT)) (clobber (match_scratch:DI 3 "=r,&b,&b")) @@ -3681,7 +3681,7 @@ (define_insn_and_split "*vsx_extract__var" [(set (match_operand: 0 "gpc_reg_operand" "=r,r,r") (unspec: -[(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m") +[(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q") (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] UNSPEC_VSX_EXTRACT)) (clobber (match_scratch:DI 3 "=r,r,&b")) @@ -3701,7 +3701,7 @@ (define_insn_and_split "*vsx_extract_ 0 "gpc_reg_operand" "=r,r,r") (zero_extend: (unspec: - [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m") + [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q") (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] UNSPEC_VSX_EXTRACT))) (clobber (match_scratch:DI 3 "=r,r,&b"))
Re: [PATCH, committed] V11 patch #4 of 15, Update 'Q' constraint documentation.
On Sun, Dec 22, 2019 at 11:49:19AM -0600, Segher Boessenkool wrote: > On Fri, Dec 20, 2019 at 06:49:30PM -0500, Michael Meissner wrote: > > In doing V11 patch #3, I noticed that the documentation for the 'Q' was > > misleading. > > It originally was used just for lswi/stswi, which can access up to the > first 32 bytes of storage pointed to by the register. But yes, the > current comment is confusing. > > > * config/rs6000/constraints.md (Q constraint): Update > > documentation. > > * doc/md.tet (PowerPC constraints): Update 'Q' constraint > > documentation. > > "md.tet"? That's an interesting typo :-) > > > (define_memory_constraint "Q" > > - "Memory operand that is an offset from a register (it is usually better > > -to use @samp{m} or @samp{es} in @code{asm} statements)" > > + "A memory operand whose address which uses a single register with no > > offset." > > Arm has > > (define_memory_constraint "Q" > "@internal > An address that is a single base register." > (and (match_code "mem") > (match_test "REG_P (XEXP (op, 0))"))) > > which is more correct for us (the register cannot be r0!) > > But it is not an address. > > Maybe "A memory operand addressed by just a base register." ? > > Okay for trunk like that. Thanks! This is the patch I committed (subversion ids 279939 and 279940). 2020-01-06 Michael Meissner * config/rs6000/constraints.md (Q constraint): Update documentation. * doc/md.texi (RS/6000 constraints): Update 'Q' constraint documentation. Index: gcc/config/rs6000/constraints.md === --- gcc/config/rs6000/constraints.md (revision 279910) +++ gcc/config/rs6000/constraints.md (working copy) @@ -211,8 +211,7 @@ several times, or that might not access (match_test "GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) != RTX_AUTOINC"))) (define_memory_constraint "Q" - "Memory operand that is an offset from a register (it is usually better -to use @samp{m} or @samp{es} in @code{asm} statements)" + "A memory operand addressed by just a base register."
(and (match_code "mem") (match_test "REG_P (XEXP (op, 0))"))) Index: gcc/doc/md.texi === --- gcc/doc/md.texi (revision 279910) +++ gcc/doc/md.texi (working copy) @@ -3381,8 +3381,7 @@ allowed when @samp{<} or @samp{>} is use as @samp{m} without @samp{<} and @samp{>}. @item Q -Memory operand that is an offset from a register (it is usually better -to use @samp{m} or @samp{es} in @code{asm} statements) +A memory operand addressed by just a base register. @item Z Memory operand that is an indexed or indirect from a register (it is -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
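The committed wording describes a MEM whose address is nothing but a base register. A toy model of that test follows, using hypothetical types (toy_mem and the helpers are not GCC's rtx API). The second helper encodes Segher's remark that the register cannot be r0: in PowerPC D-form and X-form loads and stores, an RA field of 0 means the value zero rather than the contents of r0, so a usable base register is r1 through r31.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an RTL memory operand: only the
   shape of the address matters here.  This is NOT GCC's rtx type.  */
enum toy_addr_kind
{
  TOY_ADDR_REG,             /* (mem (reg))                   */
  TOY_ADDR_REG_PLUS_OFFSET, /* (mem (plus (reg) (const_int))) */
  TOY_ADDR_REG_PLUS_REG     /* (mem (plus (reg) (reg)))      */
};

struct toy_mem
{
  enum toy_addr_kind kind;
  int base_regno; /* GPR number used as the base register.  */
  long offset;    /* Only meaningful for TOY_ADDR_REG_PLUS_OFFSET.  */
};

/* Mirrors the shape of the 'Q' test: (match_code "mem") plus
   REG_P (XEXP (op, 0)), i.e. the address is a bare register.  */
static bool
toy_q_constraint_p (const struct toy_mem *m)
{
  return m->kind == TOY_ADDR_REG;
}

/* RA = 0 in a load/store means "zero", not r0, so only r1..r31 can
   serve as a base register.  */
static bool
toy_valid_base_regno_p (int regno)
{
  return regno >= 1 && regno <= 31;
}
```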
Re: [PATCH, committed] V11 patch #5 of 15, Optimize vec_extract of a vector in memory with a PC-relative address
On Tue, Dec 24, 2019 at 10:24:55AM -0600, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Dec 20, 2019 at 06:55:53PM -0500, Michael Meissner wrote:
> > 	* config/rs6000/rs6000.c (rs6000_reg_to_addr_mask): New helper
> > 	function to identify the address mask of a hard register.
>
> Do this as a separate patch please.  That refactoring is pre-approved.
> Please explain in the function comment what an "address mask" is.  Or
> better yet, don't call it a "mask", it isn't a mask?

I committed this patch for the refactoring (subversion id 279941).  I will
submit the other pieces later.

2020-01-06  Michael Meissner

	* config/rs6000/rs6000.c (hard_reg_and_mode_to_addr_mask): New
	helper function to return the valid addressing formats for a given
	hard register and mode.
	(rs6000_adjust_vec_address): Call hard_reg_and_mode_to_addr_mask.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 279912)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -6729,6 +6729,30 @@ rs6000_expand_vector_extract (rtx target
     }
 }
 
+/* Helper function to return an address mask based on a physical register.  */
+
+static addr_mask_type
+hard_reg_and_mode_to_addr_mask (rtx reg, machine_mode mode)
+{
+  unsigned int r = reg_or_subregno (reg);
+  addr_mask_type addr_mask;
+
+  gcc_assert (HARD_REGISTER_NUM_P (r));
+  if (INT_REGNO_P (r))
+    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_GPR];
+
+  else if (FP_REGNO_P (r))
+    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_FPR];
+
+  else if (ALTIVEC_REGNO_P (r))
+    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_VMX];
+
+  else
+    gcc_unreachable ();
+
+  return addr_mask;
+}
+
 /* Adjust a memory address (MEM) of a vector type to point to a scalar field
    within the vector (ELEMENT) with a mode (SCALAR_MODE).  Use a base register
    temporary (BASE_TMP) to fixup the address.  Return the new memory address
@@ -6865,21 +6889,8 @@ rs6000_adjust_vec_address (rtx scalar_re
       if (GET_CODE (new_addr) == PLUS)
         {
           rtx op1 = XEXP (new_addr, 1);
-          addr_mask_type addr_mask;
-          unsigned int scalar_regno = reg_or_subregno (scalar_reg);
-
-          gcc_assert (HARD_REGISTER_NUM_P (scalar_regno));
-          if (INT_REGNO_P (scalar_regno))
-            addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_GPR];
-
-          else if (FP_REGNO_P (scalar_regno))
-            addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_FPR];
-
-          else if (ALTIVEC_REGNO_P (scalar_regno))
-            addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_VMX];
-
-          else
-            gcc_unreachable ();
+          addr_mask_type addr_mask
+            = hard_reg_and_mode_to_addr_mask (scalar_reg, scalar_mode);
 
           if (REG_P (op1) || SUBREG_P (op1))
             valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0;

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
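The refactored helper classifies a hard register as GPR, FPR or Altivec and then looks up the address formats allowed for that register class and mode. The sketch below mirrors that shape in plain C. The register ranges, mode names, flag values and table contents are all made up for illustration (GCC's real predicates are INT_REGNO_P, FP_REGNO_P and ALTIVEC_REGNO_P, and the real table is reg_addr). As Segher noted, the result is less a mask than a set of flag bits, one per valid addressing format.

```c
#include <assert.h>

/* Hypothetical register numbering, for illustration only.  */
enum
{
  TOY_FIRST_GPR = 0,  TOY_LAST_GPR = 31,
  TOY_FIRST_FPR = 32, TOY_LAST_FPR = 63,
  TOY_FIRST_VMX = 64, TOY_LAST_VMX = 95
};

/* Register classes used to index the per-mode table
   (cf. RELOAD_REG_GPR, RELOAD_REG_FPR, RELOAD_REG_VMX).  */
enum toy_reg_class { TOY_RC_GPR, TOY_RC_FPR, TOY_RC_VMX, TOY_RC_COUNT };

/* Hypothetical address-format bits (cf. RELOAD_REG_INDEXED).  */
typedef unsigned char toy_addr_mask;
#define TOY_ADDR_INDEXED 0x1u /* reg+reg addressing is valid.  */
#define TOY_ADDR_OFFSET  0x2u /* reg+offset addressing is valid.  */

enum toy_mode { TOY_DFmode, TOY_V2DFmode, TOY_MODE_COUNT };

/* Made-up table: which address formats each (mode, class) allows.  */
static const toy_addr_mask toy_reg_addr[TOY_MODE_COUNT][TOY_RC_COUNT] = {
  [TOY_DFmode]   = { [TOY_RC_GPR] = TOY_ADDR_INDEXED | TOY_ADDR_OFFSET,
                     [TOY_RC_FPR] = TOY_ADDR_INDEXED | TOY_ADDR_OFFSET,
                     [TOY_RC_VMX] = TOY_ADDR_INDEXED },
  [TOY_V2DFmode] = { [TOY_RC_GPR] = TOY_ADDR_INDEXED | TOY_ADDR_OFFSET,
                     [TOY_RC_FPR] = TOY_ADDR_INDEXED,
                     [TOY_RC_VMX] = TOY_ADDR_INDEXED },
};

/* Same shape as hard_reg_and_mode_to_addr_mask: classify the hard
   register, then look up the address formats for that class and mode.  */
static toy_addr_mask
toy_hard_reg_and_mode_to_addr_mask (int regno, enum toy_mode mode)
{
  enum toy_reg_class rc;

  if (regno >= TOY_FIRST_GPR && regno <= TOY_LAST_GPR)
    rc = TOY_RC_GPR;
  else if (regno >= TOY_FIRST_FPR && regno <= TOY_LAST_FPR)
    rc = TOY_RC_FPR;
  else if (regno >= TOY_FIRST_VMX && regno <= TOY_LAST_VMX)
    rc = TOY_RC_VMX;
  else
    {
      /* Plays the role of gcc_unreachable ().  */
      assert (0 && "register class not handled");
      return 0;
    }

  return toy_reg_addr[mode][rc];
}
```

A caller then tests individual bits, in the same way rs6000_adjust_vec_address tests RELOAD_REG_INDEXED.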
[PATCH 0/3] Add support for -mcpu=power11
These three patches add support for -mcpu=power11 to the PowerPC GCC
compiler.

I would like to check these patches into GCC 15 ASAP, and back port them
into GCC 14 after GCC 14.1 ships.  I hope to also back port these patches
to other active branches after the code goes into GCC 15 and then GCC 14.

Patch #1: This patch adds the basic support for power11.

  * This patch adds the -mcpu=power11 option.
  * This patch adds a power11 processor type.
  * This patch adds a bit to the isa_flags for power11 support.
  * This patch defines _ARCH_PWR11 if -mcpu=power11 is used.
  * This patch uses .machine power11 if -mcpu=power11 is used.
  * This patch passes -mpower11 or -mpwr11 to the assembler.
  * This patch uses the power10 defaults for power11.
  * This patch adds AUXV support for power11.

Patch #2: This patch adds tuning support for power11, treating power11 like
power10 at the current time.

Patch #3: This patch adds tests that are run if the assembler supports
either -mpower11 (under Linux) or -mpwr11 (under AIX).

These patches have been tested with bootstrap builds on a little endian
power10 system and a big endian power9 system.

When the GCC 15 tree opens up for general patches, can I apply these
patches?

--
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
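Code that wants to adapt to the new level can test the macro at compile time. A minimal sketch, relying only on the _ARCH_PWR9, _ARCH_PWR10 and _ARCH_PWR11 macros; testing the newest level first keeps the result correct whether or not the older _ARCH_PWRn macros are also defined under -mcpu=power11. The function name is hypothetical.

```c
/* Report the newest POWER architecture level the compiler is targeting,
   based on the _ARCH_PWRn predefined macros.  */
static const char *
target_arch_level (void)
{
#if defined (_ARCH_PWR11)
  return "power11";
#elif defined (_ARCH_PWR10)
  return "power10";
#elif defined (_ARCH_PWR9)
  return "power9";
#else
  return "older POWER or not PowerPC";
#endif
}
```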
[PATCH 2/3] Add tuning support for -mcpu=power11
This patch makes -mtune=power11 use the same tuning decisions as
-mtune=power10.

I have tested this patch on a little endian power10 system and a big endian
power9 system.  There were no regressions.  Can I check this into GCC 15
when it is open for general patches?

2024-03-18  Michael Meissner

gcc/

	* config/rs6000/power10.md (all reservations): Add power11 as an
	alternative to power10.
---
 gcc/config/rs6000/power10.md | 144 +--
 1 file changed, 72 insertions(+), 72 deletions(-)

diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md
index fcc2199ab29..90312643858 100644
--- a/gcc/config/rs6000/power10.md
+++ b/gcc/config/rs6000/power10.md
@@ -1,4 +1,4 @@
-;; Scheduling description for the IBM POWER10 processor.
+;; Scheduling description for the IBM POWER10 and POWER11 processors.
 ;; Copyright (C) 2020-2024 Free Software Foundation, Inc.
 ;;
 ;; Contributed by Pat Haugen (pthau...@us.ibm.com).
@@ -97,12 +97,12 @@ (define_insn_reservation "power10-load" 4
        (eq_attr "update" "no")
        (eq_attr "size" "!128")
        (eq_attr "prefixed" "no")
-       (eq_attr "cpu" "power10"))
+       (eq_attr "cpu" "power10,power11"))
   "DU_any_power10,LU_power10")
 
 (define_insn_reservation "power10-fused-load" 4
   (and (eq_attr "type" "fused_load_cmpi,fused_addis_load,fused_load_load")
-       (eq_attr "cpu" "power10"))
+       (eq_attr "cpu" "power10,power11"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-prefixed-load" 4
@@ -110,13 +110,13 @@ (define_insn_reservation "power10-prefixed-load" 4
        (eq_attr "update" "no")
        (eq_attr "size" "!128")
        (eq_attr "prefixed" "yes")
-       (eq_attr "cpu" "power10"))
+       (eq_attr "cpu" "power10,power11"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-load-update" 4
   (and (eq_attr "type" "load")
        (eq_attr "update" "yes")
-       (eq_attr "cpu" "power10"))
+       (eq_attr "cpu" "power10,power11"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 (define_insn_reservation "power10-fpload-double" 4
@@ -124,7 +124,7 @@ (define_insn_reservation "power10-fpload-double" 4
        (eq_attr "update" "no")
        (eq_attr "size" "64")
        (eq_attr "prefixed" "no")
-       (eq_attr "cpu" "power10"))
+       (eq_attr "cpu" "power10,power11"))
   "DU_any_power10,LU_power10")
 
 (define_insn_reservation "power10-prefixed-fpload-double" 4
@@ -132,14 +132,14 @@ (define_insn_reservation "power10-prefixed-fpload-double" 4
        (eq_attr "update" "no")
        (eq_attr "size" "64")
        (eq_attr "prefixed" "yes")
-       (eq_attr "cpu" "power10"))
+       (eq_attr "cpu" "power10,power11"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-fpload-update-double" 4
   (and (eq_attr "type" "fpload")
        (eq_attr "update" "yes")
        (eq_attr "size" "64")
-       (eq_attr "cpu" "power10"))
+       (eq_attr "cpu" "power10,power11"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 ; SFmode loads are cracked and have additional 3 cycles over DFmode
@@ -148,27 +148,27 @@ (define_insn_reservation "power10-fpload-single" 7
   (and (eq_attr "type" "fpload")
        (eq_attr "update" "no")
        (eq_attr "size" "32")
-       (eq_attr "cpu" "power10"))
+       (eq_attr "cpu" "power10,power11"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-fpload-update-single" 7
   (and (eq_attr "type" "fpload")
        (eq_attr "update" "yes")
        (eq_attr "size" "32")
-       (eq_attr "cpu" "power10"))
+       (eq_attr "cpu" "power10,power11"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 (define_insn_reservation "power10-vecload" 4
   (and (eq_attr "type" "vecload")
        (eq_attr "size" "!256")
-       (eq_attr "cpu" "power10"))
+       (eq_attr "cpu" "power10,power11")
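Each reservation now applies to either processor because the cpu test takes a comma-separated list of values and matches if the attribute equals any of them. A toy model of that matching rule in C follows; it illustrates the semantics only, is not the code GCC's generator programs emit, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model of (eq_attr "cpu" "power10,power11"): true when VALUE equals
   any entry of the comma-separated LIST.  Whole entries must match, so
   "power1" does not match "power10".  */
static bool
toy_eq_attr_list_p (const char *list, const char *value)
{
  size_t vlen = strlen (value);
  const char *p = list;

  for (;;)
    {
      const char *comma = strchr (p, ',');
      size_t len = comma ? (size_t) (comma - p) : strlen (p);

      if (len == vlen && strncmp (p, value, len) == 0)
        return true;
      if (!comma)
        return false;
      p = comma + 1; /* Move on to the next entry.  */
    }
}
```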
[PATCH 1/3] Add basic support for -mcpu=power11
This patch adds the power11 option to the -mcpu= and -mtune= switches.

This patch treats power11 like power10 in terms of costs and reassociation
width.

This patch issues a ".machine power11" to the assembly file if you use
-mcpu=power11.

This patch defines _ARCH_PWR11 if the user uses -mcpu=power11.

This patch allows GCC to be configured with the --with-cpu=power11 and
--with-tune=power11 options.

This patch passes -mpwr11 to the assembler if the user uses -mcpu=power11.

This patch adds support for using "power11" in the __builtin_cpu_is
built-in function.

I have tested this patch with a bootstrap build on a little endian power10
system and a bootstrap build on a big endian power9 system.  There were no
regressions.  Can I apply this patch when GCC 15 opens up for general
patches?

2024-03-18  Michael Meissner

gcc/

	* config.gcc (rs6000*-*-*, powerpc*-*-*): Add support for power11.
	* config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=power11.
	* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
	* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
	* config/rs6000/driver-rs6000.cc (asm_names): Likewise.
	* config/rs6000/ppc-auxv.h (PPC_PLATFORM_POWER11): New define.
	* config/rs6000/rs6000-builtin.cc (cpu_is_info): Add power11.
	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
	_ARCH_PWR11 if -mcpu=power11.
	* config/rs6000/rs6000-cpus.def (ISA_POWER11_MASKS_SERVER): New define.
	(POWERPC_MASKS): Add power11 isa bit.
	(power11 cpu): Add power11 definition.
	* config/rs6000/rs6000-opts.h (PROCESSOR_POWER11): Add power11 processor.
	* config/rs6000/rs6000-string.cc (expand_compare_loop): Likewise.
	* config/rs6000/rs6000-tables.opt: Regenerate.
	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Add power11
	support.
	(rs6000_machine_from_flags): Likewise.
	(rs6000_reassociation_width): Likewise.
	(rs6000_adjust_cost): Likewise.
	(rs6000_issue_rate): Likewise.
	(rs6000_sched_reorder): Likewise.
	(rs6000_sched_reorder2): Likewise.
	(rs6000_register_move_cost): Likewise.
	(rs6000_opt_masks): Likewise.
	* config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
	* config/rs6000/rs6000.md (cpu attribute): Add power11.
	* config/rs6000/rs6000.opt (-mpower11): Add internal power11 ISA flag.
	* doc/invoke.texi (RS/6000 and PowerPC Options): Document -mcpu=power11.
---
 gcc/config.gcc                      |  6 --
 gcc/config/rs6000/aix71.h           |  1 +
 gcc/config/rs6000/aix72.h           |  1 +
 gcc/config/rs6000/aix73.h           |  1 +
 gcc/config/rs6000/driver-rs6000.cc  |  2 ++
 gcc/config/rs6000/ppc-auxv.h        |  3 +--
 gcc/config/rs6000/rs6000-builtin.cc |  1 +
 gcc/config/rs6000/rs6000-c.cc       |  2 ++
 gcc/config/rs6000/rs6000-cpus.def   |  5 +
 gcc/config/rs6000/rs6000-opts.h     |  3 ++-
 gcc/config/rs6000/rs6000-string.cc  |  1 +
 gcc/config/rs6000/rs6000-tables.opt |  3 +++
 gcc/config/rs6000/rs6000.cc         | 32 +
 gcc/config/rs6000/rs6000.h          |  1 +
 gcc/config/rs6000/rs6000.md         |  2 +-
 gcc/config/rs6000/rs6000.opt        |  3 +++
 gcc/doc/invoke.texi                 |  5 +++--
 17 files changed, 56 insertions(+), 16 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 040afabd9ec..f8036b6476e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -531,7 +531,9 @@ powerpc*-*-*)
         extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h"
         extra_headers="${extra_headers} amo.h"
         case x$with_cpu in
-            xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower10|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
+            xpowerpc64 | xdefault64 | x6[23]0 | x970 | xG5 | xpower[3456789] \
+            | xpower1[01] | xpower6x | xrs64a | xcell | xa2 | xe500mc64 \
+            | xe5500 | xe6500)
                 cpu_is_64bit=yes
                 ;;
         esac
@@ -5566,7 +5568,7 @@ case "${target}" in
                                 eval "with_$which=405"
                                 ;;
                         "" | common | native \
-                        | power[3456789] | power10 | power5+ | power6x \
+                        | power[3456789] | power1[01] | power5+ | power6x \
                         | powerpc | powerpc64 | powerpc64le \
                         | rs64 \
                         | 401 | 403 | 405 | 405fp | 440 | 440fp | 464 | 464fp \
diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 24bc301e37d..41037b3852d 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -79,6 +79,7 @@ do {                                  \
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
 "%{mcpu=native
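The config.gcc change replaces the literal xpower10 alternative with the glob xpower1[01], so both power10 and power11 (but not, say, power12) set cpu_is_64bit=yes. The sketch below replays that case arm with POSIX fnmatch, which for these simple strings follows the same bracket-expression rules as a shell case pattern. The leading x that config.gcc prepends is dropped, and the helper name is hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Alternatives from the patched "case x$with_cpu in" arm of config.gcc
   that sets cpu_is_64bit=yes, without the leading "x".  */
static const char *const cpu64_patterns[] = {
  "powerpc64", "default64", "6[23]0", "970", "G5", "power[3456789]",
  "power1[01]", "power6x", "rs64a", "cell", "a2", "e500mc64",
  "e5500", "e6500",
};

/* Return true if --with-cpu=CPU matches one of the 64-bit patterns.  */
static bool
with_cpu_is_64bit (const char *cpu)
{
  for (size_t i = 0; i < sizeof cpu64_patterns / sizeof cpu64_patterns[0]; i++)
    if (fnmatch (cpu64_patterns[i], cpu, 0) == 0)
      return true;
  return false;
}
```

Using a bracket expression rather than listing power10 and power11 separately keeps the alternative list short, at the cost of needing another edit when a power12 appears.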
[PATCH 3/3] Add -mcpu=power11 tests
This patch adds some simple tests for -mcpu=power11 support.  In order to
run these tests, you need an assembler that supports the appropriate option
for the Power11 processor (-mpower11 under Linux or -mpwr11 under AIX).

I have tested this patch on a little endian power10 system and a big endian
power9 system using the latest binutils which includes support for power11.
There were no regressions, and the 3 power11 tests added ran on both
systems.

Can I check this patch into GCC 15 when it opens up for general patches?

2024-03-18  Michael Meissner

gcc/testsuite/

	* gcc.target/powerpc/power11-1.c: New test.
	* gcc.target/powerpc/power11-2.c: Likewise.
	* gcc.target/powerpc/power11-3.c: Likewise.
	* lib/target-supports.exp (check_effective_target_power11_ok): Add new
	effective target.
---
 gcc/testsuite/gcc.target/powerpc/power11-1.c | 13 +
 gcc/testsuite/gcc.target/powerpc/power11-2.c | 20
 gcc/testsuite/gcc.target/powerpc/power11-3.c | 10 ++
 gcc/testsuite/lib/target-supports.exp        | 17 +
 4 files changed, 60 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/power11-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/power11-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/power11-3.c

diff --git a/gcc/testsuite/gcc.target/powerpc/power11-1.c b/gcc/testsuite/gcc.target/powerpc/power11-1.c
new file mode 100644
index 000..6a2e802eedf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/power11-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-require-effective-target power11_ok } */
+/* { dg-options "-mdejagnu-cpu=power11 -O2" } */
+
+/* Basic check to see if the compiler supports -mcpu=power11.  */
+
+#ifndef _ARCH_PWR11
+#error "-mcpu=power11 is not supported"
+#endif
+
+void foo (void)
+{
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/power11-2.c b/gcc/testsuite/gcc.target/powerpc/power11-2.c
new file mode 100644
index 000..7b9904c1d29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/power11-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-require-effective-target power11_ok } */
+/* { dg-options "-O2" } */
+
+/* Check if we can set the power11 target via a target attribute.  */
+
+__attribute__((__target__("cpu=power9")))
+void foo_p9 (void)
+{
+}
+
+__attribute__((__target__("cpu=power10")))
+void foo_p10 (void)
+{
+}
+
+__attribute__((__target__("cpu=power11")))
+void foo_p11 (void)
+{
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/power11-3.c b/gcc/testsuite/gcc.target/powerpc/power11-3.c
new file mode 100644
index 000..9b2d643cc0f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/power11-3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-require-effective-target power11_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
+
+/* Check if we can set the power11 target via a target_clones attribute.  */
+
+__attribute__((__target_clones__("cpu=power11,cpu=power9,default")))
+void foo (void)
+{
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 467b539b20d..be80494be80 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7104,6 +7104,23 @@ proc check_effective_target_power10_ok { } {
     }
 }
 
+# Return 1 if this is a PowerPC target supporting -mcpu=power11.
+
+proc check_effective_target_power11_ok { } {
+    if { ([istarget powerpc*-*-*]) } {
+        return [check_no_compiler_messages power11_ok object {
+            int main (void) {
+                #ifndef _ARCH_PWR11
+                #error "-mcpu=power11 is not supported"
+                #endif
+                return 0;
+            }
+        } "-mcpu=power11"]
+    } else {
+        return 0
+    }
+}
+
 # Return 1 if this is a PowerPC target supporting -mfloat128 via either
 # software emulation on power7/power8 systems or hardware support on power9.

--
2.44.0

--
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
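power11-3.c only checks that the compiler accepts a target_clones list naming power11. To show what such a list means at run time, here is a toy dispatcher that picks the most demanding clone the machine can run and otherwise falls back to default. The ISA numbering and the selection loop are assumptions for illustration; this is not the resolver GCC actually generates.

```c
#include <stddef.h>

/* Hypothetical ordering of ISA levels for the toy model.  */
enum toy_isa
{
  TOY_ISA_POWER8 = 8,
  TOY_ISA_POWER9 = 9,
  TOY_ISA_POWER10 = 10,
  TOY_ISA_POWER11 = 11
};

struct toy_clone
{
  const char *name; /* Clone name as written in target_clones.  */
  int required_isa; /* 0 for "default", which runs everywhere.  */
};

/* Clones from power11-3.c, ordered from most to least demanding.  */
static const struct toy_clone toy_clones[] = {
  { "cpu=power11", TOY_ISA_POWER11 },
  { "cpu=power9", TOY_ISA_POWER9 },
  { "default", 0 },
};

/* Return the name of the first (most demanding) clone that a machine
   implementing MACHINE_ISA can execute.  */
static const char *
toy_resolve_clone (int machine_isa)
{
  for (size_t i = 0; i < sizeof toy_clones / sizeof toy_clones[0]; i++)
    if (machine_isa >= toy_clones[i].required_isa)
      return toy_clones[i].name;
  return NULL; /* Not reached: "default" always matches.  */
}
```

With this ordering a power10 machine gets the power9 clone, since no power10 clone was requested.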