from:"Michael Meissner"

[PATCH] Cleanup: Replace UNSPEC_COPYSIGN with copysign RTL

2023-09-29 Michael Meissner

When I first implemented COPYSIGN support in the power7 days, we did not have a
copysign RTL insn, so I had to use UNSPEC to represent the copysign
instruction.  This patch removes those UNSPECs, and it uses the native RTL
copysign insn.
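For reference, the operation that the copysign RTL code describes can be
modelled in portable C.  The sketch below is illustrative only: the helper name
copysign_bits is hypothetical, and it assumes double is an IEEE 754 binary64
value.  The magnitude comes from the first operand and only the sign bit comes
from the second.  This is also why the fcpsgn template in the patch is written
as "fcpsgn %0,%2,%1": the instruction takes its sign source as its first input.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical bit-level model of copysign on IEEE 754 binary64 values:
   every bit except the sign comes from MAG, and only the sign bit comes
   from SGN.  This matches C's copysign (mag, sgn) and the RTL
   (copysign:DF mag sgn).  */
static double
copysign_bits (double mag, double sgn)
{
  const uint64_t sign_bit = UINT64_C (1) << 63;
  uint64_t m, s, r;
  double result;

  memcpy (&m, &mag, sizeof m);   /* type-pun without aliasing problems */
  memcpy (&s, &sgn, sizeof s);
  r = (m & ~sign_bit) | (s & sign_bit);
  memcpy (&result, &r, sizeof result);
  return result;
}
```

For example, copysign_bits (3.0, -0.0) yields -3.0, because only the sign bit
of -0.0 is used; a model that compared the sign operand against zero instead
would get -0.0 wrong.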

I have tested this on both big endian and little endian PowerPC server systems,
and there were no regressions.  Can I check this into the master branch?  Since
it is just a clean-up, I don't see the need to backport it, but the backport
would be simple to do if desired.

2023-09-29  Michael Meissner  

gcc/

* config/rs6000/rs6000.md (UNSPEC_COPYSIGN): Delete.
(copysign3_fcpsgn): Use copysign RTL instead of UNSPEC.
(copysign3_hard): Likewise.
(copysign3_soft): Likewise.
* config/rs6000/vector.md (vector_copysign3): Use copysign RTL
instead of UNSPEC.
* config/rs6000/vsx.md (vsx_copysign3): Use copysign RTL instead
of UNSPEC.
---
 gcc/config/rs6000/rs6000.md | 20 
 gcc/config/rs6000/vector.md |  4 ++--
 gcc/config/rs6000/vsx.md|  7 +++
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 7b583d7a69a..1b6b6cb5bbe 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -108,7 +108,6 @@ (define_c_enum "unspec"
UNSPEC_TOCREL
UNSPEC_MACHOPIC_OFFSET
UNSPEC_BPERM
-   UNSPEC_COPYSIGN
UNSPEC_PARITY
UNSPEC_CMPB
UNSPEC_FCTIW
@@ -5383,9 +5382,8 @@ (define_expand "copysign3"
 ;; compiler from optimizing -0.0
 (define_insn "copysign3_fcpsgn"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
-   (unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")
- (match_operand:SFDF 2 "gpc_reg_operand" "d,wa")]
-UNSPEC_COPYSIGN))]
+   (copysign:SFDF (match_operand:SFDF 1 "gpc_reg_operand" "d,wa") 
+  (match_operand:SFDF 2 "gpc_reg_operand" "d,wa")))]
   "TARGET_HARD_FLOAT && (TARGET_CMPB || VECTOR_UNIT_VSX_P (mode))"
   "@
fcpsgn %0,%2,%1
@@ -14984,10 +14982,9 @@ (define_expand "copysign3"
 
 (define_insn "copysign3_hard"
   [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
-   (unspec:IEEE128
-[(match_operand:IEEE128 1 "altivec_register_operand" "v")
- (match_operand:IEEE128 2 "altivec_register_operand" "v")]
-UNSPEC_COPYSIGN))]
+   (copysign:IEEE128
+(match_operand:IEEE128 1 "altivec_register_operand" "v")
+(match_operand:IEEE128 2 "altivec_register_operand" "v")))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
"xscpsgnqp %0,%2,%1"
   [(set_attr "type" "vecmove")
@@ -14995,10 +14992,9 @@ (define_insn "copysign3_hard"
 
 (define_insn "copysign3_soft"
   [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
-   (unspec:IEEE128
-[(match_operand:IEEE128 1 "altivec_register_operand" "v")
- (match_operand:IEEE128 2 "altivec_register_operand" "v")]
-UNSPEC_COPYSIGN))
+   (copysign:IEEE128
+(match_operand:IEEE128 1 "altivec_register_operand" "v")
+(match_operand:IEEE128 2 "altivec_register_operand" "v")))
(clobber (match_scratch:IEEE128 3 "=&v"))]
   "!TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
"xscpsgndp %x3,%x2,%x1\;xxpermdi %x0,%x3,%x1,1"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 1ae04c8e0a8..f4fc620b653 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -332,8 +332,8 @@ (define_expand "vector_btrunc2"
 
 (define_expand "vector_copysign3"
   [(set (match_operand:VEC_F 0 "vfloat_operand")
-   (unspec:VEC_F [(match_operand:VEC_F 1 "vfloat_operand")
-  (match_operand:VEC_F 2 "vfloat_operand")] UNSPEC_COPYSIGN))]
+   (copysign:VEC_F (match_operand:VEC_F 1 "vfloat_operand")
+   (match_operand:VEC_F 2 "vfloat_operand")))]
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
 {
   if (mode == V4SFmode && VECTOR_UNIT_ALTIVEC_P (mode))
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 4de41e78d51..f3b40229094 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2233,10 +2233,9 @@ (define_insn "*vsx_ge__p"
 ;; Copy sign
 (define_insn "vsx_copysign3"
   [(set (match_operand:VSX_F 0 "vsx_register_operand" "=wa")
-   (unspec:VSX_F
-[(match_operand:VSX_F 1 "vsx_register_operand" "wa")
- (match_operand:VSX_F 2 "vsx_register_operand" "wa")]
-UNSPEC_COPYSIGN))]
+   (copysign:VSX_F
+(match_operand:VSX_F 1 "vsx_register_operand" "wa")
+(match_operand:VSX_F 2 "vsx_register_operand" "wa")))]
   "VECTOR_UNIT_VSX_P (mode)"
   "xvcpsgnp %x0,%x2,%x1"
   [(set_attr "type" "")])
-- 
2.41.0


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

[PATCH] PR target/111778 - Fix undefined shifts in PowerPC compiler

2023-10-12 Michael Meissner

I was building a cross compiler to PowerPC on my x86_64 workstation with the
latest GCC sources as of October 11th.  I could not build the compiler on the
x86_64 system because it died while building libgcc.  I looked into it, and I
discovered that the compiler was recursing until it ran out of stack space.  If
I build a native compiler with the same sources on a PowerPC system, it builds
fine.

I traced this down to a change made around October 10th:

| commit 8f1a70a4fbcc6441c70da60d4ef6db1e5635e18a (HEAD)
| Author: Jiufu Guo 
| Date:   Tue Jan 10 20:52:33 2023 +0800
|
|   rs6000: build constant via li/lis;rldicl/rldicr
|
|   If a constant is possible left/right cleaned on a rotated value from
|   a negative value of "li/lis".  Then, using "li/lis ; rldicl/rldicr"
|   to build the constant.

The code was doing a -1 << 64, which is undefined behavior in C and C++, so
different machines can produce different results.  On the x86_64 system,
(-1 << 64) produces -1, while on a PowerPC 64-bit system, (-1 << 64) produces
0.  The cross compiler hosted on x86_64 then recurses until the stack runs out
of space.
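The failure mode can be reproduced in miniature.  The sketch below is
hypothetical (the names clz64, ctz64, fill_leading_zeros and
fill_trailing_zeros are not from GCC, and uint64_t stands in for
HOST_WIDE_INT); it mirrors the first two guards in the patch by checking the
shift count before shifting, so the result never depends on how a particular
CPU handles a shift by 64.  At the instruction level, x86_64 masks a 64-bit
shift count to its low 6 bits, so a shift by 64 acts like a shift by 0 (the -1
result above), while PowerPC's sld produces 0 for shift counts of 64 to 127.

```c
#include <stdbool.h>
#include <stdint.h>

#define HWI_BITS 64   /* stands in for HOST_BITS_PER_WIDE_INT */

/* Count leading/trailing zeros, returning HWI_BITS for zero.  This follows
   the convention of GCC's clz_hwi/ctz_hwi; the raw builtins are undefined
   for a zero argument.  */
static int clz64 (uint64_t x) { return x ? __builtin_clzll (x) : HWI_BITS; }
static int ctz64 (uint64_t x) { return x ? __builtin_ctzll (x) : HWI_BITS; }

/* Turn the leading zeros of C into ones.  When there are no leading zeros
   the shift count would be 64, so refuse (the patch's "if (!lz)" guard).  */
static bool
fill_leading_zeros (uint64_t c, uint64_t *out)
{
  int lz = clz64 (c);
  if (lz == 0)
    return false;
  *out = c | (UINT64_MAX << (HWI_BITS - lz));   /* shift is 0..63 here */
  return true;
}

/* Turn the trailing zeros of C into ones.  For C == 0, ctz64 returns 64 and
   1 << 64 would be undefined, so refuse (the patch's "tz >=" guard).  */
static bool
fill_trailing_zeros (uint64_t c, uint64_t *out)
{
  int tz = ctz64 (c);
  if (tz >= HWI_BITS)
    return false;
  *out = c | ((UINT64_C (1) << tz) - 1);        /* shift is 0..63 here */
  return true;
}
```

For example, fill_leading_zeros (0x0f00, &r) sets r to 0xffffffffffffff00,
while fill_leading_zeros (0x8000000000000000, &r) returns false instead of
performing an undefined shift by 64.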

If I apply this patch, the compiler builds fine both as a PowerPC cross
compiler on x86_64 and as a native compiler on a PowerPC system.

Can I check this into the master branch to fix the problem?

2023-10-12  Michael Meissner  

gcc/

PR target/111778
* config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): Protect
code from shifts that are undefined.
(can_be_built_by_li_lis_and_rldicr): Likewise.
(can_be_built_by_li_and_rldic): Protect code from shifts that are
undefined.  Also replace a use of 1LL with HOST_WIDE_INT_1U.

---
 gcc/config/rs6000/rs6000.cc | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 2828f01413c..cc24dd5301e 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10370,6 +10370,11 @@ can_be_built_by_li_lis_and_rldicl (HOST_WIDE_INT c, int *shift,
   /* Leading zeros may be cleaned by rldicl with a mask.  Change leading zeros
  to ones and then recheck it.  */
   int lz = clz_hwi (c);
+
+  /* If lz == 0, the left shift is undefined.  */
+  if (!lz)
+return false;
+
   HOST_WIDE_INT unmask_c
 = c | (HOST_WIDE_INT_M1U << (HOST_BITS_PER_WIDE_INT - lz));
   int n;
@@ -10398,6 +10403,11 @@ can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, int *shift,
   /* Tailing zeros may be cleaned by rldicr with a mask.  Change tailing zeros
  to ones and then recheck it.  */
   int tz = ctz_hwi (c);
+
+  /* If tz == HOST_BITS_PER_WIDE_INT, the left shift is undefined.  */
+  if (tz >= HOST_BITS_PER_WIDE_INT)
+return false;
+
   HOST_WIDE_INT unmask_c = c | ((HOST_WIDE_INT_1U << tz) - 1);
   int n;
   if (can_be_rotated_to_lowbits (~unmask_c, 15, &n)
@@ -10428,8 +10438,15 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
  right bits are shifted as 0's, and left 1's(and x's) are cleaned.  */
   int tz = ctz_hwi (c);
   int lz = clz_hwi (c);
+
+  /* If lz == HOST_BITS_PER_WIDE_INT, the left shift is undefined.  */
+  if (lz >= HOST_BITS_PER_WIDE_INT)
+return false;
+
   int middle_ones = clz_hwi (~(c << lz));
-  if (tz + lz + middle_ones >= ones)
+  if (tz + lz + middle_ones >= ones
+  && (tz - lz) < HOST_BITS_PER_WIDE_INT
+  && tz < HOST_BITS_PER_WIDE_INT)
 {
   *mask = ((1LL << (HOST_BITS_PER_WIDE_INT - tz - lz)) - 1LL) << tz;
   *shift = tz;
@@ -10440,7 +10457,8 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
   int leading_ones = clz_hwi (~c);
   int tailing_ones = ctz_hwi (~c);
   int middle_zeros = ctz_hwi (c >> tailing_ones);
-  if (leading_ones + tailing_ones + middle_zeros >= ones)
+  if (leading_ones + tailing_ones + middle_zeros >= ones
+  && middle_zeros < HOST_BITS_PER_WIDE_INT)
 {
   *mask = ~(((1ULL << middle_zeros) - 1ULL) << tailing_ones);
   *shift = tailing_ones + middle_zeros;
@@ -10450,10 +10468,15 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
   /* xx1..1xx: --> xx0..01..1xx: some 1's(following x's) are cleaned. */
   /* Get the position for the first bit of successive 1.
  The 24th bit would be in successive 0 or 1.  */
-  HOST_WIDE_INT low_mask = (1LL << 24) - 1LL;
+  HOST_WIDE_INT low_mask = (HOST_WIDE_INT_1U << 24) - HOST_WIDE_INT_1U;
   int pos_first_1 = ((c & (low_mask + 1)) == 0)
  ? clz_hwi (c & low_mask)
  : HOST_BITS_PER_WIDE_INT - ctz_hwi (~(c | low_mask));
+
+  /* Make sure the left and right shifts are defined.  */
+  if (!IN_RANGE (pos_first_1, 1, HOST_BITS_PER_WIDE_INT-1))
+return false;
+
   middle_ones = clz_hwi (~c << pos_first_

[PATCH] Power10: Add options to disable load and store vector pair.

2023-10-13 Michael Meissner

In working on some future patches that involve utilizing vector pair
instructions, I wanted to be able to tune my program to enable or disable using
the vector pair load or store operations while still keeping the other
operations on the vector pair.

This patch adds two undocumented tuning options.  The -mno-load-vector-pair
option would tell GCC to generate two load vector instructions instead of a
single load vector pair.  The -mno-store-vector-pair option would tell GCC to
generate two store vector instructions instead of a single store vector pair.

If -mno-load-vector-pair is used, GCC will not generate the indexed stxvpx
instruction.  Similarly, if -mno-store-vector-pair is used, GCC will not
generate the indexed lxvpx instruction.  The reason for this is to enable
splitting the {,p}lxvp or {,p}stxvp instructions after reload without needing a
scratch GPR.

The default for -mcpu=power10 is that both load vector pair and store vector
pair are enabled.

I decided that if the user explicitly uses the __builtin_vsx_lxvp or
__builtin_vsx_stxvp built-in functions to load or store a vector pair, those
functions will always generate a vector pair instruction.

I added code so that user code can modify these settings using either a
'#pragma GCC target' directive or an __attribute__((__target__(...))) in the
function declaration.

I added tests for the switches, #pragma, and attribute options.

I have built this on both little endian power10 systems and big endian power9
systems doing the normal bootstrap and test.  There were no regressions in any
of the tests, and the new tests passed.  Can I check this patch into the master
branch?

2023-10-13  Michael Meissner  

gcc/

* config/rs6000/mma.md (movoo): Add support for -mload-vector-pair and
-mstore-vector-pair.
* config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Likewise.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): If either load
vector pair or store vector pair instructions are not being generated,
don't allow lxvpx or stxvpx to be generated.
(rs6000_option_override_internal): Add warnings if either
-mload-vector-pair or -mstore-vector-pair is used without having MMA
instructions.
(rs6000_opt_masks): Allow user to override -mload-vector-pair or
-mstore-vector-pair via #pragma or attribute.
* config/rs6000/rs6000.opt (-mload-vector-pair): New option.
(-mstore-vector-pair): Likewise.

gcc/testsuite/

* gcc.target/powerpc/vector-pair-attribute.c: New test.
* gcc.target/powerpc/vector-pair-pragma.c: New test.
* gcc.target/powerpc/vector-pair-switch1.c: New test.
* gcc.target/powerpc/vector-pair-switch2.c: New test.
* gcc.target/powerpc/vector-pair-switch3.c: New test.
* gcc.target/powerpc/vector-pair-switch4.c: New test.
---
 gcc/config/rs6000/mma.md  | 44 +++
 gcc/config/rs6000/rs6000-builtin.cc   | 46 +---
 gcc/config/rs6000/rs6000-builtins.def |  6 ++
 gcc/config/rs6000/rs6000-cpus.def |  8 ++-
 gcc/config/rs6000/rs6000.cc   | 30 +-
 gcc/config/rs6000/rs6000.opt  |  8 +++
 .../powerpc/vector-pair-attribute.c   | 39 +
 .../gcc.target/powerpc/vector-pair-builtin.c  | 40 ++
 .../gcc.target/powerpc/vector-pair-pragma.c   | 55 +++
 .../gcc.target/powerpc/vector-pair-switch1.c  | 16 ++
 .../gcc.target/powerpc/vector-pair-switch2.c  | 17 ++
 .../gcc.target/powerpc/vector-pair-switch3.c  | 17 ++
 .../gcc.target/powerpc/vector-pair-switch4.c  | 17 ++
 13 files changed, 331 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-attribute.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-builtin.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-pragma.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-switch1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-switch2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-switch3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-switch4.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 575751d477e..fc7e95bc167 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -91,6 +91,7 @@ (define_c_enum "unspec"
UNSPEC_MMA_XVI8GER4SPP
UNSPEC_MMA_XXMFACC
UNSPEC_MMA_XXMTACC
+   UNSPEC_MMA_VECTOR_PAIR_MEMORY
   ])
 
 (define_c_enum "unspecv"
@@ -298,6 +299,49 @@ (define_insn_and_split "*movoo"
   "TARGET_MMA
&& (gpc_reg_operand (operands[0], OOmode)
|| gpc_reg_operand (operands[1], OOmode))"
+{
+  if (MEM_P (operands[0]))
+return TARGET_STORE_VECTOR_PAIR ? &qu

[PATCH 0/6] PowerPC Future patches

2023-10-18 Michael Meissner

This patch set provides very preliminary support for a potential new feature of
the PowerPC that extends the current power10 MMA architecture.  This feature may
or may not be present in any specific future PowerPC processor.

In the current MMA subsystem for Power10, there are eight 512-bit accumulator
registers.  Each accumulator is tied to a set of 4 FPR registers.  When you
issue a prime instruction, it makes sure the accumulator is a copy of the 4 FPR
registers the accumulator is tied to.  When you issue a deprime instruction, it
makes sure that the accumulator data content is logically copied to the
matching FPR registers.
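The prime/deprime relationship can be pictured with a small software model.
This is a hypothetical sketch, not hardware or GCC code: it assumes each FPR is
viewed as a 16-byte VSX register, that accumulator N is tied to FPRs 4N through
4N+3, and it ignores the ISA 3.1 usage rules for the overlapping FPRs while an
accumulator is primed.  In the patches, prime corresponds to xxmtacc and
deprime to xxmfacc.

```c
#include <string.h>

/* Hypothetical model of the power10 MMA register file: 32 FPRs viewed as
   16-byte VSX registers, and 8 accumulators of 64 bytes (512 bits).  */
typedef struct { unsigned char b[16]; } vsr_t;
typedef struct { unsigned char b[64]; } acc_t;

typedef struct
{
  vsr_t fpr[32];
  acc_t acc[8];
} mma_state;

/* Prime (xxmtacc): make accumulator N a copy of FPRs 4N .. 4N+3.  */
static void
prime_acc (mma_state *s, int n)
{
  for (int i = 0; i < 4; i++)
    memcpy (s->acc[n].b + 16 * i, s->fpr[4 * n + i].b, 16);
}

/* Deprime (xxmfacc): copy accumulator N back into FPRs 4N .. 4N+3.  */
static void
deprime_acc (mma_state *s, int n)
{
  for (int i = 0; i < 4; i++)
    memcpy (s->fpr[4 * n + i].b, s->acc[n].b + 16 * i, 16);
}
```

With dense math, the accumulators live in separate DMRs, so this fixed
FPR-to-accumulator copy, and the 1-to-1 correspondence it relies on, goes away.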

In the potential dense math system, the accumulators are moved to separate
registers called dense math registers (DM registers or DMR).  The DMRs are then
extended to 1,024 bits and new instructions will be added to deal with all
1,024 bits of the DMRs.

If you take existing MMA code, it will work as long as you don't do anything
with accumulators, and you follow the rules in the ISA 3.1 documentation for
using the MMA subsystem.

These patches add support for the 512-bit accumulators within the dense math
system, and for allocation of the 1,024-bit DMRs.  At this time, no additional
built-in functions will be added to support any dense math features other than
data movement between the DMRs and the VSX registers.  Before we can look
at adding any new dense math support other than data movement, we need the GCC
compiler to be able to allocate and use these DMRs.

There are 6 patches in this patch set:

1) The first patch just adds the -mcpu=future option, which the new support builds on.
This is similar to the -mcpu=future that we did before power10 was announced.

2) The second patch enables GCC to use the load and store vector pair
instructions to optimize memory copy operations in the compiler.  For power10,
we needed to just stay with normal vector load/stores for memory copy
operations.

3) The third patch enables 512-bit accumulators that are located within DMRs
instead of the FPRs.  This patch enables the register allocation, but it does
not move the existing MMA support to use these registers.

4) The fourth patch switches the MMA subsystem to use 512-bit accumulators
within DMRs if you use -mcpu=future.

5) The fifth patch switches the names of the MMA instructions to use the dense
math equivalent name if -mcpu=future.

6) The sixth patch enables using the full 1,024-bit DMRs.  Right now, all you
can do with DMRs is move a VSX register to a DMR register, and to move a DMR
register to a VSX register.

In terms of changes, these patches now use the wD constraint for accumulators.
If you compile with -mcpu=power10, the wD constraint will match the equivalent
FPR register that overlaps with the accumulator.  If you compile with
-mcpu=future, the wD constraint will match the DMR register and not the FPR
register.

These patches also modify the print_operand %A output modifier to print out
DMR register numbers if -mcpu=future, and continue to print out the FPR
register number divided by 4 for -mcpu=power10.

In general, if you only use the built-in functions, code works on both
systems.  If you use extended asm, you will likely need to modify the code.
Going forward, hopefully if you modify your code to use the wD constraint and
%A output modifier, you can write code that switches more easily between the
two systems.

Again, these are preliminary patches for a potential future machine.  Things
will likely change in terms of implementation and usage over time.

Originally these patches were submitted in November 2022:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605581.html


Re: [PATCH 1/6] PowerPC: Add -mcpu=future option

2023-10-18 Michael Meissner

This patch implements support for a potential future PowerPC cpu.  Features
added with -mcpu=future may or may not be added to new PowerPC processors.

This patch adds support for the -mcpu=future option.  If you use -mcpu=future,
the macro _ARCH_PWR_FUTURE is defined, and the assembler .machine directive
"future" is used.  Future patches in this series will add support for new
instructions that may be present in future PowerPC processors.

This particular patch does not add any new features.  It exists as groundwork
for future patches that support a possible future PowerPC processor.

This patch does not implement any differences in tuning when -mcpu=future is
used compared to -mcpu=power10.  If -mcpu=future is used, GCC will use power10
tuning.  If you explicitly use -mtune=future, you will get a warning that
-mtune=future is not supported, and default tuning will be set for power10.
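User code can test for the new level at compile time.  The sketch below
assumes the macro name _ARCH_PWR_FUTURE from the rs6000-c.cc hunk in this
patch; the function name arch_level is hypothetical.  Because ISA_FUTURE_MASKS
includes the ISA 3.1 server masks (and therefore OPTION_MASK_POWER10),
_ARCH_PWR10 is also defined with -mcpu=future, so the future test has to come
first.

```c
/* Report which PowerPC architecture level this translation unit was
   compiled for.  On non-PowerPC hosts neither macro is defined.  */
static const char *
arch_level (void)
{
#if defined (_ARCH_PWR_FUTURE)
  return "future";     /* -mcpu=future (also defines _ARCH_PWR10) */
#elif defined (_ARCH_PWR10)
  return "power10";    /* -mcpu=power10 */
#else
  return "other";
#endif
}
```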

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2023-10-18   Michael Meissner  

gcc/

* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
_ARCH_PWR_FUTURE if -mcpu=future.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): New macro.
(POWERPC_MASKS): Add -mcpu=future support.
* config/rs6000/rs6000-opts.h (enum processor_type): Add
PROCESSOR_FUTURE.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (rs6000_cpu_index_lookup): New helper
function.
(rs6000_option_override_internal): Make -mcpu=future set
-mtune=power10.  If the user explicitly uses -mtune=future, give a
warning and reset the tuning to power10.
(rs6000_option_override_internal): Use power10 costs for future
machine.
(rs6000_machine_from_flags): Add support for -mcpu=future.
(rs6000_opt_masks): Likewise.
* config/rs6000/rs6000.h (ASM_CPU_SUPPORT): Likewise.
* config/rs6000/rs6000.md (cpu attribute): Likewise.
* config/rs6000/rs6000.opt (-mfuture): New undocumented debug switch.
* doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document 
-mcpu=future.
---
 gcc/config/rs6000/rs6000-c.cc   |  2 +
 gcc/config/rs6000/rs6000-cpus.def   |  6 +++
 gcc/config/rs6000/rs6000-opts.h |  4 +-
 gcc/config/rs6000/rs6000-tables.opt |  3 ++
 gcc/config/rs6000/rs6000.cc | 58 -
 gcc/config/rs6000/rs6000.h  |  1 +
 gcc/config/rs6000/rs6000.md |  2 +-
 gcc/config/rs6000/rs6000.opt|  4 ++
 gcc/doc/invoke.texi |  2 +-
 9 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 65be0ac43e2..e276c20cccd 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -447,6 +447,8 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
   if ((flags & OPTION_MASK_POWER10) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
+  if ((flags & OPTION_MASK_FUTURE) != 0)
+rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR_FUTURE");
   if ((flags & OPTION_MASK_SOFT_FLOAT) != 0)
 rs6000_define_or_undefine_macro (define_p, "_SOFT_FLOAT");
   if ((flags & OPTION_MASK_RECIP_PRECISION) != 0)
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 8c530a22da8..a6d9d7bf9a8 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -88,6 +88,10 @@
 | OPTION_MASK_POWER10  \
 | OTHER_POWER10_MASKS)
 
+/* Flags for a potential future processor that may or may not be delivered.  */
+#define ISA_FUTURE_MASKS   (ISA_3_1_MASKS_SERVER   \
+| OPTION_MASK_FUTURE)
+
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS  (OPTION_MASK_FLOAT128_HW\
 | OPTION_MASK_P9_MINMAX)
@@ -134,6 +138,7 @@
 | OPTION_MASK_FPRND\
 | OPTION_MASK_POWER10  \
 | OPTION_MASK_P10_FUSION   \
+| OPTION_MASK_FUTURE   \
 | OPTION_MASK_HTM  \
 | OPTION_MASK_ISEL \
 | OPTION_MASK_LOAD_VECTOR_PAIR \
@@ -267,3 +272,4 @@ RS6000_CPU ("powerpc64", PROCESSOR_POWERPC64, OPTION_MASK_PPC_GFXOPT
 RS6000_CPU ("powerpc64le", PROCESSOR_POWER8, MASK_POWERPC64
| ISA_2_7_MASKS_SERVER | OPTION_M

[PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.

2023-10-18 Michael Meissner

This patch re-enables generating load and store vector pair instructions when
doing certain memory copy operations when -mcpu=future is used.

During power10 development, it was determined that using store vector pair
instructions was problematic in a few cases, so we disabled generating load
and store vector pair instructions for memory copy operations by default.  This
patch re-enables generating these instructions if -mcpu=future is used.
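The reason the bit is added to both ISA_FUTURE_MASKS and POWERPC_MASKS can be
sketched with a simplified model.  This is hypothetical, not the code in
rs6000_option_override_internal: the bit values are invented, and it only
follows the comment in rs6000-cpus.def that POWERPC_MASKS is the mask of
options whose defaults are set from -mcpu=.  Explicit user options (for
example -mno-block-ops-vector-pair) still win over the -mcpu= default.

```c
#include <stdint.h>

/* Hypothetical bit values; GCC generates the real OPTION_MASK_* values.  */
#define MASK_POWER10                (UINT64_C (1) << 0)
#define MASK_BLOCK_OPS_VECTOR_PAIR  (UINT64_C (1) << 1)
#define MASK_FUTURE                 (UINT64_C (1) << 2)

/* Simplified stand-ins for the rs6000-cpus.def macros after this patch.  */
#define ISA_3_1_MASKS     (MASK_POWER10)
#define ISA_FUTURE_MASKS  (ISA_3_1_MASKS | MASK_BLOCK_OPS_VECTOR_PAIR \
                           | MASK_FUTURE)
#define POWERPC_MASKS     (MASK_POWER10 | MASK_BLOCK_OPS_VECTOR_PAIR \
                           | MASK_FUTURE)

/* Bits in EXPLICIT_MASK take their value from USER_FLAGS; every other bit
   in POWERPC_MASKS takes its value from the -mcpu= defaults CPU_FLAGS.  In
   this model, a bit missing from POWERPC_MASKS is never set by -mcpu=.  */
static uint64_t
effective_isa_flags (uint64_t cpu_flags, uint64_t user_flags,
                     uint64_t explicit_mask)
{
  return (user_flags & explicit_mask)
         | (cpu_flags & POWERPC_MASKS & ~explicit_mask);
}
```

In this model, -mcpu=future turns on block-ops vector pair by default,
-mcpu=power10 does not, and an explicit -mno-block-ops-vector-pair turns it off
again without affecting the other future bits.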

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2023-10-18   Michael Meissner  

gcc/

* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add
-mblock-ops-vector-pair.
(POWERPC_MASKS): Likewise.
---
 gcc/config/rs6000/rs6000-cpus.def | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index a6d9d7bf9a8..849af6b3ac8 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -90,6 +90,7 @@
 
 /* Flags for a potential future processor that may or may not be delivered.  */
 #define ISA_FUTURE_MASKS   (ISA_3_1_MASKS_SERVER   \
+| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\
 | OPTION_MASK_FUTURE)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
@@ -127,6 +128,7 @@
 
 /* Mask of all options to set the default isa flags based on -mcpu=.  */
 #define POWERPC_MASKS  (OPTION_MASK_ALTIVEC\
+| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\
 | OPTION_MASK_CMPB \
 | OPTION_MASK_CRYPTO   \
 | OPTION_MASK_DFP  \

[PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.

2023-10-18 Michael Meissner

The MMA subsystem added the notion of accumulator registers as an optional
feature of ISA 3.1 (power10).  In ISA 3.1, these accumulators overlapped with
the traditional floating point registers 0..31, but logically the accumulator
registers were separate from the FPR registers.  In ISA 3.1, it was anticipated
that in future systems, the accumulator registers may not overlap with the FPR
registers.  This patch adds the support for dense math registers as separate
registers.

This particular patch does not change the MMA support to use the accumulators
within the dense math registers.  This patch just adds the basic support for
having separate DMRs.  The next patch will switch the MMA support to use the
accumulators if -mcpu=future is used.

For testing purposes, I added an undocumented option '-mdense-math' to enable
or disable the dense math support.

This patch adds a new constraint (wD).  If MMA is selected but dense math is
not selected (i.e. -mcpu=power10), the wD constraint will allow access to
accumulators that overlap with the VSX vector registers 0..31.  If both MMA and
dense math are selected (i.e. -mcpu=future), the wD constraint will only allow
dense math registers.

This patch modifies the existing %A output modifier.  If MMA is selected but
dense math is not selected, then the %A output modifier converts the VSX register
number to the accumulator number by dividing it by 4.  If both MMA and dense
math are selected, then %A will map the separate DMR registers into 0..7.
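The %A mapping can be summarized as a small function.  This is a hypothetical
model of what gets printed, not the print_operand code: it assumes REG is the
VSX register number (0..31) without dense math, and the DMR index (0..7) with
dense math, and it returns -1 for registers that cannot hold an accumulator.

```c
#include <stdbool.h>

/* Hypothetical model of the %A output modifier.  Without dense math, an
   accumulator is named by the first of its 4 overlapping registers
   (0, 4, ..., 28), and %A prints that number divided by 4.  With dense
   math, the DMR index is printed directly.  */
static int
accumulator_number (bool dense_math, int reg)
{
  if (dense_math)
    return (reg >= 0 && reg < 8) ? reg : -1;
  if (reg < 0 || reg > 31 || reg % 4 != 0)
    return -1;
  return reg / 4;
}
```

For example, the accumulator overlapping registers 12..15 prints as 3 on
power10, while with dense math the fourth DMR also prints as 3, which is what
lets the same "%A0" template work on both systems.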

The intention is that user code using extended asm can be modified to run on
both MMA without dense math and MMA with dense math:

1)  If possible, don't use extended asm, but instead use the MMA built-in
functions;

2)  If you do need to write extended asm, change the d constraints
targeting accumulators to use wD;

3)  Only use the built-in zero, assemble and disassemble functions to create or
move data between vector quad types and dense math accumulators.
I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly
in the extended asm code.  The reason is that these instructions assume
there is a 1-to-1 correspondence between 4 adjacent FPR registers and an
accumulator that overlaps with those registers.  With accumulators now
being separate registers, there no longer is a 1-to-1 correspondence.
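Following rule 1, code that uses __builtin_mma_xxsetaccz,
__builtin_mma_xvf32gerpp and __builtin_mma_disassemble_acc on a __vector_quad
lets the compiler choose FPR-based or DMR-based accumulators.  As a portable
reference for what such code computes, here is a scalar model of the rank-1
update performed by xvf32gerpp.  It is a hypothetical sketch: the function
names are invented, and element ordering within the VSX registers is ignored.

```c
/* Scalar model of a 4x4 single-precision MMA accumulator.  */
typedef struct { float m[4][4]; } acc4x4;

/* Model of xxsetaccz: zero the accumulator.  */
static void
xxsetaccz_model (acc4x4 *acc)
{
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      acc->m[i][j] = 0.0f;
}

/* Model of xvf32gerpp: add the outer product of X and Y to the
   accumulator (positive multiply, positive accumulate).  */
static void
xvf32gerpp_model (acc4x4 *acc, const float x[4], const float y[4])
{
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      acc->m[i][j] += x[i] * y[j];
}
```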

It is possible that the mangling for DMRs and the GDB register numbers may
change in the future.

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2023-10-18   Michael Meissner  

gcc/

* config/rs6000/constraints.md (wD constraint): New constraint.
* config/rs6000/mma.md (UNSPEC_DM_ASSEMBLE_ACC): New unspec.
(movxo): Convert into define_expand.
(movxo_vsx): Version of movxo where accumulators overlap with VSX vector
registers 0..31.
(movxo_dm): Version of movxo that supports separate dense math
accumulators.
(mma_assemble_acc): Add dense math support to define_expand.
(mma_assemble_acc_vsx): Rename from mma_assemble_acc, and restrict it to
non dense math systems.
(mma_assemble_acc_dm): Dense math version of mma_assemble_acc.
(mma_disassemble_acc): Add dense math support to define_expand.
(mma_disassemble_acc_vsx): Rename from mma_disassemble_acc, and restrict
it to non dense math systems.
(mma_disassemble_acc_dm): Dense math version of mma_disassemble_acc.
* config/rs6000/predicates.md (dmr_operand): New predicate.
(accumulator_operand): Likewise.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add -mdense-math.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
(enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
(LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD
constraint.
(reload_reg_map): Likewise.
(rs6000_reg_names): Likewise.
(alt_reg_names): Likewise.
(rs6000_hard_regno_nregs_internal): Likewise.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Likewise.
(rs6000_option_override_internal): Add checking for -mdense-math.
(rs6000_secondary_reload_memory): Add support for DMR registers.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_preferred_reload_class): Likewise.
(rs6000_secondary_reload_class): Likewise.
(print_operand): Make %A handle both FPRs and DMRs.
(rs6000_dmr_register_move_cost): New helper function.
(rs6000_register_move_cost): Add support for DMR registers.
(rs6000_memory_move_cost): Likewise.
(rs6000_compute_pressure_classes): Likewise.
(rs6000

[PATCH 4/6] PowerPC: Make MMA insns support DMR registers.

2023-10-18 Michael Meissner

This patch changes the MMA instructions to use either FPR registers
(-mcpu=power10) or DMRs (-mcpu=future).  In this patch, the existing MMA
instruction names are used.

A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs.

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2023-10-18   Michael Meissner  

gcc/

* config/rs6000/mma.md (mma_): New define_expand to handle
mma_ for dense math and non dense math.
(mma_ insn): Restrict to non dense math.
(mma_xxsetaccz): Convert to define_expand to handle non dense math and
dense math.
(mma_xxsetaccz_vsx): Rename from mma_xxsetaccz and restrict usage to non
dense math.
(mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz.
(mma_): Add support for dense math.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
__PPC_DMR__ if we have dense math instructions.
* config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if
dense math and only FPRs if not dense math.
(rs6000_split_multireg_move): Do not generate the xxmtacc instruction to
prime the DMR registers or the xxmfacc instruction to de-prime them if we
have dense math register support.
---
 gcc/config/rs6000/mma.md  | 247 +-
 gcc/config/rs6000/rs6000-c.cc |   3 +
 gcc/config/rs6000/rs6000.cc   |  35 ++---
 3 files changed, 176 insertions(+), 109 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index d2c5b73fa8f..e5589d8eccc 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -596,190 +596,249 @@ (define_insn "*mma_disassemble_acc_dm"
   "dmxxextfdmr256 %0,%1,2"
   [(set_attr "type" "mma")])
 
-(define_insn "mma_"
+;; MMA instructions that do not use their accumulators as an input, still must
+;; not allow their vector operands to overlap the registers used by the
+;; accumulator.  We enforce this by marking the output as early clobber.  If we
+;; have dense math, we don't need the whole prime/de-prime action, so just make
+;; these instructions be NOPs.
+
+(define_expand "mma_"
+  [(set (match_operand:XO 0 "register_operand")
+   (unspec:XO [(match_operand:XO 1 "register_operand")]
+  MMA_ACC))]
+  "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+{
+  if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+  DONE;
+}
+
+  /* Generate the prime/de-prime code.  */
+})
+
+(define_insn "*mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
MMA_ACC))]
-  "TARGET_MMA"
+  "TARGET_MMA && !TARGET_DENSE_MATH"
   " %A0"
   [(set_attr "type" "mma")])
 
 ;; We can't have integer constants in XOmode so we wrap this in an
-;; UNSPEC_VOLATILE.
+;; UNSPEC_VOLATILE for the non-dense math case.  For dense math, we don't need
+;; to disable optimization and we can do a normal UNSPEC.
 
-(define_insn "mma_xxsetaccz"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+(define_expand "mma_xxsetaccz"
+  [(set (match_operand:XO 0 "register_operand")
(unspec_volatile:XO [(const_int 0)]
UNSPECV_MMA_XXSETACCZ))]
   "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+{
+  emit_insn (gen_mma_xxsetaccz_dm (operands[0]));
+  DONE;
+}
+})
+
+(define_insn "*mma_xxsetaccz_vsx"
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+   (unspec_volatile:XO [(const_int 0)]
+   UNSPECV_MMA_XXSETACCZ))]
+  "TARGET_MMA && !TARGET_DENSE_MATH"
   "xxsetaccz %A0"
   [(set_attr "type" "mma")])
 
+
+(define_insn "mma_xxsetaccz_dm"
+  [(set (match_operand:XO 0 "dmr_operand" "=wD")
+   (unspec:XO [(const_int 0)]
+  UNSPECV_MMA_XXSETACCZ))]
+  "TARGET_DENSE_MATH"
+  "dmsetdmrz %0"
+  [(set_attr "type" "mma")])
+
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-   (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-

[PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.

2023-10-18 Thread Michael Meissner

This patch changes the assembler instruction names for MMA instructions from
the original name used in power10 to the new name when used with the dense math
system.  I.e. xvf64gerpp becomes dmxvf64gerpp.  The assembler will emit the
same bits for either spelling.
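
The renaming follows a simple pattern for the outer-product ("ger") mnemonics in this patch. Prefixed forms get "dm" inserted after the "pm" prefix, and all other forms get "dm" prepended. The helper below is a hypothetical sketch of that convention only. GCC does not compute names this way: mma.md spells each one out in the new *_dm define_int_attr tables. Other instructions do not follow the pattern either; for example, xxsetaccz corresponds to dmsetdmrz.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map a power10 MMA outer-product mnemonic to the
   dense math spelling.  Prefixed ("pm") forms put "dm" after the prefix;
   non-prefixed forms put "dm" in front.  */
static void
dense_math_name (const char *p10_name, char *out, size_t out_size)
{
  /* Room for the original name, the two extra letters, and the NUL.  */
  assert (out_size >= strlen (p10_name) + 3);

  if (strncmp (p10_name, "pm", 2) == 0)
    snprintf (out, out_size, "pmdm%s", p10_name + 2);
  else
    snprintf (out, out_size, "dm%s", p10_name);
}
```

For example, "pmxvi4ger8" maps to "pmdmxvi4ger8", which matches the vvi4i4i8_dm attribute in the diff below.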

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2023-10-18   Michael Meissner  

gcc/

* config/rs6000/mma.md (vvi4i4i8_dm): New int attribute.
(avvi4i4i8_dm): Likewise.
(vvi4i4i2_dm): Likewise.
(avvi4i4i2_dm): Likewise.
(vvi4i4_dm): Likewise.
(avvi4i4_dm): Likewise.
(pvi4i2_dm): Likewise.
(apvi4i2_dm): Likewise.
(vvi4i4i4_dm): Likewise.
(avvi4i4i4_dm): Likewise.
(mma_): Add support for running on DMF systems, generating the dense
math instruction and using the dense math accumulators.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.

gcc/testsuite/

* gcc.target/powerpc/dm-double-test.c: New test.
* lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
target test.
---
 gcc/config/rs6000/mma.md  |  98 +++--
 .../gcc.target/powerpc/dm-double-test.c   | 194 ++
 gcc/testsuite/lib/target-supports.exp |  19 ++
 3 files changed, 299 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-double-test.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index e5589d8eccc..cae407bc37c 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -228,13 +228,22 @@ (define_int_attr apv  [(UNSPEC_MMA_XVF64GERPP "xvf64gerpp")
 
 (define_int_attr vvi4i4i8  [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")])
 
+(define_int_attr vvi4i4i8_dm   [(UNSPEC_MMA_PMXVI4GER8 "pmdmxvi4ger8")])
+
 (define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP   "pmxvi4ger8pp")])
 
+(define_int_attr avvi4i4i8_dm  [(UNSPEC_MMA_PMXVI4GER8PP   "pmdmxvi4ger8pp")])
+
 (define_int_attr vvi4i4i2  [(UNSPEC_MMA_PMXVI16GER2    "pmxvi16ger2")
 (UNSPEC_MMA_PMXVI16GER2S   "pmxvi16ger2s")
 (UNSPEC_MMA_PMXVF16GER2    "pmxvf16ger2")
 (UNSPEC_MMA_PMXVBF16GER2   "pmxvbf16ger2")])
 
+(define_int_attr vvi4i4i2_dm   [(UNSPEC_MMA_PMXVI16GER2    "pmdmxvi16ger2")
+(UNSPEC_MMA_PMXVI16GER2S   "pmdmxvi16ger2s")
+(UNSPEC_MMA_PMXVF16GER2    "pmdmxvf16ger2")
+(UNSPEC_MMA_PMXVBF16GER2   "pmdmxvbf16ger2")])
+
 (define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP  "pmxvi16ger2pp")
 (UNSPEC_MMA_PMXVI16GER2SPP "pmxvi16ger2spp")
 (UNSPEC_MMA_PMXVF16GER2PP  "pmxvf16ger2pp")
@@ -246,25 +255,54 @@ (define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP  "pmxvi16ger2pp")
 (UNSPEC_MMA_PMXVBF16GER2NP "pmxvbf16ger2np")
 (UNSPEC_MMA_PMXVBF16GER2NN "pmxvbf16ger2nn")])
 
+(define_int_attr avvi4i4i2_dm  [(UNSPEC_MMA_PMXVI16GER2PP  "pmdmxvi16ger2pp")
+(UNSPEC_MMA_PMXVI16GER2SPP "pmdmxvi16ger2spp")
+(UNSPEC_MMA_PMXVF16GER2PP  "pmdmxvf16ger2pp")
+(UNSPEC_MMA_PMXVF16GER2PN  "pmdmxvf16ger2pn")
+(UNSPEC_MMA_PMXVF16GER2NP  "pmdmxvf16ger2np")
+(UNSPEC_MMA_PMXVF16GER2NN  "pmdmxvf16ger2nn")
+(UNSPEC_MMA_PMXVBF16GER2PP "pmdmxvbf16ger2pp")
+(UNSPEC_MMA_PMXVBF16GER2PN "pmdmxvbf16ger2pn")
+(UNSPEC_MMA_PMXVBF16GER2NP "pmdmxvbf16ger2np")
+(UNSPEC_MMA_PMXVBF16GER2NN "pmdmxvbf16ger2nn")])
+
 (define_int_attr vvi4i4    [(UNSPEC_MMA_PMXVF32GER "pmxvf32ger")])
 
+(define_int_attr vvi4i4_dm [(UNSPEC_MMA_PMXVF32GER "pmdmxvf32ger")])
+
 (define_int_attr avvi4i4   [(UNSPEC_MMA_PMXVF32GERPP   "pmxvf32gerpp")
 (UNSPEC_MMA_PMXVF32GERPN   "pmxvf32gerpn"

[PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.

2023-10-18 Thread Michael Meissner

This is a preliminary patch to add the full 1,024-bit dense math registers
(DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the top half of
each DMR.

This patch only adds the new 1,024-bit register support.  It does not add
support for any instructions that need 1,024-bit registers instead of 512-bit
registers.

I used the new mode 'TDOmode' as the opaque mode for 1,024-bit registers.  The
'wD' constraint added in previous patches is used for these registers.  I added
support to load and store DMRs via the VSX registers, since there are no dense
math load/store instructions.  I added the new keyword '__dmr' to create
1,024-bit types that can be loaded into DMRs.  At present, I don't have the
aliases __dmr512 and __dmr1024 that we've discussed internally.
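
As a rough mental model, a 1,024-bit DMR can be pictured as two 512-bit halves, with the existing 512-bit MMA accumulator occupying the upper half as described above. This is an illustration, not GCC's internal representation and not a statement about the hardware encoding. The type and function names below are hypothetical; they only mirror the roles of the new UNSPEC_DM_INSERT512_UPPER, UNSPEC_DM_INSERT512_LOWER and UNSPEC_DM_EXTRACT512 patterns. Because there are no dense math load/store instructions, a DMR is saved and restored through VSX registers; 1,024 bits is the width of eight 128-bit VSX registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model types: a 512-bit accumulator and a 1,024-bit DMR
   whose upper half holds the accumulator.  */
typedef struct { uint8_t bytes[64]; } acc512_t;
typedef struct { acc512_t upper; acc512_t lower; } dmr1024_t;

static_assert (sizeof (acc512_t) * 8 == 512, "accumulator is 512 bits");
static_assert (sizeof (dmr1024_t) * 8 == 1024, "DMR is 1,024 bits");

/* Models of the insert/extract roles: each moves one 512-bit half.  */
static void
dmr_insert512_upper (dmr1024_t *dmr, const acc512_t *acc)
{
  dmr->upper = *acc;
}

static void
dmr_insert512_lower (dmr1024_t *dmr, const acc512_t *acc)
{
  dmr->lower = *acc;
}

static acc512_t
dmr_extract512 (const dmr1024_t *dmr, int want_upper)
{
  return want_upper ? dmr->upper : dmr->lower;
}
```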

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2023-10-18   Michael Meissner  

gcc/

* config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
(UNSPEC_DM_INSERT512_LOWER): Likewise.
(UNSPEC_DM_EXTRACT512): Likewise.
(UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
(UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
(movtdo): New define_expand and define_insn_and_split to implement 1,024
bit DMR registers.
(movtdo_insert512_upper): New insn.
(movtdo_insert512_lower): Likewise.
(movtdo_extract512): Likewise.
(reload_dmr_from_memory): Likewise.
(reload_dmr_to_memory): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
support.
(rs6000_init_builtins): Add support for __dmr keyword.
* config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
for TDOmode.
(rs6000_function_arg): Likewise.
* config/rs6000/rs6000-modes.def (TDOmode): New mode.
* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
support for TDOmode.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_hard_regno_mode_ok): Likewise.
(rs6000_modes_tieable_p): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup reload
hooks for DMR mode.
(reg_offset_addressing_ok_p): Add support for TDOmode.
(rs6000_emit_move): Likewise.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_secondary_reload_class): Likewise.
(rs6000_mangle_type): Add mangling for __dmr type.
(rs6000_dmr_register_move_cost): Add support for TDOmode.
(rs6000_split_multireg_move): Likewise.
(rs6000_invalid_conversion): Likewise.
* config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
(enum rs6000_builtin_type_index): Add DMR type nodes.
(dmr_type_node): Likewise.
(ptr_dmr_type_node): Likewise.

gcc/testsuite/

* gcc.target/powerpc/dm-1024bit.c: New test.
---
 gcc/config/rs6000/mma.md  | 152 ++
 gcc/config/rs6000/rs6000-builtin.cc   |  13 ++
 gcc/config/rs6000/rs6000-call.cc  |  13 +-
 gcc/config/rs6000/rs6000-modes.def|   4 +
 gcc/config/rs6000/rs6000.cc   | 135 
 gcc/config/rs6000/rs6000.h|   7 +-
 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c |  63 
 7 files changed, 351 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index cae407bc37c..0a89db8af99 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -93,6 +93,11 @@ (define_c_enum "unspec"
UNSPEC_MMA_XXMTACC
UNSPEC_MMA_VECTOR_PAIR_MEMORY
UNSPEC_DM_ASSEMBLE_ACC
+   UNSPEC_DM_INSERT512_UPPER
+   UNSPEC_DM_INSERT512_LOWER
+   UNSPEC_DM_EXTRACT512
+   UNSPEC_DMR_RELOAD_FROM_MEMORY
+   UNSPEC_DMR_RELOAD_TO_MEMORY
   ])
 
 (define_c_enum "unspecv"
@@ -916,3 +921,150 @@ (define_insn "mma_"
   [(set_attr "type" "mma")
(set_attr "prefixed" "yes")
(set_attr "isa" "dm,not_dm,not_dm")])
+
+
+;; TDOmode (i.e. __dmr).
+(define_expand "movtdo"
+  [(set (match_operand:TDO 0 "nonimmediate_operand")
+   (match_operand:TDO 1 "input_operand"))]
+  "TARGET_DENSE_MATH"
+{
+  rs6000_emit_move (operands[0], operands[1], TDOmode);
+  DONE;
+})
+
+(define_insn_and_split "*movtdo"
+  [(set (match_operand:TDO 0 "nonimmediate_operand" "=wa,m,wa,wD,wD,wa")
+   (match_operand:TDO 1 "input_operand" "m,wa,wa,wa,wD,wD"))]
+  "TARGET_DENSE_MATH
+   && (gpc_reg_operand (operands[0], TDOmode)
+   || gpc_reg_operand (operands[1]

[PATCH], Add configuration checks to PowerPC --with-long-double-format=ieee

2018-07-05 Thread Michael Meissner

This patch adds a simple check of whether GLIBC is capable of switching the
long double format on the PowerPC to IEEE 128-bit floating point.  The library
work is not finished yet, but I'm assuming that the patches will be in place
when GLIBC 2.28 is released.  If the finished support does not make it until
2.29, we can adjust the patch later.

Right now, if you use standard GLIBC 2.27 or earlier (ignoring the bits that
actually use long double that will need to be handled), you will not be able to
build libstdc++-v3 when long double is configured to be IEEE 128-bit due to
errors with overloaded functions like issignalling (where both __float128 and
long double versions are defined).  The GLIBC team has a fix for this, and it
should appear in 2.28.

This patch checks whether the GLIBC version is 2.28 or newer before allowing you
to switch the long double type.  Because the work to prepare GLIBC for the
switch is being done using an Advance Toolchain framework, the patch also
accepts an Advance Toolchain based on GLIBC 2.27 when the
--with-advance-toolchain configuration option is used (the official AT 11
release uses GLIBC 2.26 as a framework, and when completed the AT 12 release
should use GLIBC 2.28).
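
The decision the new configure code makes can be summarized as a small predicate. The C function below is a hypothetical restatement for illustration; the real check uses the GCC_GLIBC_VERSION_GTE_IFELSE autoconf macro against the target GLIBC, not a function like this. IEEE 128-bit long double is accepted with GLIBC 2.28 or newer, or with 2.27 when an Advance Toolchain is selected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical restatement of the configure gate: GLIBC 2.28+ is required
   for --with-long-double-format=ieee, relaxed to 2.27 when building against
   an Advance Toolchain.  (configure additionally checks that the
   toolchain's bin and include directories exist under /opt.)  */
static bool
glibc_allows_ieee_long_double (int major, int minor, bool advance_toolchain)
{
  assert (major >= 0 && minor >= 0);
  int min_minor = advance_toolchain ? 27 : 28;
  return major > 2 || (major == 2 && minor >= min_minor);
}
```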

I have checked it on a little endian power8 system, building both toolchains
using IBM long double and IEEE long double configurations.  The tests that
depend on the library support for long double that failed before still fail.

I also did IEEE long double builds using the host GLIBC and using AT 11, and
verified that configuring GCC generates an error.  I built bootstrap compilers
on a big endian system and verified that selecting IEEE long double fails,
since I currently don't have a big endian GLIBC with the fixes installed.

Can I check this into the trunk and the GCC 8 branch?

2018-07-05  Michael Meissner  

* configure.ac (powerpc64*-*-linux*): Combine big and little
endian checks for the long double format.  Add checks to make sure
the GLIBC can handle configuration of long double to be IEEE
128-bit before building GCC.
* configure: Regenerate.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Index: gcc/configure.ac
===
--- gcc/configure.ac(revision 262443)
+++ gcc/configure.ac(working copy)
@@ -6031,23 +6031,48 @@ AC_ARG_WITH([long-double-format],
   [AS_HELP_STRING([--with-long-double-format={ieee,ibm}]
  [Specify whether PowerPC long double uses IEEE or IBM format])],[
 case "$target:$with_long_double_format" in
-  powerpc64le-*-linux*:ieee | powerpc64le-*-linux*:ibm)
-:
-;;
-  powerpc64-*-linux*:ieee | powerpc64-*-linux*:ibm)
-# IEEE 128-bit emulation is only built on 64-bit VSX Linux systems
-case "$with_cpu" in
-  power7 | power8 | power9 | power1*)
+  powerpc64le-*-linux*:ibm | powerpc64-*-linux*:ibm | \
+  powerpc64le-*-linux*:ieee | powerpc64-*-linux*:ieee)
+# IEEE 128-bit emulation is only built on 64-bit VSX Linux systems.
+# Little endian 64-bit systems are always VSX, but big endian systems
+# might default to power4.
+case "$target:$with_cpu" in
+  powerpc64le-* | *:power7 | *:power8 | *:power9 | *:power1*)
:
;;
   *)
AC_MSG_ERROR([Configuration option --with-long-double-format is only \
 supported if the default cpu is power7 or newer])
with_long_double_format=""
-   ;;
-  esac
-  ;;
-  xpowerpc64*-*-linux*:*)
+esac
+
+if test "x$with_long_double_format" = xieee; then
+  # See if we have a new enough GLIBC to allow using IEEE 128-bit long
+  # double.  We assume the public 2.28 GLIBC and the development version of
+  # the Advance Toolchain (2.27) have all of the missing bits.
+  ieee_minor="28"
+  glibc_ieee="no"
+  atoolchain=""
+  if test "x$with_advance_toolchain" != x \
+-a -d "/opt/$with_advance_toolchain/." \
+-a -d "/opt/$with_advance_toolchain/bin/." \
+-a -d "/opt/$with_advance_toolchain/include/."; then
+
+   ieee_minor="27"
+   atoolchain="Advance Toolchain "
+  fi
+  GCC_GLIBC_VERSION_GTE_IFELSE([2], [$ieee_minor], [glibc_ieee=yes], )
+  if test "x$glibc_ieee" = xyes; then
+   echo "${atoolchain}GLIBC appears to have IEEE long double support" 1>&2
+
+  else
+   AC_MSG_ERROR([Configuration option --with-long-double-format=ieee \
+needs ${atoolchain}GLIBC 2.${ieee_minor} or newer])
+   with_long_double_format=""
+  fi
+fi
+;;
+  powerpc64*-*-linux*:*)
 AC_MSG_ERROR([--with-long-double-format argument should be ibm or ieee])
 with_long_double

Re: [PATCH], Add configuration checks to PowerPC --with-long-double-format=ieee

2018-07-06 Thread Michael Meissner

On Fri, Jul 06, 2018 at 06:38:55AM -0500, Segher Boessenkool wrote:
> On Fri, Jul 06, 2018 at 01:51:37AM -0400, Michael Meissner wrote:
> >  case "$target:$with_long_double_format" in
> 
> > -  xpowerpc64*-*-linux*:*)
> 
> So this case could never happen.  The changelog should mention it fixes
> that bug (and having it as a separate patch is much preferred!)

I assume that I accidentally added the 'x' to the working copy after submitting
the patch but before committing it, and I didn't notice.  Since it is in the
configuration support, it isn't covered by the test suite, so it wasn't
noticed.
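
Shell case patterns use essentially the same glob notation that POSIX fnmatch() implements, so a few lines of C confirm Segher's point: with the stray 'x', the pattern can never match a canonical target such as powerpc64le-unknown-linux-gnu, so that error arm could never be taken. The helper name, target triple and argument value here are illustrative.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if SUBJECT would select a shell "case" arm written as
   PATTERN.  fnmatch() with no flags applies the same *, ? and [...]
   rules that the shell uses for case patterns.  */
static bool
case_arm_matches (const char *pattern, const char *subject)
{
  assert (pattern != NULL && subject != NULL);
  return fnmatch (pattern, subject, 0) == 0;
}
```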

I can add a line to the ChangeLog if desired.

> Other than this thing, the original code was easier to read.  What does
> this part of the patch improve?

You complained that you were getting errors when using the system glibc (based
on 2.27 on an Ubuntu system) and using --with-long-double-format=ieee (where it
would die in the middle of building libstdc++-v3).

I wrote the patch to check that the glibc has the support, so that the build
fails when configuring the compiler with a sensible message (need glibc 2.28).
It doesn't change which configurations work: if you have an appropriate glibc,
the compiler builds without the patch, and if you don't, the build fails.  But
failing early should make it easier for people building the compiler to
understand why it failed.

I could duplicate the tests for glibc 2.28 (and AT-next alpha) for big endian
and little endian if desired, but it seemed clearer to me to combine the code
rather than duplicate the tests.

Re: [PATCH], Add configuration checks to PowerPC --with-long-double-format=ieee

2018-07-06 Thread Michael Meissner

On Fri, Jul 06, 2018 at 10:16:34AM -0300, Tulio Magno Quites Machado Filho wrote:
> I suggest to test with the following program:
> 
> #include <math.h>
> 
> int
> main ()
> {
>   return !isinfl(__builtin_infl());
> }
> 
> Build it with:
> gcc -mabi=ieeelongdouble -fno-builtin -Wno-psabi -lm test-ldbl.c
> 
> If the execution of the program returns 0, your math library supports IEEE
> long double.

Thanks, but I suspect that it won't work when building cross compilers, or when
the compiler being built uses the Advance Toolchain libraries and shared library
loader instead of the system versions (via the configuration option
--with-advance-toolchain=atx.y).

The issue is you need to test whether the target GLIBC has the support when
configuring the compiler, but if you are building for a cross target, you can't
run the resulting binary.  Even on a native system, with options like
--with-advance-toolchain and --with-sysroot, the libraries used by the host
compiler used to build stage1 of GCC might be different from the libraries used
to build the target compiler (or stage2/stage3 in a bootstrap native build).

So I used the GLIBC version tests that were already part of the GCC
configuration.

If there is a simple method that works for cross compilers or where a specified
sysroot is used, it would be simpler than having version checks.

[PATCH], Remove undocumented -mtoc-fusion from PowerPC

2018-07-13 Thread Michael Meissner

Back in the days when I was developing the extended fusion support for PowerPC
(-mpower9-fusion), I added a partially implemented option called toc fusion.
The idea was to recognize TOC entries (that normally get split into HIGH/LO_SUM
pairs) early on, and keep the pairs together.  Unfortunately, I messed up the
setting: you could not actually use -mtoc-fusion without also setting
-mcmodel=medium, since the TOC fusion tests in rs6000.c occurred before the
default code model was set in SUBSUBTARGET_OPTIONS.  I then stopped doing
fusion work to do other things (basic power9 enablement and IEEE 128-bit
floating point).

While it would be simple to move the tests for TOC fusion to after the location
where the code model is set, I think the current code is rather limited.  Right
now, toc fusion replaces each TOC reference with a new insn that has the scratch
register as a clobber.  However, if a basic block has multiple references to the
same variable (such as with the ++/-- operators), or references to variables
located near a variable that was previously referenced, we will generate
multiple ADDIS operations.
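
For readers unfamiliar with the HIGH/LO_SUM split: a TOC reference becomes an ADDIS that adds the "high adjusted" (@ha) part of the offset to r2, followed by a D-form access that adds the sign-extended low 16 bits (@l). The sketch below shows the standard @ha/@l arithmetic with hypothetical helper names, and why two references whose offsets share the same @ha value could reuse a single ADDIS. That is the kind of sharing the paragraph above says the current toc fusion code cannot do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low part (@l): the bottom 16 bits, sign-extended, as a D-form
   instruction interprets its displacement.  */
static int32_t
toc_lo (int32_t offset)
{
  return ((offset & 0xffff) ^ 0x8000) - 0x8000;
}

/* High adjusted part (@ha): what ADDIS adds (shifted left by 16) so that
   adding toc_lo () afterwards reproduces the full offset.  */
static int32_t
toc_ha (int32_t offset)
{
  int64_t high = (int64_t) offset - toc_lo (offset);
  assert (high % 65536 == 0);
  return (int32_t) (high / 65536);
}

/* Hypothetical check: two TOC-relative offsets can share one ADDIS if
   their @ha parts agree.  */
static bool
can_share_addis (int32_t a, int32_t b)
{
  return toc_ha (a) == toc_ha (b);
}
```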

I have ideas on how to do a better job of fusion for current and future
machines, using a machine dependent pass to do fusion optimizations within a
basic block.  So rather than keeping the toc fusion code around (which nobody
used), I would prefer to delete the current code, and replace it with better
code as I implement it.

I have tested this on a power8 little endian system with a bootstrap build and
with make check.  There were no regressions.  In addition, I built the full
spec 2006 CPU benchmark suite for power9 to make sure I didn't accidentally
delete insns that are used for -mpower9-fusion.  Can I check this into the
trunk?  I don't anticipate that we will need a backport to the FSF GCC 8
branch.

2018-07-13  Michael Meissner  

* config/rs6000/constraints.md (wG constraint): Delete, no longer
used.
* config/rs6000/predicates.md (p9_fusion_reg_operand): Rename
predicate to reflect toc fusion has been deleted.
(toc_fusion_mem_raw): Delete, no longer used.
(toc_fusion_mem_wrapped): Likewise.
* config/rs6000/rs6000-cpus.def (POWERPC_MASKS): Delete toc
fusion mask bit.
* config/rs6000/rs6000-protos.h (fusion_wrap_memory_address):
Delete, no longer used.
* config/rs6000/rs6000.c (struct rs6000_reg_addr): Delete fields
meant to be used for toc fusion.
(rs6000_debug_print_mode): Delete toc fusion debugging.
(rs6000_debug_reg_global): Likewise.
(rs6000_init_hard_regno_mode_ok): Delete setting up fields for toc
fusion and secondary reload support that were never used.
(rs6000_option_override_internal): Delete TOC fusion, that was only
partially defined, and it did not work unless you also used the
-mcmodel= switch.
(rs6000_legitimate_address_p): Delete TOC fusion support.
(rs6000_opt_masks): Likewise.
(fusion_wrap_memory_address): Delete function, no longer used.
(fusion_split_address): Delete TOC fusion support.
* config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): Delete, no
longer used with toc fusion being deleted.
(TARGET_TOC_FUSION_FP): Likewise.
* config/rs6000/rs6000.md (UNSPEC_FUSION_ADDIS): Delete TOC fusion
UNSPEC.
(toc fusion splitter): Delete TOC fusion support.
(toc_fusionload_): Likewise.
(toc_fusionload_di): Likewise.
(fusion_gpr_load_): Delete generator function, this insn no
longer needs to be named.  Rename predicate to delete TOC fusion.
(fusion_gpr___load): Likewise.
(fusion_gpr___store): Likewise.
(fusion_vsx___load): Likewise.
(fusion_vsx___store): Likewise.
(p9 fusion peephole2s): Rename predicate to delete TOC fusion.

Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md(revision 262647)
+++ gcc/config/rs6000/constraints.md(working copy)
@@ -157,10 +157,8 @@ (define_memory_constraint "wF"
   "Memory operand suitable for power9 fusion load/stores"
   (match_operand 0 "fusion_addis_mem_combo_load"))
 
-;; Fusion gpr load.
-(define_memory_constraint "wG"
-  "Memory operand suitable for TOC fusion memory references"
-  (match_operand 0 "toc_fusion_mem_wrapped"))
+;; wG is now available.  Previously it was a memory operand suitable for TOC
+;; fusion.
 
 (define_register_constraint "wH" "rs6000_constraints[RS6000_CONSTRAINT_wH]"
   "Altivec register to hold 32-bit integers or NO_REGS.")
Index: gcc/config/rs6000/predicates.m

Re: [PATCH], Remove undocumented -mtoc-fusion from PowerPC

2018-07-27 Thread Michael Meissner

On Wed, Jul 18, 2018 at 05:59:50PM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Fri, Jul 13, 2018 at 04:56:13PM -0400, Michael Meissner wrote:
> > This means rather than keeping the toc fusion around (that nobody used), I
> > would prefer to delete the current code, and replace it with better code as I
> > implement it.
> 
> 
> > +++ gcc/config/rs6000/constraints.md(working copy)
> 
> > +;; wG is now available.  Previously it was a memory operand suitable for TOC
> > +;; fusion.
> 
> There are many other constraints unused.  Keep track of all, instead?
> Like we have (at the top of this file)
> ;; Available constraint letters: e k q t u A B C D S T
> you could do something similar for the "w" names.

I just deleted the comment, and reworded the other comment.  Here are the
changes I committed:

2018-07-27  Michael Meissner  

* config/rs6000/constraints.md (wG constraint): Delete, no longer
used.
* config/rs6000/predicates.md (p9_fusion_reg_operand): Rename
predicate to reflect toc fusion has been deleted.
(toc_fusion_mem_raw): Delete, no longer used.
(toc_fusion_mem_wrapped): Likewise.
* config/rs6000/rs6000-cpus.def (POWERPC_MASKS): Delete toc
fusion mask bit.
* config/rs6000/rs6000-protos.h (fusion_wrap_memory_address):
Delete, no longer used.
* config/rs6000/rs6000.c (struct rs6000_reg_addr): Delete fields
meant to be used for toc fusion.
(rs6000_debug_print_mode): Delete toc fusion debugging.
(rs6000_debug_reg_global): Likewise.
(rs6000_init_hard_regno_mode_ok): Delete setting up fields for toc
fusion and secondary reload support that were never used.
(rs6000_option_override_internal): Delete TOC fusion, that was only
partially defined, and it did not work unless you also used the
-mcmodel= switch.
(rs6000_legitimate_address_p): Delete TOC fusion support.
(rs6000_opt_masks): Likewise.
(fusion_wrap_memory_address): Delete function, no longer used.
(fusion_split_address): Delete TOC fusion support.
* config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): Delete, no
longer used with toc fusion being deleted.
(TARGET_TOC_FUSION_FP): Likewise.
* config/rs6000/rs6000.md (UNSPEC_FUSION_ADDIS): Delete TOC fusion
UNSPEC.
(toc fusion splitter): Delete TOC fusion support.
(toc_fusionload_): Likewise.
(toc_fusionload_di): Likewise.
(fusion_gpr_load_): Delete generator function, this insn no
longer needs to be named.  Rename predicate to delete TOC fusion.
(fusion_gpr___load): Likewise.
(fusion_gpr___store): Likewise.
(fusion_vsx___load): Likewise.
(fusion_vsx___store): Likewise.
    (p9 fusion peephole2s): Rename predicate to delete TOC fusion.

Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md(revision 263034)
+++ gcc/config/rs6000/constraints.md(working copy)
@@ -157,11 +157,6 @@ (define_memory_constraint "wF"
   "Memory operand suitable for power9 fusion load/stores"
   (match_operand 0 "fusion_addis_mem_combo_load"))
 
-;; Fusion gpr load.
-(define_memory_constraint "wG"
-  "Memory operand suitable for TOC fusion memory references"
-  (match_operand 0 "toc_fusion_mem_wrapped"))
-
 (define_register_constraint "wH" "rs6000_constraints[RS6000_CONSTRAINT_wH]"
   "Altivec register to hold 32-bit integers or NO_REGS.")
 
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 263034)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -406,13 +406,11 @@ (define_predicate "fpr_reg_operand"
   return FP_REGNO_P (r);
 })
 
-;; Return true if this is a register that can has D-form addressing (GPR and
-;; traditional FPR registers for scalars).  ISA 3.0 (power9) adds D-form
-;; addressing for scalars in Altivec registers.
-;;
-;; If this is a pseudo only allow for GPR fusion in power8.  If we have the
-;; power9 fusion allow the floating point types.
-(define_predicate "toc_fusion_or_p9_reg_operand"
+;; Return true if this is a register that has D-form addressing (GPR,
+;; traditional FPR registers, and Altivec registers for scalars).  Unlike
+;; power8 fusion, this fusion does not depend on putting the ADDIS instruction
+;; into the GPR register being loaded.
+(define_predicate "p9_fusion_reg_operand"
   (match_code "reg,subreg")
 {
   HOS

[PATCH], Improve PowerPC switch behavior on medium code model system

2018-07-31 Thread Michael Meissner

I noticed that the switch code on PowerPC little endian systems (with the medium
code model) did not follow the ABI, as described on page 69:

Table 2.36. Position-Independent Switch Code for Small/Medium Models
(preferred, with TOC-relative addressing)

The code we currently generate is:

.section".toc","aw"
.align 3
.LC0:
.quad   .L4
.section".text"

# ...

addis 10,2,.LC0@toc@ha
ld 10,.LC0@toc@l(10)
sldi 3,3,2
add 9,10,3
lwa 9,0(9)
add 9,9,10
mtctr 9
bctr
.L4:
.long .L2-.L4
.long .L12-.L4
.long .L11-.L4
.long .L10-.L4
.long .L9-.L4
.long .L8-.L4
.long .L7-.L4
.long .L6-.L4
.long .L5-.L4
.long .L3-.L4

While the suggested code would be something like:

addis 10,2,.L4@toc@ha
addi 10,10,.L4@toc@l
sldi 3,3,2
lwax 9,10,3
add 9,9,10
mtctr 9
bctr
.p2align 2
.align 2
.L4:
.long .L2-.L4
.long .L12-.L4
.long .L11-.L4
.long .L10-.L4
.long .L9-.L4
.long .L8-.L4
.long .L7-.L4
.long .L6-.L4
.long .L5-.L4
.long .L3-.L4
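
Both listings implement the same dispatch. The table at .L4 holds 32-bit offsets (.Ln-.L4) relative to the table itself, so the branch target is the table address plus the entry selected by the switch index, which sldi 3,3,2 scales by 4. The sequences differ in how the table address is obtained: the current code loads it from a TOC entry, while the ABI's sequence computes it with addis/addi and can then fetch the entry with a single lwax. A minimal C model, using plain integers for addresses and a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the relative jump table dispatch.  TABLE_ADDR stands for the
   address of .L4, TABLE holds the ".Ln-.L4" entries, and INDEX is the
   switch value, assumed to be range-checked already (that check is not
   part of the listings).  */
static uint64_t
switch_target (uint64_t table_addr, const int32_t *table, size_t n,
               size_t index)
{
  assert (index < n);
  /* lwa/lwax sign-extend the 32-bit entry; the add forms the address
     that mtctr/bctr branch to.  Unsigned wraparound handles negative
     offsets.  */
  return table_addr + (uint64_t) (int64_t) table[index];
}
```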

This patch adds an insn to load a LABEL_REF into a GPR.  This is needed so the
FWPROP1 pass can convert the load of the label address from the TOC into a
direct load of the address into a GPR.

While working on the patch, I discovered that the LWA instruction did not
support indexed loads.  This was due to it using the 'Y' constraint, which
accepts DS-form offsettable addresses, but not X-form indexed addresses.  I
added the Z constraint so that the indexed form is accepted.
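
The restriction comes from the instruction formats. DS-form loads such as lwa encode a 14-bit displacement that is implicitly shifted left by 2, so the address must be a single base register plus an offset that is a multiple of 4 in the signed 16-bit range. The X-form lwax instead adds two registers, which is what the reg+reg address in the ABI sequence needs. The hypothetical helper below only spells out which displacements a DS-form address can carry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if OFFSET fits a DS-form displacement: a signed 16-bit value whose
   low two bits are zero (the encoding stores OFFSET >> 2 in 14 bits).  */
static bool
ds_form_offset_ok (int64_t offset)
{
  return offset >= -32768 && offset <= 32767 && offset % 4 == 0;
}
```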

I am in the middle of doing spec 2006 runs on both power8 and power9 systems
with this change.  So far, after 2 runs out of 3, I'm seeing several minor wins
on power9 (1-2%: perlbench, gcc, sjeng, sphinx3) and no regressions.  On power8
I see 3 minor wins (1-3%: perlbench, sjeng, omnetpp) and 1 minor regression
(1%: povray).

I have done bootstrap builds with/without the change and there were no
regressions in the test suite.  Can I check this change into the trunk?  It is
a simple enough change for back ports, if desired.

Note, I will be on vacation for 11 days starting this Saturday.  I will not be
actively checking my mail in that time period.  If I get the approval early
enough, I can check it in.  Otherwise, somebody else can check it in if they
monitor for failures, or we can wait until I get back around August 14th to
check it in.

2018-07-31  Michael Meissner  

* config/rs6000/predicates.md (label_ref_operand): New predicate
to recognize LABEL_REF.
* config/rs6000/rs6000.c (rs6000_output_addr_const_extra): Allow
LABEL_REF's inside of UNSPEC_TOCREL's.
* config/rs6000/rs6000.md (extendsi2): Allow reg+reg indexed
addressing.
(labelref): New insn to optimize loading a label address into
    registers on a medium code system.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 263040)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1662,6 +1662,10 @@ (define_predicate "small_toc_ref"
   return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL;
 })
 
+;; Match a LABEL_REF operand
+(define_predicate "label_ref_operand"
+  (match_code "label_ref"))
+
 ;; Match the first insn (addis) in fusing the combination of addis and loads to
 ;; GPR registers on power8.
 (define_predicate "fusion_gpr_addis"
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 263040)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -20807,7 +20807,8 @@ rs6000_output_addr_const_extra (FILE *fi
 switch (XINT (x, 1))
   {
   case UNSPEC_TOCREL:
-   gcc_checking_assert (GET_CODE (XVECEXP (x, 0, 0)) == SYMBOL_REF
+   gcc_checking_assert ((GET_CODE (XVECEXP (x, 0, 0)) == SYMBOL_REF
+ || GET_CODE (XVECEXP (x, 0, 0)) == LABEL_REF)
 && REG_P (XVECEXP (x, 0, 1))
 && REGNO (XVECEXP (x, 0, 1)) == TOC_REGISTER);
output_addr_const (file, XVECEXP (x, 0, 0));
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 263040)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -998,7 +998,7 @@ (define_insn "extendsi2"
 "=r, r,   wl,wu,wj,wK, wH,wr&quo

Ping: [PATCH] Power10: Add options to disable load and store vector pair.

2023-10-25 Thread Michael Meissner

Ping patch:

| Date: Fri, 13 Oct 2023 19:41:13 -0400
| From: Michael Meissner 
| Subject: [PATCH] Power10: Add options to disable load and store vector pair.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632987.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Ping: [PATCH 1/6] PowerPC: Add -mcpu=future option

2023-10-25 Thread Michael Meissner

Ping patch.

| Date: Wed, 18 Oct 2023 19:58:56 -0400
| From: Michael Meissner 
| Subject: Re: [PATCH 1/6] PowerPC: Add -mcpu=future option
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633511.html

Ping: [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.

2023-10-25 Thread Michael Meissner

Ping patch.

| Date: Wed, 18 Oct 2023 20:00:18 -0400
| From: Michael Meissner 
| Subject: [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633512.html

Ping: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.

2023-10-25 Thread Michael Meissner

Ping patch:

| Date: Wed, 18 Oct 2023 20:01:54 -0400
| From: Michael Meissner 
| Subject: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633513.html

Ping: [PATCH 4/6] PowerPC: Make MMA insns support DMR registers.

2023-10-25 Thread Michael Meissner

Ping patch.

| Date: Wed, 18 Oct 2023 20:03:02 -0400
| From: Michael Meissner 
| Subject: [PATCH 4/6] PowerPC: Make MMA insns support DMR registers.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633514.html

Ping: [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.

2023-10-25 Thread Michael Meissner

Ping patch.

| Date: Wed, 18 Oct 2023 20:04:44 -0400
| From: Michael Meissner 
| Subject: [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633515.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Ping: [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.

2023-10-25 Thread Michael Meissner

Ping patch.

| Date: Wed, 18 Oct 2023 20:06:20 -0400
| From: Michael Meissner 
| Subject: [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633516.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Ping #2: [PATCH] Power10: Add options to disable load and store vector pair.

2023-11-03 Thread Michael Meissner

Ping #2

| Date: Fri, 13 Oct 2023 19:41:13 -0400
| From: Michael Meissner 
| Subject: [PATCH] Power10: Add options to disable load and store vector pair.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632987.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Ping #2: [PATCH 1/6] PowerPC: Add -mcpu=future option

2023-11-03 Thread Michael Meissner

Ping #2

| Date: Wed, 18 Oct 2023 19:58:56 -0400
| From: Michael Meissner 
| Subject: Re: [PATCH 1/6] PowerPC: Add -mcpu=future option
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633511.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Ping #2: [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.

2023-11-03 Thread Michael Meissner

Ping #2

| Date: Wed, 18 Oct 2023 20:00:18 -0400
| From: Michael Meissner 
| Subject: [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633512.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Ping #2: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.

2023-11-03 Thread Michael Meissner

Ping #2

| Date: Wed, 18 Oct 2023 20:01:54 -0400
| From: Michael Meissner 
| Subject: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633513.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Ping #2: [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.

2023-11-03 Thread Michael Meissner

Ping #2

| Date: Wed, 18 Oct 2023 20:04:44 -0400
| From: Michael Meissner 
| Subject: [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633515.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Ping #2: [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.

2023-11-03 Thread Michael Meissner

Ping #2

| Date: Wed, 18 Oct 2023 20:06:20 -0400
| From: Michael Meissner 
| Subject: [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633516.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Re: [PATCH] V6, #1 of 17: Use ADJUST_INSN_LENGTH for prefixed instructions

2019-10-23 Thread Michael Meissner

On Tue, Oct 22, 2019 at 05:27:19PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Oct 16, 2019 at 09:35:33AM -0400, Michael Meissner wrote:
> > This patch uses the target hook ADJUST_INSN_LENGTH to change the length of
> > instructions that contain prefixed memory/add instructions.
> 
> That made this amazingly hard to review.  But it might well be worth it,
> thankfully :-)
> 
> > There are 2 new insn attributes:
> > 
> > 1) num_insns: If non-zero, returns the number of machine instructions in an
> > insn.  This simplifies the calculations in rs6000_insn_cost.
> 
> This is great.
> 
> > 2) max_prefixed_insns: Returns the maximum number of prefixed instructions in
> > an insn.  Normally this is 1, but in the insns that load up 128-bit values into
> > GPRs, it will be 2.
> 
> This one, I am not so sure.

I wanted it to be simple, so in general it was just a constant.  Since the only
user of it has already checked that the insn is prefixed, I didn't think it
needed the prefixed test to set it to 0.

> > -  int n = get_attr_length (insn) / 4;
> > +  /* If the insn tells us how many insns there are, use that.  Otherwise use
> > +     the length/4.  Adjust the insn length to remove the extra size that
> > +     prefixed instructions take.  */
> 
> This should be temporary, until we have converted everything to use
> num_insns, right?

Well, there were some 200+ places where the length was set.

> > --- gcc/config/rs6000/rs6000.h  (revision 277017)
> > +++ gcc/config/rs6000/rs6000.h  (working copy)
> > @@ -1847,9 +1847,30 @@ extern scalar_int_mode rs6000_pmode;
> >  /* Adjust the length of an INSN.  LENGTH is the currently-computed length and
> >     should be adjusted to reflect any required changes.  This macro is used when
> >     there is some systematic length adjustment required that would be difficult
> > -   to express in the length attribute.  */
> > +   to express in the length attribute.
> >  
> > -/* #define ADJUST_INSN_LENGTH(X,LENGTH) */
> > +   In the PowerPC, we use this to adjust the length of an instruction if one or
> > +   more prefixed instructions are generated, using the attribute
> > +   num_prefixed_insns.  A prefixed instruction is 8 bytes instead of 4, but the
> > +   hardware requires that a prefied instruciton not cross a 64-byte boundary.
> 
> "prefixed instruction does not"

Thanks.

> > +   This means the compiler has to assume the length of the first prefixed
> > +   instruction is 12 bytes instead of 8 bytes.  Since the length is already set
> > +   for the non-prefixed instruction, we just need to udpate for the
> > +   difference.  */
> > +
> > +#define ADJUST_INSN_LENGTH(INSN,LENGTH)                                \
> > +{                                                                      \
> > +  if (NONJUMP_INSN_P (INSN))                                           \
> > +    {                                                                  \
> > +      rtx pattern = PATTERN (INSN);                                    \
> > +      if (GET_CODE (pattern) != USE && GET_CODE (pattern) != CLOBBER   \
> > +          && get_attr_prefixed (INSN) == PREFIXED_YES)                 \
> > +        {                                                              \
> > +          int num_prefixed = get_attr_max_prefixed_insns (INSN);       \
> > +          (LENGTH) += 4 * (num_prefixed + 1);                          \
> > +        }                                                              \
> > +    }                                                                  \
> > +}
> 
> Please use a function, not a function-like macro.

Ok, I added rs6000_adjust_insn_length in rs6000.c.

> So this computes the *maximum* RTL instruction length, not considering how
> many of the machine insns in it need a prefix insn.  Can't we do better?
> Hrm, I guess in all cases that matter we will split early anyway.

Well, before register allocation, you really can't say what the precise length
is for the 128-bit types, even if the insn is not prefixed.

And of course even after register allocation, it isn't precise, since the
length of a prefixed instruction is normally 8 bytes, but sometimes 12.  So we
have to use 12.

> 
> > +;; Return the number of real hardware instructions in a combined insn.  If it
> > +;; is 0, just use the length / 4.
> > +(define_attr "num_insns" "" (const_int 0))
> 
> S

Re: [PATCH] V6, #4 of 17: Add prefixed instruction support to stack protect insns

2019-11-09 Thread Michael Meissner

On Fri, Nov 01, 2019 at 10:22:03PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Oct 16, 2019 at 09:47:41AM -0400, Michael Meissner wrote:
> > This patch fixes the stack protection insns to support stacks larger than
> > 16-bits on the 'future' system using prefixed loads and stores.
> 
> > +;; We can't use the prefixed attribute here because there are two memory
> > +;; instructions.  We can't split the insn due to the fact that this 
> > operation
> > +;; needs to be done in one piece.
> >  (define_insn "stack_protect_setdi"
> >[(set (match_operand:DI 0 "memory_operand" "=Y")
> > (unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET))
> > (set (match_scratch:DI 2 "=&r") (const_int 0))]
> >"TARGET_64BIT"
> > -  "ld%U1%X1 %2,%1\;std%U0%X0 %2,%0\;li %2,0"
> > +{
> > +  if (prefixed_memory (operands[1], DImode))
> > +output_asm_insn ("pld %2,%1", operands);
> > +  else
> > +output_asm_insn ("ld%U1%X1 %2,%1", operands);
> > +
> > +  if (prefixed_memory (operands[0], DImode))
> > +output_asm_insn ("pstd %2,%0", operands);
> > +  else
> > +output_asm_insn ("std%U0%X0 %2,%0", operands);
> 
> We could make %pN mean 'p' for prefixed, for memory as operands[N]?  Are
> there more places than this that could use that?  How about inline asm?

Right now, the only two places that do this are the two stack protect insns.
Everything else that I'm aware of that generates multiple loads or stores will
do a split before final.

> > +   (set (attr "length")
> > +   (cond [(and (match_operand 0 "prefixed_memory")
> > +   (match_operand 1 "prefixed_memory"))
> > +  (const_string "24")
> > +
> > +  (ior (match_operand 0 "prefixed_memory")
> > +   (match_operand 1 "prefixed_memory"))
> > +  (const_string "20")]
> > +
> > + (const_string "12")))])
> 
> You can use const_int instead of const_string here, I think?  Please do
> that if it works.

I'll try it out on Monday.

> Quite a simple expression, phew :-)
> 
> > +  if (which_alternative == 0)
> > +output_asm_insn ("xor. %3,%3,%4", operands);
> > +  else
> > +output_asm_insn ("cmpld %0,%3,%4\;li %3,0", operands);
> 
> That doesn't work: the backslash is treated like the escape character, in
> a C block.  I think doubling it will work?  Check the generated insn-output.c,
> it should be translated to \t\n in there.

Yes it does work.  I just checked.

> Okay for trunk with those things taken care of.  Thanks!

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH] V6, #4 of 17: Add prefixed instruction support to stack protect insns

2019-11-11 Thread Michael Meissner

On Fri, Nov 01, 2019 at 10:22:03PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Oct 16, 2019 at 09:47:41AM -0400, Michael Meissner wrote:
> > This patch fixes the stack protection insns to support stacks larger than
> > 16-bits on the 'future' system using prefixed loads and stores.
> 
> > +;; We can't use the prefixed attribute here because there are two memory
> > +;; instructions.  We can't split the insn due to the fact that this 
> > operation
> > +;; needs to be done in one piece.
> >  (define_insn "stack_protect_setdi"
> >[(set (match_operand:DI 0 "memory_operand" "=Y")
> > (unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET))
> > (set (match_scratch:DI 2 "=&r") (const_int 0))]
> >"TARGET_64BIT"
> > -  "ld%U1%X1 %2,%1\;std%U0%X0 %2,%0\;li %2,0"
> > +{
> > +  if (prefixed_memory (operands[1], DImode))
> > +output_asm_insn ("pld %2,%1", operands);
> > +  else
> > +output_asm_insn ("ld%U1%X1 %2,%1", operands);
> > +
> > +  if (prefixed_memory (operands[0], DImode))
> > +output_asm_insn ("pstd %2,%0", operands);
> > +  else
> > +output_asm_insn ("std%U0%X0 %2,%0", operands);
> 
> We could make %pN mean 'p' for prefixed, for memory as operands[N]?  Are
> there more places than this that could use that?  How about inline asm?

At the moment, I did not add this.  We can revisit it later.

> > +   (set (attr "length")
> > +   (cond [(and (match_operand 0 "prefixed_memory")
> > +   (match_operand 1 "prefixed_memory"))
> > +  (const_string "24")
> > +
> > +  (ior (match_operand 0 "prefixed_memory")
> > +   (match_operand 1 "prefixed_memory"))
> > +  (const_string "20")]
> > +
> > + (const_string "12")))])
> 
> You can use const_int instead of const_string here, I think?  Please do
> that if it works.
> 
> Quite a simple expression, phew :-)

Const_int works.

> > +  if (which_alternative == 0)
> > +output_asm_insn ("xor. %3,%3,%4", operands);
> > +  else
> > +output_asm_insn ("cmpld %0,%3,%4\;li %3,0", operands);
> 
> That doesn't work: the backslash is treated like the escape character, in
> a C block.  I think doubling it will work?  Check the generated insn-output.c,
> it should be translated to \t\n in there.
> 
> Okay for trunk with those things taken care of.  Thanks!

As we discussed, this does work.

Here is the patch committed.  I did a bootstrap and did make check.  There were
no regressions.

2019-11-11  Michael Meissner  

* config/rs6000/predicates.md (prefixed_memory): New predicate.
* config/rs6000/rs6000.md (stack_protect_setdi): Deal with either
address being a prefixed load/store.
(stack_protect_testdi): Deal with either address being a prefixed
load.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 278062)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1828,3 +1828,10 @@ (define_predicate "pcrel_external_addres
 (define_predicate "pcrel_local_or_external_address"
   (ior (match_operand 0 "pcrel_local_address")
(match_operand 0 "pcrel_external_address")))
+
+;; Return true if the operand is a memory address that uses a prefixed address.
+(define_predicate "prefixed_memory"
+  (match_code "mem")
+{
+  return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
+})
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 278062)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -11536,14 +11536,44 @@ (define_insn "stack_protect_setsi"
   [(set_attr "type" "three")
(set_attr "length" "12")])
 
+;; We can't use the prefixed attribute here because there are two memory
+;; instructions.  We can't split the insn due to the fact that this operation
+;; needs to be done in one piece.
 (define_insn "stack_protect_setdi"
   [(set (match_operand:DI 0 "memory_operand" "=Y")
(unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET))
(set (match_scratch:DI 2 "=&r") (const_int 0))]
   "TARGET_64BIT"
-  "ld%U1%X1 %2,%1\;std%U0%X0 %2,%0\;li %2,0"
+{
+  if (prefixed_memory (operands[

PowerPC -mcpu=future Version 12 patches

2020-01-09 Thread Michael Meissner

This is version 12 of my patches for PowerPC -mcpu=future.  There are currently
14 patches.  Note, the PCREL_OPT patches are not part of this series.  I want
to concentrate on getting the other patches checked in.

Patches #1-4 reflect changes that were requested in the review of the previous
(V11) set of patches, V11 #2-#5.

Patch #5 is the same patch as V11 patch #6 that switches the default for
-mpcrel when the user uses -mcpu=future.

Patch #6 is the same patch as V11 patch #7 that adds new options for the
target-supports testcases.

The remaining patches (#7-14) are the same tests that were in V11 as patches
#8-15.

I have built these patches on a little endian power8 system and there were no
regressions in the test suite.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

[PATCH] V12 patch #1 of 14, add gcc_asserts for rs6000_adjust_vec_address

2020-01-09 Thread Michael Meissner

In https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01530.html, Segher asked me to
do the gcc_asserts as early as possible.

This patch makes sure the base register temporary is not used in the other
arguments.

I have built and bootstrapped a compiler on a little endian power8 system, and
there were no regressions in the test suite.  In addition, I compiled both Spec
2006 and Spec 2017 benchmarks with this compiler and saw no new build failures.
Can I check this into the trunk?

2020-01-09  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add some
gcc_asserts.

--- /tmp/kXfaUP_rs6000.c2020-01-08 13:59:48.664454496 -0500
+++ gcc/config/rs6000/rs6000.c  2020-01-08 13:59:45.593410764 -0500
@@ -6772,6 +6772,9 @@ rs6000_adjust_vec_address (rtx scalar_re
   rtx new_addr;
   bool valid_addr_p;
 
+  gcc_assert (!reg_mentioned_p (base_tmp, addr));
+  gcc_assert (!reg_mentioned_p (base_tmp, element));
+
   /* Vector addresses should not have PRE_INC, PRE_DEC, or PRE_MODIFY.  */
   gcc_assert (GET_RTX_CLASS (GET_CODE (addr)) != RTX_AUTOINC);
 
@@ -6781,6 +6784,10 @@ rs6000_adjust_vec_address (rtx scalar_re
 element_offset = GEN_INT (INTVAL (element) * scalar_size);
   else
 {
+  /* All insns should use the 'Q' constraint (address is a single register)
+if the element number is not a constant.  */
+  gcc_assert (REG_P (addr) || SUBREG_P (addr));
+
   int byte_shift = exact_log2 (scalar_size);
   gcc_assert (byte_shift >= 0);
 

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

[PATCH] V12 patch #2 of 14, Refactor rs6000_adjust_vec_address & rs6000_split_vec_extract_var

2020-01-09 Thread Michael Meissner

In https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01530.html, Segher had some
questions about that patch.

This patch addresses some of those concerns.

Instead of limiting the vector element number in rs6000_split_vec_extract_var
so that the memory access does not go out of bounds, I decided to move the
logic to rs6000_adjust_vec_address.  Rs6000_split_vec_extract_var is the only
caller of rs6000_adjust_vec_address that passes in a variable element number.

The function rs6000_adjust_vec_address has 3 parts:
  1) Calculation of the byte offset within the vector;
  2) Creation of the new vector address;
  3) Validating that the new address is valid for the register being loaded.

In this patch, I moved the code that calculates the byte offset into a separate
function, and moved the AND that was originally done in
rs6000_split_vec_extract_var into that function.

I have built and bootstrapped a compiler with this patch installed on a little
endian power8 system and there were no regressions in the test suite.  In
addition, I built -mcpu=future versions of Spec 2006 and Spec 2017, and there
were no additional failures.  Can I check this patch into the trunk?

2020-01-09  Michael Meissner  

* config/rs6000/rs6000.c (get_vector_offset): New helper function
to calculate the offset in memory from the start of a vector of a
particular element.  Add code to keep the element number in
bounds if the element number is variable.
(rs6000_adjust_vec_address): Move calculation of offset of the
vector element to get_vector_offset.
(rs6000_split_vec_extract_var): Do not do the initial AND of
element here, move the code to get_vector_offset.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 280071)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6753,6 +6753,43 @@ hard_reg_and_mode_to_addr_mask (rtx reg,
   return addr_mask;
 }
 
+/* Return the offset within a memory object (MEM) of a vector type to a given
+   element within the vector (ELEMENT) with an element size (SCALAR_SIZE).  If
+   the element is constant, we return a constant integer.  Otherwise, we use a
+   base register temporary to calculate the offset after making it to fit
+   within the vector and scaling it.  */
+
+static rtx
+get_vector_offset (rtx mem, rtx element, rtx base_tmp, unsigned scalar_size)
+{
+  if (CONST_INT_P (element))
+return GEN_INT (INTVAL (element) * scalar_size);
+
+  /* All insns should use the 'Q' constraint (address is a single register) if
+ the element number is not a constant.  */
+  rtx addr = XEXP (mem, 0);
+  gcc_assert (REG_P (addr) || SUBREG_P (addr));
+
+  /* Mask the element to make sure the element number is between 0 and the
+ maximum number of elements - 1 so that we don't generate an address
+ outside the vector.  */
+  rtx num_ele_m1 = GEN_INT (GET_MODE_NUNITS (GET_MODE (mem)) - 1);
+  rtx and_op = gen_rtx_AND (Pmode, element, num_ele_m1);
+  emit_insn (gen_rtx_SET (base_tmp, and_op));
+
+  /* Shift the element to get the byte offset from the element number.  */
+  int shift = exact_log2 (scalar_size);
+  gcc_assert (shift >= 0);
+
+  if (shift > 0)
+{
+  rtx shift_op = gen_rtx_ASHIFT (Pmode, base_tmp, GEN_INT (shift));
+  emit_insn (gen_rtx_SET (base_tmp, shift_op));
+}
+
+  return base_tmp;
+}
+
 /* Adjust a memory address (MEM) of a vector type to point to a scalar field
within the vector (ELEMENT) with a mode (SCALAR_MODE).  Use a base register
temporary (BASE_TMP) to fixup the address.  Return the new memory address
@@ -6767,7 +6804,6 @@ rs6000_adjust_vec_address (rtx scalar_re
 {
   unsigned scalar_size = GET_MODE_SIZE (scalar_mode);
   rtx addr = XEXP (mem, 0);
-  rtx element_offset;
   rtx new_addr;
   bool valid_addr_p;
 
@@ -6779,30 +6815,7 @@ rs6000_adjust_vec_address (rtx scalar_re
 
   /* Calculate what we need to add to the address to get the element
  address.  */
-  if (CONST_INT_P (element))
-element_offset = GEN_INT (INTVAL (element) * scalar_size);
-  else
-{
-  /* All insns should use the 'Q' constraint (address is a single register)
-if the element number is not a constant.  */
-  gcc_assert (REG_P (addr) || SUBREG_P (addr));
-
-  int byte_shift = exact_log2 (scalar_size);
-  gcc_assert (byte_shift >= 0);
-
-  if (byte_shift == 0)
-   element_offset = element;
-
-  else
-   {
- if (TARGET_POWERPC64)
-   emit_insn (gen_ashldi3 (base_tmp, element, GEN_INT (byte_shift)));
- else
-   emit_insn (gen_ashlsi3 (base_tmp, element, GEN_INT (byte_shift)));
-
- element_offset = base_tmp;
-   }
-}
+  rtx element_offset = get_vector_offset (mem, element, base_tmp, scalar_size);
 
   /* Create the new address pointing to the element within the vector.  If we
  are adding 0, we don't have

[PATCH] V12 patch #3 of 14, Improve address validation in rs6000_adjust_vec_address

2020-01-09 Thread Michael Meissner

In the patches:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01530.html
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01533.html

Segher said the whole code was too complex.  This patch is my attempt to make
it somewhat easier to understand.

One issue was a section of code that tried to prevent doing an ADDI if the
register was GPR 0 (where the machine uses the value 0 instead of the contents
of GPR 0).  I realized that if I changed the order of the adds, I wouldn't have
to worry about adding GPR 0.

For example consider:

#include <altivec.h>

double
indexed_get1 (vector double *vp, unsigned long m)
{
  return vec_extract (vp[m], 1);
}

Right now it generates:

sldi 4,4,4
addi 9,3,8
lfdx 1,4,9

I.e. add the offset to the base register and then form an X-FORM load with the
base and index registers.

With this patch, it now generates:

sldi 4,4,4
add 9,4,3
lfd 1,8(9)

I.e. add the base and index registers into the temporary, and then use a D-FORM
load (assuming the element number is constant) instead of an X-FORM load with
the offset as the index.

The second part of cleaning up the code was to eliminate the special purpose
code that checks the addr_masks for the register type along with the code that
assumed all 8-byte values needed a DS-FORM instruction.

Instead I now call address_to_insn_form, which is the general address
classification function added recently.  That function peers into the
addr_masks and related details, so this function, at a higher abstraction
layer, doesn't have to worry about them.

This patch also eliminates the hard_reg_and_mode_to_addr_mask function that I
added recently in anticipation of using it to optimize PC-relative addresses as
well.  When I started looking at it, I realized things were simpler if I
pushed all of the details to address_to_insn_form (which already knew about
these things).

As with the other patches, I have built and bootstrapped a compiler on a little
endian power8 system, and there were no regressions in the tests.  Can I check
this patch into the trunk?

2020-01-09  Michael Meissner  

* config/rs6000/rs6000.c (reg_to_non_prefixed): Add forward
reference.
(hard_reg_and_mode_to_addr_mask): Delete, no longer used.
(rs6000_adjust_vec_address): If the original vector address
was REG+REG or REG+OFFSET and the element is not zero, do the add
of the elements in the original address before adding the offset
for the vector element.  Use address_to_insn_form to validate the
address using the register being loaded, rather than guessing
whether the address is a DS-FORM or DQ-FORM address.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 280072)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1172,6 +1172,7 @@ static bool rs6000_secondary_reload_move
  machine_mode,
  secondary_reload_info *,
  bool);
+static enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 rtl_opt_pass *make_pass_analyze_swaps (gcc::context*);
 
 /* Hash table stuff for keeping track of TOC entries.  */
@@ -6729,30 +6730,6 @@ rs6000_expand_vector_extract (rtx target
 }
 }
 
-/* Helper function to return an address mask based on a physical register.  */
-
-static addr_mask_type
-hard_reg_and_mode_to_addr_mask (rtx reg, machine_mode mode)
-{
-  unsigned int r = reg_or_subregno (reg);
-  addr_mask_type addr_mask;
-
-  gcc_assert (HARD_REGISTER_NUM_P (r));
-  if (INT_REGNO_P (r))
-addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_GPR];
-
-  else if (FP_REGNO_P (r))
-addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_FPR];
-
-  else if (ALTIVEC_REGNO_P (r))
-addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_VMX];
-
-  else
-gcc_unreachable ();
-
-  return addr_mask;
-}
-
 /* Return the offset within a memory object (MEM) of a vector type to a given
element within the vector (ELEMENT) with an element size (SCALAR_SIZE).  If
the element is constant, we return a constant integer.  Otherwise, we use a
@@ -6805,7 +6782,6 @@ rs6000_adjust_vec_address (rtx scalar_re
   unsigned scalar_size = GET_MODE_SIZE (scalar_mode);
   rtx addr = XEXP (mem, 0);
   rtx new_addr;
-  bool valid_addr_p;
 
   gcc_assert (!reg_mentioned_p (base_tmp, addr));
   gcc_assert (!reg_mentioned_p (base_tmp, element));
@@ -6833,68 +6809,30 @@ rs6000_adjust_vec_address (rtx scalar_re
 {
   rtx op0 = XEXP (addr, 0);
   rtx op1 = XEXP (addr, 1);
-  rtx insn;
 
   gcc_assert (REG_P (op0) || SUBREG_P (op0));
   if (CONST_INT_P (op1) && CONST_INT_P (element_offset))
{
+ /* D-FORM address with constant element number.  */
  HOST_WIDE_INT of

[PATCH] V12 patch #4 of 14, Optimize adjusting PC-relative vector addresses

2020-01-09 Thread Michael Meissner

When a PC-relative vector address is adjusted by a constant offset, this patch
folds the constant into the PC-relative address.  I moved this code into a
separate function to make the steps clearer.  With patch V12 #3,
address_to_insn_form is now used to validate the address, so we don't need any
new special address validation.

I have built and bootstrapped a compiler on a little endian power8 system, and
there were no regressions in the test suite.  Can I check this in to the trunk?
Patch V12 #13 will contain new tests for this optimization.

2020-01-09  Michael Meissner  

* config/rs6000/rs6000.c (adjust_vec_address_pcrel): New helper
function to adjust PC-relative vector addresses.
(rs6000_adjust_vec_address): Call adjust_vec_address_pcrel to
handle vectors with PC-relative addresses.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 280073)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6767,6 +6767,60 @@ get_vector_offset (rtx mem, rtx element,
   return base_tmp;
 }
 
+/* Helper function update PC-relative addresses when we are adjusting a memory
+   address (ADDR) to a vector to point to a scalar field within the vector with
+   a constant offset (ELEMENT_OFFSET).  If the address is not valid, we can
+   use the base register temporary (BASE_TMP) to form the address.  */
+
+static rtx
+adjust_vec_address_pcrel (rtx addr, rtx element_offset, rtx base_tmp)
+{
+  rtx new_addr = NULL;
+
+  gcc_assert (CONST_INT_P (element_offset));
+
+  if (GET_CODE (addr) == CONST)
+addr = XEXP (addr, 0);
+
+  if (GET_CODE (addr) == PLUS)
+{
+  rtx op0 = XEXP (addr, 0);
+  rtx op1 = XEXP (addr, 1);
+
+  if (CONST_INT_P (op1))
+   {
+ HOST_WIDE_INT offset
+   = INTVAL (XEXP (addr, 1)) + INTVAL (element_offset);
+
+ if (offset == 0)
+   new_addr = op0;
+
+ else
+   {
+ rtx plus = gen_rtx_PLUS (Pmode, op0, GEN_INT (offset));
+ new_addr = gen_rtx_CONST (Pmode, plus);
+   }
+   }
+
+  else
+   {
+ emit_move_insn (base_tmp, addr);
+ new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
+   }
+}
+
+  else if (SYMBOL_REF_P (addr) || LABEL_REF_P (addr))
+{
+  rtx plus = gen_rtx_PLUS (Pmode, addr, element_offset);
+  new_addr = gen_rtx_CONST (Pmode, plus);
+}
+
+  else
+gcc_unreachable ();
+
+  return new_addr;
+}
+
 /* Adjust a memory address (MEM) of a vector type to point to a scalar field
within the vector (ELEMENT) with a mode (SCALAR_MODE).  Use a base register
temporary (BASE_TMP) to fixup the address.  Return the new memory address
@@ -6803,6 +6857,11 @@ rs6000_adjust_vec_address (rtx scalar_re
   else if (REG_P (addr) || SUBREG_P (addr))
 new_addr = gen_rtx_PLUS (Pmode, addr, element_offset);
 
+  /* For references to local static variables, fold a constant offset into the
+ address.  */
+  else if (pcrel_local_address (addr, Pmode) && CONST_INT_P (element_offset))
+new_addr = adjust_vec_address_pcrel (addr, element_offset, base_tmp);
+
   /* Optimize D-FORM addresses with constant offset with a constant element, to
  include the element offset in the address directly.  */
   else if (GET_CODE (addr) == PLUS)


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

[PATCH] V12 patch #5 of 14, Make -mpcrel default for -mcpu=future on little endian Linux 64-bit systems

2020-01-09 Thread Michael Meissner

This patch is the same as patch V11 #6:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01494.html

Assuming patches 1-4 are applied, it fixes all of the known codegen bugs with
the -mcpu=future support, so it is time to make -mpcrel the default on the one
OS configuration (64-bit little endian Linux) that will support PC-relative
addressing on the future system.

I have built and bootstrapped a compiler with this patch on a little endian
power8 system, and there were no regressions in the testsuite.  I have built
Spec 2006 and Spec 2017 benchmarks with this patch, and there were no
regressions in building the benchmarks.  Can I check this patch into the trunk?

2020-01-09  Michael Meissner  

* config/rs6000/linux64.h (PREFIXED_ADDR_SUPPORTED_BY_OS): Set to
1 to enable prefixed addressing if -mcpu=future.
(PCREL_SUPPORTED_BY_OS): Set to 1 to enable PC-relative addressing
if -mcpu=future.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Do not
enable -mprefixed-addr or -mpcrel by default.
(ADDRESSING_FUTURE_MASKS): New macro.
(OTHER_FUTURE_MASKS): Use ADDRESSING_FUTURE_MASKS.
* config/rs6000/rs6000.c (PREFIXED_ADDR_SUPPORTED_BY_OS): Disable
prefixed addressing unless the target OS tm.h says we should
enable it.
(PCREL_SUPPORTED_BY_OS): Disable PC-relative addressing unless the
target OS tm.h says we should enable it.
(rs6000_debug_reg_global): Print whether prefixed addressing and
PC-relative addressing are enabled by default if -mcpu=future.
(rs6000_option_override_internal): Move setting prefixed
addressing and PC-relative addressing after the sub-target option
handling is done.  Only enable prefixed addressing or PC-relative
addressing on -mcpu=future systems if the target OS says to enable
it.  Disallow prefixed addressing on 32-bit systems or if the
target object file is not ELF v2.

Index: gcc/config/rs6000/linux64.h
===
--- gcc/config/rs6000/linux64.h (revision 280069)
+++ gcc/config/rs6000/linux64.h (working copy)
@@ -640,3 +640,11 @@ extern int dot_symbols;
enabling the __float128 keyword.  */
 #undef TARGET_FLOAT128_ENABLE_TYPE
 #define TARGET_FLOAT128_ENABLE_TYPE 1
+
+/* Enable support for pc-relative and numeric prefixed addressing on the
+   'future' system.  */
+#undef  PREFIXED_ADDR_SUPPORTED_BY_OS
+#define PREFIXED_ADDR_SUPPORTED_BY_OS  1
+
+#undef  PCREL_SUPPORTED_BY_OS
+#define PCREL_SUPPORTED_BY_OS  1
Index: gcc/config/rs6000/rs6000-cpus.def
===
--- gcc/config/rs6000/rs6000-cpus.def   (revision 280069)
+++ gcc/config/rs6000/rs6000-cpus.def   (working copy)
@@ -75,15 +75,22 @@
 | OPTION_MASK_P8_VECTOR\
 | OPTION_MASK_P9_VECTOR)
 
-/* Support for a future processor's features.  Do not enable -mpcrel until it
-   is fully functional.  */
+/* Support for a future processor's features.  The prefixed and pc-relative
+   addressing bits are not added here.  Instead, they are added if the target
+   OS tm.h says that it supports the addressing modes by default when
+   -mcpu=future is used.  */
 #define ISA_FUTURE_MASKS_SERVER (ISA_3_0_MASKS_SERVER \
-| OPTION_MASK_FUTURE   \
+| OPTION_MASK_FUTURE)
+
+/* Addressing related flags on a future processor.  These are options that need
+   to be cleared if the target OS is not capable of supporting prefixed
+   addressing at all (such as 32-bit mode or if the object file format is not
+   ELF v2).  */
+#define ADDRESSING_FUTURE_MASKS (OPTION_MASK_PCREL \
 | OPTION_MASK_PREFIXED_ADDR)
 
 /* Flags that need to be turned off if -mno-future.  */
-#define OTHER_FUTURE_MASKS (OPTION_MASK_PCREL  \
-| OPTION_MASK_PREFIXED_ADDR)
+#define OTHER_FUTURE_MASKS ADDRESSING_FUTURE_MASKS
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS  (OPTION_MASK_FLOAT128_HW\
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 280074)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -98,6 +98,16 @@
 #endif
 #endif
 
+/* Set up the defaults for whether prefixed addressing is used, and if it is
+   used, whether we want to turn on pc-relative support by default.  */
+#ifndef PREFIXED_ADDR_SUPPORTED_BY_OS
+#define PREFIXED_ADDR_SUPPORTED_BY_OS  0
+#endif
+
+#ifndef PCREL_SUPPORTED_BY_OS
+#define PCREL_SUPPORTED_BY_OS  0
+#endif
+
 /* Support targetm.vectorize.builtin_mask_for_load.  */
 GTY(()) tree altivec_builtin_mask_for_load;
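The option handling described in the ChangeLog above can be summarized with a small model. This is a sketch, not the rs6000.c code: the MASK_* bits and `finish_addressing_flags` are hypothetical stand-ins for the OPTION_MASK_* handling in rs6000_option_override_internal, the OS capability is passed in as parameters rather than coming from the PREFIXED_ADDR_SUPPORTED_BY_OS and PCREL_SUPPORTED_BY_OS macros, and it simply clears bits in cases where the real code may issue a diagnostic instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real OPTION_MASK_* bits.  */
enum
{
  MASK_FUTURE   = 1 << 0,
  MASK_PREFIXED = 1 << 1,
  MASK_PCREL    = 1 << 2
};

/* FLAGS: ISA bits after the sub-target option handling has run.
   EXPLICIT_BITS: addressing bits the user set or cleared explicitly.
   OS_PREFIXED/OS_PCREL: model the *_SUPPORTED_BY_OS macros.  */
static unsigned
finish_addressing_flags (unsigned flags, unsigned explicit_bits,
                         bool is_64bit, bool is_elfv2,
                         bool os_prefixed, bool os_pcrel)
{
  assert ((explicit_bits & ~(unsigned) (MASK_PREFIXED | MASK_PCREL)) == 0);

  /* Default the modes on only for -mcpu=future, only if the OS supports
     them, and never over an explicit user choice.  */
  if (flags & MASK_FUTURE)
    {
      if (os_prefixed && !(explicit_bits & MASK_PREFIXED))
        flags |= MASK_PREFIXED;
      if (os_pcrel && !(explicit_bits & MASK_PCREL))
        flags |= MASK_PCREL;
    }

  /* No prefixed addressing on 32-bit or non-ELF v2 targets.  */
  if (!is_64bit || !is_elfv2)
    flags &= ~(unsigned) (MASK_PREFIXED | MASK_PCREL);

  /* PC-relative addressing is built on prefixed instructions.  */
  if (!(flags & MASK_PREFIXED))
    flags &= ~(unsigned) MASK_PCREL;

  return flags;
}
```

With both macros left at their default of 0, `-mcpu=future` alone turns on neither mode; only a tm.h such as linux64.h that sets them to 1 changes the default.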

[PATCH] V12 patch #6 of 14, Add -mcpu=future target-supports options

2020-01-09 Thread Michael Meissner

This patch is the same as V11, #7:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01495.html

This patch adds the necessary options to target-supports.exp to enable the
specific target supports for -mcpu=future.  It contains changes that you asked
for some time ago.  Can I check this into the trunk?

2020-01-09  Michael Meissner  

* lib/target-supports.exp (check_effective_target_powerpc_pcrel):
New target for PowerPC -mcpu=future support.
(check_effective_target_powerpc_prefixed_addr): New target for
PowerPC -mcpu=future support.

Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   (revision 280069)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -2161,6 +2161,23 @@ proc check_p9modulo_hw_available { } {
 }]
 }
 
+# Return 1 if the target generates PC-relative instructions automatically
+proc check_effective_target_powerpc_pcrel { } {
+    return [check_no_messages_and_pattern powerpc_pcrel \
+        {\mpld\M.*[@]pcrel} assembly {
+            static long s;
+            long *p = &s;
+            long foo (void) { return s; }
+        } {-O2 -mcpu=future}]
+}
+
+# Return 1 if the target generates prefixed instructions automatically
+proc check_effective_target_powerpc_prefixed_addr { } {
+    return [check_no_messages_and_pattern powerpc_prefixed_addr \
+        {\mpld\M} assembly {
+            long foo (long *p) { return p[0x12345]; }
+        } {-O2 -mcpu=future}]
+}
 
 # Return 1 if the target supports executing FUTURE instructions, 0 otherwise.
 # Cache the result.  It is assumed that if a simulator does not support the

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
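Both effective-target checks, and the scan-assembler directives in the tests that follow, match mnemonics with Tcl's `\m` and `\M`, which anchor a match to the start and end of a word. As a rough C model of that behaviour (an illustration only: DejaGnu uses Tcl's regexp engine, and `contains_word` is a hypothetical helper), assuming a word character is a letter, digit, or underscore:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Word characters as assumed here: letters, digits and underscore.  */
static bool
is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Return true if WORD appears in TEXT with a non-word character (or the
   start/end of TEXT) on both sides, roughly what {\mWORD\M} matches.  */
static bool
contains_word (const char *text, const char *word)
{
  size_t len = strlen (word);
  assert (len > 0);

  for (const char *p = strstr (text, word); p != NULL;
       p = strstr (p + 1, word))
    {
      bool at_start = (p == text) || !is_word_char (p[-1]);
      bool at_end = !is_word_char (p[len]);
      if (at_start && at_end)
        return true;
    }
  return false;
}
```

This is why a pattern such as {\maddi\M} does not match the `paddi` mnemonic, and {\mpld\M} does not match a longer mnemonic that merely starts with `pld`.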

[PATCH] V12 patch #7 of 14, Add PADDI/PLI tests

2020-01-09 Thread Michael Meissner

This patch adds new tests for the compiler generating PLI or PADDI with large
constants when -mcpu=future is used.  It renames the files as you requested
several patch generations ago so the -fident option doesn't give a false
positive result.  Can I check this patch into the trunk?

2020-01-09  Michael Meissner  

* gcc.target/powerpc/prefix-add.c: New test for -mcpu=future
generating PADDI for large constant adds.
* gcc.target/powerpc/prefix-di-constant.c: New test for
-mcpu=future generating PLI to load up large DImode constants.
* gcc.target/powerpc/prefix-si-constant.c: New test for
-mcpu=future generating PLI to load up large SImode constants.

Index: gcc/testsuite/gcc.target/powerpc/prefix-add.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-add.c   (revision 280078)
+++ gcc/testsuite/gcc.target/powerpc/prefix-add.c   (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PADDI is generated to add a large constant.  */
+unsigned long
+add (unsigned long a)
+{
+  return a + 0x12345678UL;
+}
+
+/* { dg-final { scan-assembler {\mpaddi\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c   (revision 280078)
+++ gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c   (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PLI (PADDI) is generated to load a large constant.  */
+unsigned long
+large (void)
+{
+  return 0x12345678UL;
+}
+
+/* { dg-final { scan-assembler {\mpli\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c   (revision 280078)
+++ gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c   (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PLI (PADDI) is generated to load a large constant for SImode.  */
+void
+large_si (unsigned int *p)
+{
+  *p = 0x12345U;
+}
+
+/* { dg-final { scan-assembler {\mpli\M} } } */
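The choice between ADDI/LI and PADDI/PLI comes down to the immediate's range: the non-prefixed forms carry a 16-bit signed immediate, while the prefixed forms carry a 34-bit signed one (LI and PLI are ADDI and PADDI with no base register). Both constants in these tests, 0x12345678 and 0x12345, exceed 32767. A simplified sketch, assuming the range test alone decides the encoding (the compiler also weighs ADDIS and multi-instruction sequences); `classify_add_immediate` and its enum are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if VALUE fits in a BITS-wide two's-complement immediate field.  */
static bool
fits_signed (int64_t value, unsigned bits)
{
  assert (bits > 0 && bits < 64);
  int64_t max = ((int64_t) 1 << (bits - 1)) - 1;
  int64_t min = -max - 1;
  return value >= min && value <= max;
}

/* Hypothetical classification of how a constant add could be encoded.  */
enum add_form { ADD_ADDI, ADD_PADDI, ADD_SEQUENCE };

static enum add_form
classify_add_immediate (int64_t value)
{
  if (fits_signed (value, 16))
    return ADD_ADDI;       /* 16-bit signed immediate field.  */
  if (fits_signed (value, 34))
    return ADD_PADDI;      /* 34-bit signed prefixed immediate.  */
  return ADD_SEQUENCE;     /* Needs several instructions.  */
}
```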


[PATCH] V12 patch #8 of 14, Add test to verify prefixed instruction is generated for -mcpu=future for DS/DQ illegal offsets

2020-01-09 Thread Michael Meissner

This patch is the same as:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01497.html

It adds a test to validate that the compiler will now generate a prefixed load
or store instead of loading up an offset that would be illegal for DS/DQ-FORM
instructions.  Can I check this into the trunk?

2020-01-09  Michael Meissner  

* gcc.target/powerpc/prefix-ds-dq.c: New test to verify that we
generate the prefixed load/store instructions for traditional
instructions with an offset that doesn't match DS/DQ
requirements.

Index: gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c (revision 280080)
+++ gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c (working copy)
@@ -0,0 +1,156 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests whether we generate a prefixed load/store operation for addresses that
+   don't meet DS/DQ offset constraints.  */
+
+unsigned long
+load_uc_offset1 (unsigned char *p)
+{
+  return p[1]; /* should generate LBZ.  */
+}
+
+long
+load_sc_offset1 (signed char *p)
+{
+  return p[1]; /* should generate LBZ + EXTSB.  */
+}
+
+unsigned long
+load_us_offset1 (unsigned char *p)
+{
+  return *(unsigned short *)(p + 1);   /* should generate LHZ.  */
+}
+
+long
+load_ss_offset1 (unsigned char *p)
+{
+  return *(short *)(p + 1);          /* should generate LHA.  */
+}
+
+unsigned long
+load_ui_offset1 (unsigned char *p)
+{
+  return *(unsigned int *)(p + 1); /* should generate LWZ.  */
+}
+
+long
+load_si_offset1 (unsigned char *p)
+{
+  return *(int *)(p + 1);  /* should generate PLWA.  */
+}
+
+unsigned long
+load_ul_offset1 (unsigned char *p)
+{
+  return *(unsigned long *)(p + 1);  /* should generate PLD.  */
+}
+
+long
+load_sl_offset1 (unsigned char *p)
+{
+  return *(long *)(p + 1); /* should generate PLD.  */
+}
+
+float
+load_float_offset1 (unsigned char *p)
+{
+  return *(float *)(p + 1);          /* should generate LFS.  */
+}
+
+double
+load_double_offset1 (unsigned char *p)
+{
+  return *(double *)(p + 1);   /* should generate LFD.  */
+}
+
+__float128
+load_float128_offset1 (unsigned char *p)
+{
+  return *(__float128 *)(p + 1);   /* should generate PLXV.  */
+}
+
+void
+store_uc_offset1 (unsigned char uc, unsigned char *p)
+{
+  p[1] = uc;   /* should generate STB.  */
+}
+
+void
+store_sc_offset1 (signed char sc, signed char *p)
+{
+  p[1] = sc;   /* should generate STB.  */
+}
+
+void
+store_us_offset1 (unsigned short us, unsigned char *p)
+{
+  *(unsigned short *)(p + 1) = us; /* should generate STH.  */
+}
+
+void
+store_ss_offset1 (signed short ss, unsigned char *p)
+{
+  *(signed short *)(p + 1) = ss;   /* should generate STH.  */
+}
+
+void
+store_ui_offset1 (unsigned int ui, unsigned char *p)
+{
+  *(unsigned int *)(p + 1) = ui;   /* should generate STW.  */
+}
+
+void
+store_si_offset1 (signed int si, unsigned char *p)
+{
+  *(signed int *)(p + 1) = si; /* should generate STW.  */
+}
+
+void
+store_ul_offset1 (unsigned long ul, unsigned char *p)
+{
+  *(unsigned long *)(p + 1) = ul;  /* should generate PSTD.  */
+}
+
+void
+store_sl_offset1 (signed long sl, unsigned char *p)
+{
+  *(signed long *)(p + 1) = sl;      /* should generate PSTD.  */
+}
+
+void
+store_float_offset1 (float f, unsigned char *p)
+{
+  *(float *)(p + 1) = f;             /* should generate STFS.  */
+}
+
+void
+store_double_offset1 (double d, unsigned char *p)
+{
+  *(double *)(p + 1) = d;            /* should generate STFD.  */
+}
+
+void
+store_float128_offset1 (__float128 f128, unsigned char *p)
+{
+  *(__float128 *)(p + 1) = f128;   /* should generate PSTXV.  */
+}
+
+/* { dg-final { scan-assembler-times {\mextsb\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlbz\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mlfd\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlfs\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlha\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlhz\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlwz\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mpld\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mplwa\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mplxv\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstxv\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstb\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mstfd\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mstfs\M}  1 } } */
+/* { dg-final { scan-assembler-times {\msth\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mstw\M}   2 } } */
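Every offset in this test is 1. That is legal for D-form instructions (LBZ, LHZ, LWZ, LFS, LFD, STB, STH, STW, STFS, STFD) but not for DS-form instructions (LD, STD, LWA), whose displacement must be a multiple of 4, nor for DQ-form instructions (LXV, STXV), whose displacement must be a multiple of 16. A sketch of that legality check; `offset_ok_unprefixed` and the enum are hypothetical names, and the prefixed forms' own 34-bit range is not modeled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Displacement forms of the non-prefixed load/store instructions.  */
enum insn_form
{
  FORM_D,   /* e.g. LBZ, LHZ, LWZ, LFS, LFD, STB, STH, STW.  */
  FORM_DS,  /* e.g. LD, STD, LWA: offset must be a multiple of 4.  */
  FORM_DQ   /* e.g. LXV, STXV: offset must be a multiple of 16.  */
};

/* Return true if OFFSET fits the displacement field of a non-prefixed
   instruction of form FORM; otherwise a prefixed form (or a separate
   address computation) is needed.  */
static bool
offset_ok_unprefixed (enum insn_form form, int64_t offset)
{
  if (offset < INT16_MIN || offset > INT16_MAX)
    return false;

  switch (form)
    {
    case FORM_D:
      return true;
    case FORM_DS:
      return offset % 4 == 0;
    case FORM_DQ:
      return offset % 16 == 0;
    }

  assert (!"unknown instruction form");
  return false;
}
```

Under this model, offset 1 keeps LWZ and LFD but turns LD into PLD, LWA into PLWA, and LXV into PLXV, matching the counts the test scans for.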


[PATCH] V12 patch #9 of 14, Add test to validate we don't generate an illegal prefixed instruction

2020-01-09 Thread Michael Meissner

This patch is the same as:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01499.html

It adds a new test to make sure if we are using a prefixed load or store
instruction, the compiler does not try to use a load with update or store with
update version of the instruction, since there are no prefixed versions of
those instructions.  Can I check this into the trunk?

2020-01-09  Michael Meissner  

* gcc.target/powerpc/prefix-no-premodify.c: Make sure we do not
generate the non-existent PLWZU instruction if -mcpu=future.

Index: gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c  (revision 280082)
+++ gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c  (working copy)
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Make sure that we don't generate a prefixed form of the load and store with
+   update instructions (i.e. instead of generating LWZU we have to generate
+   PLWZ plus a PADDI).  */
+
+#ifndef SIZE
+#define SIZE 50000
+#endif
+
+struct foo {
+  unsigned int field;
+  char pad[SIZE];
+};
+
+struct foo *inc_load (struct foo *p, unsigned int *q)
+{
+  *q = (++p)->field;   /* PLWZ, PADDI, STW.  */
+  return p;
+}
+
+struct foo *dec_load (struct foo *p, unsigned int *q)
+{
+  *q = (--p)->field;   /* PLWZ, PADDI, STW.  */
+  return p;
+}
+
+struct foo *inc_store (struct foo *p, unsigned int *q)
+{
+  (++p)->field = *q;   /* LWZ, PADDI, PSTW.  */
+  return p;
+}
+
+struct foo *dec_store (struct foo *p, unsigned int *q)
+{
+  (--p)->field = *q;   /* LWZ, PADDI, PSTW.  */
+  return p;
+}
+
+/* { dg-final { scan-assembler-times {\mlwz\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mstw\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mpaddi\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mplwz\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mplwzu\M}   } } */
+/* { dg-final { scan-assembler-not   {\mpstwu\M}   } } */
+/* { dg-final { scan-assembler-not   {\maddis\M}   } } */
+/* { dg-final { scan-assembler-not   {\maddi\M} } } */
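Whether an update form such as LWZU could be used at all depends on the stride `sizeof (struct foo)`: update forms only take a 16-bit signed displacement and have no prefixed variants. The sketch below uses hypothetical copies of the test's struct with a 5-byte and a 50000-byte pad to show that only a pad large enough to push the stride past 32767 rules out the update form and forces the PLWZ + PADDI style sequence the test expects; `update_form_possible` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical copies of the test's struct with two pad sizes.  */
struct foo_small { unsigned int field; char pad[5]; };
struct foo_large { unsigned int field; char pad[50000]; };

static_assert (sizeof (struct foo_large) > 32767,
               "the large pad pushes the stride past a 16-bit displacement");

/* LWZU/STWU take a 16-bit signed displacement and have no prefixed
   variants, so ++p/--p can only fold into an update form when the
   stride fits that field in both directions.  */
static bool
update_form_possible (size_t stride)
{
  return stride <= 32767;
}
```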


[PATCH] V12 patch #10 of 14, Add tests for generating prefixed load/store instructions with large numeric offsets

2020-01-09 Thread Michael Meissner

This patch is the same as:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01500.html

This patch adds one test per type validating that we generate the appropriate
prefixed instructions to load/store the type when the offset is large.  Can I
check this into the trunk?

2020-01-09  Michael Meissner  

* gcc.target/powerpc/prefix-large.h: New set of tests to test
prefixed addressing on 'future' system with large numeric offsets
for various types.
* gcc.target/powerpc/prefix-large-dd.c: New test for prefixed
loads/stores with large offsets for the _Decimal64 type.
* gcc.target/powerpc/prefix-large-df.c: New test for prefixed
loads/stores with large offsets for the double type.
* gcc.target/powerpc/prefix-large-di.c: New test for prefixed
loads/stores with large offsets for the long type.
* gcc.target/powerpc/prefix-large-hi.c: New test for prefixed
loads/stores with large offsets for the short type.
* gcc.target/powerpc/prefix-large-kf.c: New test for prefixed
loads/stores with large offsets for the __float128 type.
* gcc.target/powerpc/prefix-large-qi.c: New test for prefixed
loads/stores with large offsets for the signed char type.
* gcc.target/powerpc/prefix-large-sd.c: New test for prefixed
loads/stores with large offsets for the _Decimal32 type.
* gcc.target/powerpc/prefix-large-sf.c: New test for prefixed
loads/stores with large offsets for the float type.
* gcc.target/powerpc/prefix-large-si.c: New test for prefixed
loads/stores with large offsets for the int type.
* gcc.target/powerpc/prefix-large-udi.c: New test for prefixed
loads/stores with large offsets for the unsigned long type.
* gcc.target/powerpc/prefix-large-uhi.c: New test for prefixed
loads/stores with large offsets for the unsigned short type.
* gcc.target/powerpc/prefix-large-uqi.c: New test for prefixed
loads/stores with large offsets for the unsigned char type.
* gcc.target/powerpc/prefix-large-usi.c: New test for prefixed
loads/stores with large offsets for the unsigned int type.
* gcc.target/powerpc/prefix-large-v2df.c: New test for prefixed
loads/stores with large offsets for the vector double type.

Index: gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c  (revision 280083)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset for _Decimal64 objects.  */
+
+#define TYPE _Decimal64
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-df.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-large-df.c  (revision 280083)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-df.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset for double objects.  */
+
+#define TYPE double
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-di.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-large-di.c  (revision 280083)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-di.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset for long objects.  */
+
+#define TYPE long
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mpld\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c  (revision 280083)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */

[PATCH] V12 patch #11 of 14, Add tests for using PC-relative instructions with -mcpu=future

2020-01-09 Thread Michael Meissner

This patch is the same as:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01501.html

This patch adds a set of tests for each type to verify that the appropriate
PC-relative instructions are generated when -mcpu=future is used.  Can I check
this patch into the trunk?

2020-01-09  Michael Meissner  

* gcc.target/powerpc/prefix-pcrel.h: New set of tests to test
prefixed addressing on 'future' system with PC-relative addresses
for various types.
* gcc.target/powerpc/prefix-pcrel-dd.c: New test for prefixed
loads/stores with PC-relative addresses for the _Decimal64 type.
* gcc.target/powerpc/prefix-pcrel-df.c: New test for prefixed
loads/stores with PC-relative addresses for the double type.
* gcc.target/powerpc/prefix-pcrel-di.c: New test for prefixed
loads/stores with PC-relative addresses for the long type.
* gcc.target/powerpc/prefix-pcrel-hi.c: New test for prefixed
loads/stores with PC-relative addresses for the short type.
* gcc.target/powerpc/prefix-pcrel-kf.c: New test for prefixed
loads/stores with PC-relative addresses for the __float128 type.
* gcc.target/powerpc/prefix-pcrel-qi.c: New test for prefixed
loads/stores with PC-relative addresses for the signed char type.
* gcc.target/powerpc/prefix-pcrel-sd.c: New test for prefixed
loads/stores with PC-relative addresses for the _Decimal32 type.
* gcc.target/powerpc/prefix-pcrel-sf.c: New test for prefixed
loads/stores with PC-relative addresses for the float type.
* gcc.target/powerpc/prefix-pcrel-si.c: New test for prefixed
loads/stores with PC-relative addresses for the int type.
* gcc.target/powerpc/prefix-pcrel-udi.c: New test for prefixed
loads/stores with PC-relative addresses for the unsigned long
type.
* gcc.target/powerpc/prefix-pcrel-uhi.c: New test for prefixed
loads/stores with PC-relative addresses for the unsigned short
type.
* gcc.target/powerpc/prefix-pcrel-uqi.c: New test for prefixed
loads/stores with PC-relative addresses for the unsigned char
type.
* gcc.target/powerpc/prefix-pcrel-usi.c: New test for prefixed
loads/stores with PC-relative addresses for the unsigned int
type.
* gcc.target/powerpc/prefix-pcrel-v2df.c: New test for prefixed
loads/stores with PC-relative addresses for the vector double
type.
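A prefixed PC-relative load or store encodes the distance from the instruction to its target in a 34-bit signed field, so a target within roughly 8 GiB either side of the code is reachable in one instruction. A minimal sketch of that reachability check, assuming addresses below 2^63 and taking the address of the prefixed instruction itself as the base; `pcrel_reachable` is a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the displacement from the prefixed instruction at PC to TARGET
   and report whether it fits the 34-bit signed PC-relative field.
   Addresses are assumed to be below 2^63 so the subtraction is safe.  */
static bool
pcrel_reachable (int64_t pc, int64_t target, int64_t *disp)
{
  assert (disp != NULL);
  *disp = target - pc;
  return *disp >= -((int64_t) 1 << 33) && *disp < ((int64_t) 1 << 33);
}
```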

Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c  (revision 280086)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c  (working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for the _Decimal64 type.  */
+
+#define TYPE _Decimal64
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {[@]pcrel}  4 } } */
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c  (revision 280086)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c  (working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for the double type.  */
+
+#define TYPE double
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {[@]pcrel}  4 } } */
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c  (revision 280086)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c  (working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for the long type.  */
+
+#define TYPE long
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {[@]pcrel} 4 } } */
+/* { dg-final { scan-assembler-times {\mpld\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c
==

[PATCH] V12 patch #12 of 14, Add test for -fstack-protector-strong with large stack sizes and -mcpu=future

2020-01-09 Thread Michael Meissner

This patch is the same as:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01503.html

This patch adds a new test to verify that -fstack-protector-strong generates the
correct code when a large stack is used and the compiler option -mcpu=future is
also used.  Can I check this into the trunk?

This is a bug that we discovered when we attempted to build glibc using the
-mcpu=future option.

2020-01-09  Michael Meissner  

* gcc.target/powerpc/prefix-stack-protect.c: New test to make sure
-fstack-protector-strong works with prefixed addressing.

Index: gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c (revision 280088)
+++ gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future -fstack-protector-strong" } */
+
+/* Test that we can handle large stack frames with -fstack-protector-strong and
+   prefixed addressing.  This was originally discovered when trying to build
+   glibc with -mcpu=future, and vfwprintf.c failed because it used
+   -fstack-protector-strong.  */
+
+extern long foo (char *);
+
+long
+bar (void)
+{
+  char buffer[0x20000];
+  return foo (buffer) + 1;
+}
+
+/* { dg-final { scan-assembler {\mpld\M}  } } */
+/* { dg-final { scan-assembler {\mpstd\M} } } */


[PATCH] V12 patch #13 of 14, Add tests for vec_extract with PC-relative addresses

2020-01-09 Thread Michael Meissner

This patch is the same as:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01504.html

This patch adds some tests to validate that the work in patches V12 #1-4
generates the correct code when vec_extract is used on a vector with a
PC-relative address and -mcpu=future is used.  Can I check this into the trunk?

2020-01-09  Michael Meissner  

* gcc.target/powerpc/vec-extract-pcrel-si.c: New test for
vec_extract from a PC-relative address.
* gcc.target/powerpc/vec-extract-pcrel-di.c: New test for
vec_extract from a PC-relative address.
* gcc.target/powerpc/vec-extract-pcrel-sf.c: New test for
vec_extract from a PC-relative address.
* gcc.target/powerpc/vec-extract-pcrel-df.c: New test for
vec_extract from a PC-relative address.

Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-df.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-df.c (revision 280090)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-df.c (working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we can support vec_extract on V2DF vectors with a PC-relative
+   address.  */
+
+#include <altivec.h>
+
+#ifndef TYPE
+#define TYPE double
+#endif
+
+static vector TYPE v;
+vector TYPE *p = &v;
+
+TYPE
+get0 (void)
+{
+  return vec_extract (v, 0);
+}
+
+TYPE
+get1 (void)
+{
+  return vec_extract (v, 1);
+}
+
+TYPE
+getn (unsigned long n)
+{
+  return vec_extract (v, n);
+}
+
+/* { dg-final { scan-assembler-times {[@]pcrel}  3 } } */
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpla\M}   1 } } */
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-di.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-di.c (revision 280090)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-di.c (working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we can support vec_extract on V2DI vectors with a PC-relative
+   address.  */
+
+#include <altivec.h>
+
+#ifndef TYPE
+#define TYPE unsigned long
+#endif
+
+static vector TYPE v;
+vector TYPE *p = &v;
+
+TYPE
+get0 (void)
+{
+  return vec_extract (v, 0);
+}
+
+TYPE
+get1 (void)
+{
+  return vec_extract (v, 1);
+}
+
+TYPE
+getn (unsigned long n)
+{
+  return vec_extract (v, n);
+}
+
+/* { dg-final { scan-assembler-times {[@]pcrel}  3 } } */
+/* { dg-final { scan-assembler-times {\mpld\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mpla\M}   1 } } */
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-sf.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-sf.c (revision 280090)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-sf.c (working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we can support vec_extract on V4SF vectors with a PC-relative
+   address.  */
+
+#include <altivec.h>
+
+#ifndef TYPE
+#define TYPE float
+#endif
+
+static vector TYPE v;
+vector TYPE *p = &v;
+
+TYPE
+get0 (void)
+{
+  return vec_extract (v, 0);
+}
+
+TYPE
+get1 (void)
+{
+  return vec_extract (v, 1);
+}
+
+TYPE
+getn (unsigned long n)
+{
+  return vec_extract (v, n);
+}
+
+/* { dg-final { scan-assembler-times {[@]pcrel}  3 } } */
+/* { dg-final { scan-assembler-times {\mplfs\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpla\M}   1 } } */
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-si.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-si.c (revision 280090)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-si.c (working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we can support vec_extract on V4SI vectors with a PC-relative
+   address.  */
+
+#include <altivec.h>
+
+#ifndef TYPE
+#define TYPE unsigned int
+#endif
+
+static vector TYPE v;
+vector TYPE *p = &v;
+
+TYPE
+get0 (void)
+{
+  return vec_extract (v, 0);
+}
+
+TYPE
+get1 (void)
+{
+  return vec_extract (v, 1);
+}
+
+TYPE
+getn (unsigned long n)
+{
+  return vec_extract (v, n);
+}
+
+/* { dg-final { scan-assembler-times {[@]pcrel}  3 } } */
+/* { dg-final { scan-assembler-times {\mplwz\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpla\M}   1 } } */
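In these tests get0 and get1 use a constant element number, so the element's address is the address of `v` plus a constant and one @pcrel load can reach it. For getn the element number is only known at run time, which is presumably why the tests expect the address of `v` to be materialized first with PLA and then indexed. A sketch of that run-time address computation, under two assumptions not stated in the patch: elements are laid out in memory order, and the index is reduced modulo the element count. `vec_elem_address` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of element N of a 16-byte vector stored at BASE whose elements
   are ELEM_SIZE bytes wide.  Assumptions: elements are laid out in memory
   order, and N is reduced modulo the number of elements.  */
static uintptr_t
vec_elem_address (uintptr_t base, size_t elem_size, unsigned long n)
{
  assert (elem_size == 4 || elem_size == 8);
  size_t nelems = 16 / elem_size;
  return base + (uintptr_t) (n & (nelems - 1)) * elem_size;
}
```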


[PATCH] V12 patch #14 of 14, Add tests for generating prefixed instructions when using vec_extract with large offsets with -mcpu=future

2020-01-09 Thread Michael Meissner

While this patch is similar in spirit to V11 #15, I lost that patch, and I
re-implemented the check.  Can I check this test into the trunk?

2020-01-09  Michael Meissner  

* gcc.target/powerpc/vec-extract-large-si.c: New test for
vec_extract from a vector unsigned int in memory with a large
offset.
* gcc.target/powerpc/vec-extract-large-di.c: New test for
vec_extract from a vector unsigned long long in memory with a large
offset.
* gcc.target/powerpc/vec-extract-large-sf.c: New test for
vec_extract from a vector float in memory with a large offset.
* gcc.target/powerpc/vec-extract-large-df.c: New test for
vec_extract from a vector double in memory with a large offset.
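With OFFSET 0x12345 and 16-byte vectors, p[OFFSET] starts 0x12345 * 16 = 0x123450 bytes past p. That is far outside a 16-bit displacement but well inside the 34-bit prefixed range, so a constant element can be loaded with a single prefixed load. A sketch of the displacement arithmetic for constant element numbers, assuming elements are laid out in memory order; `elem_displacement` and `needs_prefixed` are hypothetical helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VECTOR_BYTES 16

/* Byte displacement from P of element K of p[INDEX], assuming elements
   are laid out in memory order.  */
static int64_t
elem_displacement (int64_t index, int64_t k, int64_t elem_size)
{
  assert (elem_size > 0 && VECTOR_BYTES % elem_size == 0);
  assert (k >= 0 && k < VECTOR_BYTES / elem_size);
  return index * VECTOR_BYTES + k * elem_size;
}

/* True if DISP is too large for a 16-bit D-form displacement.  */
static bool
needs_prefixed (int64_t disp)
{
  return disp < INT16_MIN || disp > INT16_MAX;
}
```

For getn the element number is variable, so only the base of p[OFFSET] is a constant; the single PADDI the tests scan for presumably forms that base before the index is applied.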

Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-df.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-large-df.c (revision 280092)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-df.c (working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we can support vec_extract on V2DF vectors with a large numeric
+   offset address.  */
+
+#include <altivec.h>
+
+#ifndef TYPE
+#define TYPE double
+#endif
+
+#ifndef OFFSET
+#define OFFSET 0x12345
+#endif
+
+TYPE
+get0 (vector TYPE *p)
+{
+  return vec_extract (p[OFFSET], 0);
+}
+
+TYPE
+get1 (vector TYPE *p)
+{
+  return vec_extract (p[OFFSET], 1);
+}
+
+TYPE
+getn (vector TYPE *p, unsigned long n)
+{
+  return vec_extract (p[OFFSET], n);
+}
+
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpaddi\M} 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-di.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-large-di.c (revision 280092)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-di.c (working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we can support vec_extract on V2DI vectors with a large numeric
+   offset address.  */
+
+#include <altivec.h>
+
+#ifndef TYPE
+#define TYPE unsigned long long
+#endif
+
+#ifndef OFFSET
+#define OFFSET 0x12345
+#endif
+
+TYPE
+get0 (vector TYPE *p)
+{
+  return vec_extract (p[OFFSET], 0);
+}
+
+TYPE
+get1 (vector TYPE *p)
+{
+  return vec_extract (p[OFFSET], 1);
+}
+
+TYPE
+getn (vector TYPE *p, unsigned long n)
+{
+  return vec_extract (p[OFFSET], n);
+}
+
+/* { dg-final { scan-assembler-times {\mpld\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mpaddi\M} 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-sf.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-large-sf.c (revision 280092)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-sf.c (working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we can support vec_extract on V4SF vectors with a large numeric
+   offset address.  */
+
+#include 
+
+#ifndef TYPE
+#define TYPE float
+#endif
+
+#ifndef OFFSET
+#define OFFSET 0x12345
+#endif
+
+TYPE
+get0 (vector TYPE *p)
+{
+  return vec_extract (p[OFFSET], 0);
+}
+
+TYPE
+get1 (vector TYPE *p)
+{
+  return vec_extract (p[OFFSET], 1);
+}
+
+TYPE
+getn (vector TYPE *p, unsigned long n)
+{
+  return vec_extract (p[OFFSET], n);
+}
+
+/* { dg-final { scan-assembler-times {\mplfs\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpaddi\M} 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-si.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-large-si.c (revision 280092)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-si.c (working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we can support vec_extract on V4SI vectors with a large numeric
+   offset address.  */
+
+#include 
+
+#ifndef TYPE
+#define TYPE unsigned int
+#endif
+
+#ifndef OFFSET
+#define OFFSET 0x12345
+#endif
+
+TYPE
+get0 (vector TYPE *p)
+{
+  return vec_extract (p[OFFSET], 0);
+}
+
+TYPE
+get1 (vector TYPE *p)
+{
+  return vec_extract (p[OFFSET], 1);
+}
+
+TYPE
+getn (vector TYPE *p, unsigned long n)
+{
+  return vec_extract (p[OFFSET], n);
+}
+
+/* { dg-final { scan-assembler-times {\mplwz\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpaddi\M} 1 } } */

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH] V12 patch #5 of 14, Make -mpcrel default for -mcpu=future on little endian Linux 64-bit systems

2020-02-03 Michael Meissner

On Fri, Jan 31, 2020 at 07:12:53PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jan 09, 2020 at 07:40:08PM -0500, Michael Meissner wrote:
> > * config/rs6000/linux64.h (PREFIXED_ADDR_SUPPORTED_BY_OS): Set to
> > 1 to enable prefixed addressing if -mcpu=future.
> > (PCREL_SUPPORTED_BY_OS): Set to 1 to enable PC-relative addressing
> > if -mcpu=future.
> > * config/rs6000/rs6000-cpus.h (ISA_FUTURE_MASKS_SERVER): Do not
> > enable -mprefixed-addr or -mpcrel by default.
> 
> I understand why this is needed for pcrel (or useful at least), but why
> for prefixed addressing in general as well?  What OS support is needed
> for that?
> 
> Put another way, is this just carefulness, or do you run into actual
> problems without it?

Just caution.  I can limit it to just PCREL.

> > +/* Enable support for pc-relative and numeric prefixed addressing on the
> > +   'future' system.  */
> > +#undef  PREFIXED_ADDR_SUPPORTED_BY_OS
> > +#define PREFIXED_ADDR_SUPPORTED_BY_OS  1
> > +
> > +#undef  PCREL_SUPPORTED_BY_OS
> > +#define PCREL_SUPPORTED_BY_OS  1
> 
> "Numeric prefixed addressing"?  What's that?  Just "and other prefixed
> addressing", maybe?

It means using a prefixed instruction with a large numeric offset, or with a
small offset when the traditional instruction is DS-form or DQ-form and the
bottom 2 or 4 bits (respectively) of the offset are non-zero.
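To make the distinction concrete, here is a small C sketch of when a numeric offset needs a prefixed instruction. It is a simplified, hypothetical model (the names are invented, not GCC's), assuming D-form takes a 16-bit signed offset, DS-form additionally requires the low 2 bits to be zero, DQ-form requires the low 4 bits to be zero, and prefixed loads and stores accept any 34-bit signed offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not GCC code: the format of the traditional
   (non-prefixed) load/store instruction.  */
enum trad_form { TRAD_D, TRAD_DS, TRAD_DQ };

/* True if VALUE fits in a BITS-wide signed field.  */
static bool
fits_signed (int64_t value, int bits)
{
  assert (bits > 0 && bits < 64);
  int64_t limit = (int64_t) 1 << (bits - 1);
  return value >= -limit && value < limit;
}

/* True if the traditional instruction can encode OFFSET.  */
static bool
non_prefixed_offset_ok (enum trad_form form, int64_t offset)
{
  if (!fits_signed (offset, 16))
    return false;                   /* Too large for a 16-bit field.  */
  if (form == TRAD_DS)
    return (offset & 3) == 0;       /* DS-form: low 2 bits must be 0.  */
  if (form == TRAD_DQ)
    return (offset & 15) == 0;      /* DQ-form: low 4 bits must be 0.  */
  return true;
}

/* True if OFFSET needs a prefixed instruction and fits in its 34-bit
   signed offset field.  */
static bool
needs_prefixed_offset (enum trad_form form, int64_t offset)
{
  return !non_prefixed_offset_ok (form, offset) && fits_signed (offset, 34);
}
```

For example, an offset of 6 is small but misaligned for a DS-form ld, so the model reports that a prefixed pld is needed, while an offset of 8 is fine for ld.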

> (Is it useful to have those two separate at all, btw?  Now, that is while
> we are still developing the code, but also in the future?)
> 
> > +/* Addressing related flags on a future processor.  These are options that need
> > +   to be cleared if the target OS is not capable of supporting prefixed
> > +   addressing at all (such as 32-bit mode or if the object file format is not
> > +   ELF v2).  */
> 
> Ah.  If we are missing the needed relocations (or other as/ld support).
> So it is not about OS really, missing toolchain support instead?

It also plays into the dynamic loader of the system.  If the dynamic loader
doesn't support the new relocations, you can't do PCREL.

> 
> > +  /* Only ELFv2 currently supports prefixed/pcrel addressing.  */
> > +  else if (rs6000_current_abi != ABI_ELFv2)
> > +   {
> > + if (TARGET_PCREL && explicit_pcrel)
> > +   error ("%qs requires %qs", "-mpcrel", "-mabi=elfv2");
> > +
> > + else if (TARGET_PREFIXED_ADDR && explicit_prefixed)
> > +   error ("%qs requires %qs", "-mprefixed-addr", "-mabi=elfv2");
> 
> It would be good if the error messages also said "currently" somehow (it
> is not an actual limitation, it's just a matter of code).  "Is only
> supported with -mabi=elfv2", perhaps?

Re: [PATCH] V12 patch #2 of 14, Refactor rs6000_adjust_vec_address & rs6000_split_vec_extract_var

2020-02-03 Michael Meissner

On Fri, Jan 31, 2020 at 11:30:22AM -0600, Segher Boessenkool wrote:
> But why is that the correct thing to do?  Garbage in, garbage out is
> perfectly fine?  Or do we have (e.g.) builtins that specify this masking?
> If so, please say that here.

It has been this way since I added these for power7 or power8, so I'm not
changing the semantics here.

Quoting from the LE ABI:

VEC_EXTRACT (ARG1, ARG2)

This function uses modular arithmetic on ARG2 to determine the element number.
For example, if ARG2 is out of range, the compiler uses ARG2 modulo the number
of elements in the vector to determine the element position.

So if we were to remove the ANDing, we would have to change the ABI.
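The ABI rule quoted above is why the AND has to stay. A minimal C sketch of the element selection follows; the helper name is hypothetical, and it assumes (as holds for the AltiVec/VSX vector types) that the element count is a power of two, so the modulo reduces to a mask.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not GCC code: the element VEC_EXTRACT selects for
   a variable element number N in a vector with NELEMS elements.  The ABI
   says N is taken modulo NELEMS; since NELEMS is a power of two, that is
   the same as masking with NELEMS - 1.  */
static size_t
vec_extract_element (unsigned long n, size_t nelems)
{
  assert (nelems != 0 && (nelems & (nelems - 1)) == 0);
  return (size_t) (n & (nelems - 1));   /* Equals n % nelems.  */
}
```

With a V4SI vector, for instance, element number 5 selects element 1.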

Re: [PATCH] V12 patch #3 of 14, Improve address validation in rs6000_adjust_vec_address

2020-02-03 Michael Meissner

On Fri, Jan 31, 2020 at 05:43:20PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jan 09, 2020 at 07:27:58PM -0500, Michael Meissner wrote:
> > * config/rs6000/rs6000.c (reg_to_non_prefixed): Add forward
> > reference.
> 
> FWIW, it is better to just reorder the code, in most cases.
> 
> > (hard_reg_and_mode_to_addr_mask): Delete, no longer used.
> 
> Just "Delete.".  Changelogs say what, not why; you have the commit
> message for that.
> 
> > + new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
> 
> So this depends on op0 not being r0 here.  Do we guarantee that somehow?
> It isn't obvious, so add an assert for this please?  (Or do I miss
> something obvious?  :-) )

That particular code is inside the CONST_INT_P (op1) test.  Therefore, op0
cannot be r0, but I can add an assertion.

> > +/* If the address isn't valid, move the address into the temporary base
> > +   register.  Some reasons it could not be valid include:
> > +   The address offset overflowed the 16 or 34 bit offset size;
> > +   We need to use a DS-FORM load, and the bottom 2 bits are non-zero;
> > +   We need to use a DQ-FORM load, and the bottom 2 bits are non-zero;
> > +   Only X_FORM loads can be done, and the address is D_FORM.  */
> 
> 4 bits for DQ-form?
> 
> Okay for trunk with those tweaks.  Thanks!
> 
> 
> Segher

[PATCH] Fix PR 93568 on PowerPC (vector extract failures)

2020-02-05 Michael Meissner

While updating one of my recent patches before submitting it, I made a thinko
that resulted in a lot of failures on big endian systems (and fewer on little
endian systems).

I have done bootstraps on both big endian and little endian systems.  Can I
check in this patch?

On a big endian power8 system, the following tests now pass:

gcc.target/powerpc/pr87532-mc.c
gcc.target/powerpc/pr89765-mc.c
gcc.target/powerpc/vec-extract-3.c
gcc.target/powerpc/vec-extract-5.c
gcc.target/powerpc/vec-extract-6.c
gcc.target/powerpc/vec-extract-7.c
gcc.target/powerpc/vec-extract-8.c
gcc.target/powerpc/vec-extract-9.c
gcc.target/powerpc/vec-extract-v16qi-df.c
gcc.target/powerpc/vec-extract-v16qi.c
gcc.target/powerpc/vec-extract-v16qiu-df.c
gcc.target/powerpc/vec-extract-v16qiu.c
gcc.target/powerpc/vec-extract-v2df.c
gcc.target/powerpc/vec-extract-v2di.c
gcc.target/powerpc/vec-extract-v4sf.c
gcc.target/powerpc/vec-extract-v4si-df.c
gcc.target/powerpc/vec-extract-v4si.c
gcc.target/powerpc/vec-extract-v4siu-df.c
gcc.target/powerpc/vec-extract-v4siu.c
gcc.target/powerpc/vec-extract-v8hi-df.c
gcc.target/powerpc/vec-extract-v8hi.c
gcc.target/powerpc/vec-extract-v8hiu-df.c
gcc.target/powerpc/vec-extract-v8hiu.c
gcc.target/powerpc/vsx-builtin-10b.c
gcc.target/powerpc/vsx-builtin-11b.c
gcc.target/powerpc/vsx-builtin-12b.c
gcc.target/powerpc/vsx-builtin-14b.c
gcc.target/powerpc/vsx-builtin-15b.c
gcc.target/powerpc/vsx-builtin-16b.c
gcc.target/powerpc/vsx-builtin-17b.c
gcc.target/powerpc/vsx-builtin-18b.c
gcc.target/powerpc/vsx-builtin-19b.c
gcc.target/powerpc/vsx-builtin-9b.c

On a little endian power8 system, the following tests now pass:

gcc.target/powerpc/pr87532-mc.c
gcc.target/powerpc/pr89765-mc.c
gcc.target/powerpc/vec-extract-v2di.c
gcc.target/powerpc/vsx-builtin-12b.c
gcc.target/powerpc/vsx-builtin-19b.c

2020-02-05  Michael Meissner  

PR target/93568
* config/rs6000/rs6000.c (get_vector_offset): Fix

--- /tmp/a8cqkr_rs6000.c	2020-02-05 14:55:36.255021903 -0600
+++ gcc/config/rs6000/rs6000.c  2020-02-05 13:27:00.393877012 -0600
@@ -6744,8 +6744,7 @@ get_vector_offset (rtx mem, rtx element,
 
   /* All insns should use the 'Q' constraint (address is a single register) if
  the element number is not a constant.  */
-  rtx addr = XEXP (mem, 0);
-  gcc_assert (satisfies_constraint_Q (addr));
+  gcc_assert (satisfies_constraint_Q (mem));
 
   /* Mask the element to make sure the element number is between 0 and the
  maximum number of elements - 1 so that we don't generate an address

[PATCH], PR target/93569, Fix PowerPC vsx-builtin-15d.c test case

2020-02-06 Michael Meissner

When I applied my previous patches for vec_extract, I switched to using
reg_to_non_prefixed to validate the vector extract address.  This uncovered a
bug: reg_to_non_prefixed allowed D-FORM (reg+offset) addresses to load
Altivec registers on power7 and power8, but those systems only supported
X-FORM (reg+reg) addressing for them.  Power9 added DS-FORM and DQ-FORM
addressing for the Altivec registers.  This patch fixes that, so the
vsx-builtin-15d.c test case now passes.

Can I check this into the master branch?

I have done bootstrap builds and make check on both a little endian Power8
system and a big endian Power8 system.  There were no regressions.  On the big
endian system, just vsx-builtin-15d.c now passes.  On the little endian system,
vsx-builtin-15d.c now passes along with some Fortran tests.

2020-02-05  Michael Meissner  

PR target/93569
* config/rs6000/rs6000.c (reg_to_non_prefixed): Before ISA 3.0
we only had X-FORM (reg+reg) addressing in the traditional Altivec
registers.

--- /tmp/eAu61F_rs6000.c	2020-02-05 18:08:48.698992017 -0500
+++ gcc/config/rs6000/rs6000.c  2020-02-05 17:23:55.733650185 -0500
@@ -24943,9 +24943,13 @@ reg_to_non_prefixed (rtx reg, machine_mo
 }
 
   /* Altivec registers use DS-mode for scalars, and DQ-mode for vectors, IEEE
- 128-bit floating point, and 128-bit integers.  */
+ 128-bit floating point, and 128-bit integers.  Before power9, only indexed
+ addressing was available.  */
   else if (ALTIVEC_REGNO_P (r))
 {
+  if (!TARGET_P9_VECTOR)
+   return NON_PREFIXED_X;
+
   if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode))
return NON_PREFIXED_DS;
 

Re: [PATCH], PR target/93569, Fix PowerPC vsx-builtin-15d.c test case

2020-02-06 Michael Meissner

On Thu, Feb 06, 2020 at 09:49:18AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Feb 06, 2020 at 08:29:41AM -0500, Michael Meissner wrote:
> > --- /tmp/eAu61F_rs6000.c	2020-02-05 18:08:48.698992017 -0500
> > +++ gcc/config/rs6000/rs6000.c  2020-02-05 17:23:55.733650185 -0500
> > @@ -24943,9 +24943,13 @@ reg_to_non_prefixed (rtx reg, machine_mo
> >  }
> >  
> >/* Altivec registers use DS-mode for scalars, and DQ-mode for vectors, IEEE
> > - 128-bit floating point, and 128-bit integers.  */
> > + 128-bit floating point, and 128-bit integers.  Before power9, only indexed
> > + addressing was available.  */
> >else if (ALTIVEC_REGNO_P (r))
> >  {
> > +  if (!TARGET_P9_VECTOR)
> > +   return NON_PREFIXED_X;
> > +
> >if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode))
> > return NON_PREFIXED_DS;
> 
> That looks fine, but is this complete?  What about the other VSRs?  Like
> right before this:
> 
>   if (FP_REGNO_P (r))
> {
>   if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode))
> return NON_PREFIXED_D;
> 
>   else if (size < 8)
> return NON_PREFIXED_X;
> 
>   else if (TARGET_VSX && size >= 16
>&& (VECTOR_MODE_P (mode)
>|| FLOAT128_VECTOR_P (mode)
>|| mode == TImode || mode == CTImode))
> return NON_PREFIXED_DQ;
> 
>   else
> return NON_PREFIXED_DEFAULT;
> }
> 
> If we are dealing with a SF or DF (or whatever else in a "legacy" FPR),
> that is fine, but what about vectors in those regs?  It says we can use
> DQ-mode here, but that is only true from p9 onward, no?

Good point.  I'll submit a revised patch once the bootstrap and make check
finishes.

[PATCH] PR target/93569 [version 2], Fix PowerPC vsx-builtin-15d.c test case

2020-02-06 Michael Meissner

This patch addresses the concern that Segher raised in the original submission
of the patch to fix PR target/93569.  In addition to checking for D*-form
addresses in the traditional Altivec registers, this patch also checks for
D*-form addresses for vectors in the traditional floating point registers.
Neither of these address forms was allowed before ISA 3.0 (power9).
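The decision the revised function makes can be summarized with a simplified C model. This is a hypothetical sketch, not the GCC source: the enum values stand in for NON_PREFIXED_D, NON_PREFIXED_DS, NON_PREFIXED_DQ, NON_PREFIXED_X and NON_PREFIXED_DEFAULT, the p9_vector flag stands in for TARGET_P9_VECTOR, VSX is assumed to be enabled, FLOAT128_2REG_P modes are ignored, and the mode tests are reduced to a size, an "is SFmode" flag and an "is a vector-like mode" flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the register classes and NON_PREFIXED_*.  */
enum model_reg_class { MODEL_FPR, MODEL_ALTIVEC };
enum model_form { FORM_D, FORM_DS, FORM_DQ, FORM_X, FORM_DEFAULT };

static enum model_form
model_reg_to_non_prefixed (enum model_reg_class rclass, unsigned size,
                           bool is_sfmode, bool is_vector, bool p9_vector)
{
  assert (size != 0);
  bool scalar = is_sfmode || size == 8;

  if (rclass == MODEL_FPR)
    {
      if (scalar)
        return FORM_D;          /* lfs/lfd/stfs/stfd are D-form.  */
      if (size < 8)
        return FORM_X;
      if (size >= 16 && is_vector)
        /* DQ-form (lxv/stxv) only exists from ISA 3.0 (power9).  */
        return p9_vector ? FORM_DQ : FORM_X;
      return FORM_DEFAULT;
    }

  /* Traditional Altivec registers: before power9 only reg+reg.  */
  if (!p9_vector)
    return FORM_X;
  if (scalar)
    return FORM_DS;             /* lxsd/lxssp are DS-form.  */
  if (size < 8)
    return FORM_X;
  if (size >= 16 && is_vector)
    return FORM_DQ;
  return FORM_DEFAULT;
}
```

In this model a 16-byte vector in an FPR gets X-form on power8 and DQ-form on power9, which is the case the revised patch adds.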

I have done bootstraps on both little and big endian Linux 64-bit systems, and
there were no regressions for this change.  Can I check this patch into the
master branch?

https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00387.html

2020-02-06  Michael Meissner  

PR target/93569
* config/rs6000/rs6000.c (reg_to_non_prefixed): Before ISA 3.0
we only had X-FORM (reg+reg) addressing for vectors.  Also before
ISA 3.0, we only had X-FORM addressing for scalars in the
traditional Altivec registers.

--- /tmp/VQDg8p_rs6000.c	2020-02-06 11:55:27.509363545 -0500
+++ gcc/config/rs6000/rs6000.c  2020-02-06 11:54:28.461531334 -0500
@@ -24923,7 +24923,8 @@ reg_to_non_prefixed (rtx reg, machine_mo
   unsigned size = GET_MODE_SIZE (mode);
 
   /* FPR registers use D-mode for scalars, and DQ-mode for vectors, IEEE
- 128-bit floating point, and 128-bit integers.  */
+ 128-bit floating point, and 128-bit integers.  Before power9, only indexed
+ addressing was available for vectors.  */
   if (FP_REGNO_P (r))
 {
   if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode))
@@ -24936,16 +24937,20 @@ reg_to_non_prefixed (rtx reg, machine_mo
   && (VECTOR_MODE_P (mode)
   || FLOAT128_VECTOR_P (mode)
   || mode == TImode || mode == CTImode))
-   return NON_PREFIXED_DQ;
+   return (TARGET_P9_VECTOR) ? NON_PREFIXED_DQ : NON_PREFIXED_X;
 
   else
return NON_PREFIXED_DEFAULT;
 }
 
   /* Altivec registers use DS-mode for scalars, and DQ-mode for vectors, IEEE
- 128-bit floating point, and 128-bit integers.  */
+ 128-bit floating point, and 128-bit integers.  Before power9, only indexed
+ addressing was available.  */
   else if (ALTIVEC_REGNO_P (r))
 {
+  if (!TARGET_P9_VECTOR)
+   return NON_PREFIXED_X;
+
   if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode))
return NON_PREFIXED_DS;
 

[PATCH], Rename and document PowerPC -mprefixed-addr to -mprefixed

2020-02-10 Michael Meissner

This patch renames the PowerPC internal switch -mprefixed-addr to be
-mprefixed.

Last week, Bill, Segher, and I were talking, and we came to the conclusion that
we needed to make the prefixed addressing option more public.  This is
particularly true when you consider that only 64-bit little endian Linux will
have support for these modes.  Other OSes, ABIs, etc. may or may not support
all of the new addressing modes in the 'future' computer.  We also decided that
we preferred the simpler '-mprefixed' option over '-mprefixed-addr'.

If you use -mno-prefixed, you get the current power9 addressing modes, and the
compiler will not generate prefixed loads or stores.

If you use -mprefixed -mno-pcrel, the compiler will generate prefixed loads and
stores utilizing 34-bit offset addressing with numeric offsets that don't need
relocation.  It will not generate PC-relative loads and stores.

If you use -mpcrel, you must be using the 64-bit ELF v2 ABI, and the code model
must be medium.  If you use -mpcrel, the compiler will generate PC-relative
loads and stores to access items, rather than the current TOC based loads and
stores.

If you use -mpcrel, it implies -mprefixed.  If you use -mno-prefixed, you
cannot use -mpcrel.
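The interaction between the two switches described above can be modelled in a few lines of C. This is a hypothetical sketch of the stated rules only: the struct and function are invented, and the real checks live in rs6000_option_override_internal, which also issues diagnostics. It assumes that a default (non-explicit) -mpcrel is quietly dropped when its requirements are not met, while an explicit one is an error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option state; the *_explicit fields record whether the
   user gave the switch on the command line.  */
struct model_opts
{
  bool prefixed;
  bool prefixed_explicit;
  bool pcrel;
  bool pcrel_explicit;
  bool elfv2_abi;
  bool medium_cmodel;
};

/* Apply the rules; return false if the user asked for a combination
   that must be rejected.  */
static bool
resolve_prefixed_opts (struct model_opts *o)
{
  assert (o);

  if (o->prefixed_explicit && !o->prefixed)
    {
      /* -mno-prefixed implies -mno-pcrel; an explicit -mpcrel conflicts.  */
      if (o->pcrel && o->pcrel_explicit)
        return false;
      o->pcrel = false;
    }
  else if (o->pcrel)
    o->prefixed = true;         /* -mpcrel implies -mprefixed.  */

  /* -mpcrel currently requires the 64-bit ELF v2 ABI and the medium
     code model.  */
  if (o->pcrel && !(o->elfv2_abi && o->medium_cmodel))
    {
      if (o->pcrel_explicit)
        return false;
      o->pcrel = false;
    }
  return true;
}
```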

With the exception of making the switch a public switch, and documenting it,
this patch is just a simple mechanical conversion, converting
TARGET_PREFIXED_ADDR to TARGET_PREFIXED, etc.

Because -mprefixed-addr was just an internal and undocumented switch, I have not
provided an alias from -mprefixed-addr to -mprefixed (though I can do that if
desired).

I have tested these patches on both little endian and big endian Linux 64-bit
systems, and there were no regressions.  Can I check these patches into the
master GCC branch for GCC 10?

2020-02-10  Michael Meissner  

* config/rs6000/predicates.md (cint34_operand): Rename the
-mprefixed-addr option to be -mprefixed.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Rename
the -mprefixed-addr option to be -mprefixed.
(OTHER_FUTURE_MASKS): Likewise.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Rename
the -mprefixed-addr option to be -mprefixed.  Change error
messages to refer to -mprefixed.
(num_insns_constant_gpr): Rename the -mprefixed-addr option to be
-mprefixed.
(rs6000_legitimate_offset_address_p): Likewise.
(rs6000_mode_dependent_address): Likewise.
(rs6000_opt_masks): Change the spelling of "-mprefixed-addr" to be
"-mprefixed" for target attributes and pragmas.
(address_to_insn_form): Rename the -mprefixed-addr option to be
-mprefixed.
(rs6000_adjust_insn_length): Likewise.
* config/rs6000/rs6000.h (FINAL_PRESCAN_INSN): Rename the
-mprefixed-addr option to be -mprefixed.
(ASM_OUTPUT_OPCODE): Likewise.
* config/rs6000/rs6000.md (prefixed insn attribute): Rename the
-mprefixed-addr option to be -mprefixed.
* config/rs6000/rs6000.opt (-mprefixed): Rename the
-mprefixed-addr option to be -mprefixed.  Change the option from
being undocumented to being documented.
* doc/invoke.texi (RS/6000 and PowerPC Options): Document the
-mprefixed option.  Update the -mpcrel documentation to mention
-mprefixed.

--- /tmp/N41Ptv_predicates.md   2020-02-07 17:56:52.590487419 -0500
+++ gcc/config/rs6000/predicates.md 2020-02-07 17:34:02.891610645 -0500
@@ -306,7 +306,7 @@ (define_predicate "const_0_to_15_operand
 (define_predicate "cint34_operand"
   (match_code "const_int")
 {
-  if (!TARGET_PREFIXED_ADDR)
+  if (!TARGET_PREFIXED)
 return 0;
 
   return SIGNED_INTEGER_34BIT_P (INTVAL (op));
--- /tmp/aS8nV8_rs6000-cpus.def 2020-02-07 17:56:52.599487550 -0500
+++ gcc/config/rs6000/rs6000-cpus.def   2020-02-07 17:34:02.894610688 -0500
@@ -79,11 +79,11 @@
is fully functional.  */
 #define ISA_FUTURE_MASKS_SERVER	(ISA_3_0_MASKS_SERVER		\
 | OPTION_MASK_FUTURE   \
-| OPTION_MASK_PREFIXED_ADDR)
+| OPTION_MASK_PREFIXED)
 
 /* Flags that need to be turned off if -mno-future.  */
 #define OTHER_FUTURE_MASKS (OPTION_MASK_PCREL  \
-| OPTION_MASK_PREFIXED_ADDR)
+| OPTION_MASK_PREFIXED)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS  (OPTION_MASK_FLOAT128_HW\
@@ -143,7 +143,7 @@
 | OPTION_MASK_POWERPC64\
 | OPTION_MASK_PPC_GFXOPT   \
 | OPTION_MASK_PP

Re: [PATCH], Rename and document PowerPC -mprefixed-addr to -mprefixed

2020-02-11 Michael Meissner

On Mon, Feb 10, 2020 at 09:24:07PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Feb 10, 2020 at 01:45:42PM -0500, Michael Meissner wrote:
> > This patch renames the PowerPC internal switch -mprefixed-addr to be
> > -mprefixed.
> 
> > If you use -mpcrel, you must be using the 64-bit ELF v2 ABI, and the code model
> > must be medium.
> 
> Currently, anyway.
> 
> > If you use -mpcrel, the compiler will generate PC-relative
> > loads and stores to access items, rather than the current TOC based loads and
> > stores.
> 
> Where that is the best thing to do.  Is that always now?  :-)
> 
> > If you use -mpcrel, it implies -mprefixed.  If you use -mno-prefixed, you
> > cannot use -mpcrel.
> 
> -mno-prefixed should imply -mno-pcrel; does it?

Yes.  -mno-prefixed-addr also implied -mno-pcrel.

> > * doc/invoke.texi (RS/6000 and PowerPC Options): Docment the
> 
> (typo)

Thanks.

> > --- /tmp/1ySv8k_invoke.texi 2020-02-07 17:56:52.700489015 -0500
> > +++ gcc/doc/invoke.texi 2020-02-07 17:34:02.925611138 -0500
> > @@ -22327,7 +22328,6 @@ faster on processors with 32-bit busses
> >  aligns structures containing the above types differently than
> >  most published application binary interface specifications for the m68k.
> >  
> > -@item -mpcrel
> >  @opindex mpcrel
> >  Use the pc-relative addressing mode of the 68000 directly, instead of
> >  using a global offset table.  At present, this option implies @option{-fpic},
> 
> This isn't a correct change.

Yeah, evidently I put the PowerPC stuff in the m68k -mpcrel area.  I'll fix it.

> Okay for trunk modulo the m68k change.  Thanks!
> 
> 
> Segher

Re: [committed] testsuite: Fix up gcc.target/powerpc/pr93122.c test

2020-02-12 Michael Meissner

On Wed, Feb 12, 2020 at 11:27:01PM +0100, Jakub Jelinek wrote:
> On Mon, Feb 10, 2020 at 01:45:42PM -0500, Michael Meissner wrote:
> > This patch renames the PowerPC internal switch -mprefixed-addr to be
> > -mprefixed.
> 
> --- gcc/config/rs6000/rs6000.opt
> +++ gcc/config/rs6000/rs6000.opt
> @@ -570,8 +570,8 @@ mfuture
>  Target Report Mask(FUTURE) Var(rs6000_isa_flags)
>  Use instructions for a future architecture.
>  
> -mprefixed-addr
> -Target Undocumented Mask(PREFIXED_ADDR) Var(rs6000_isa_flags)
> +mprefixed
> +Target Report Mask(PREFIXED) Var(rs6000_isa_flags)
>  Generate (do not generate) prefixed memory instructions.
>  
>  mpcrel
> 
> This change broke the gcc.target/powerpc/pr93122.c test, so it now
> FAIL: gcc.target/powerpc/pr93122.c (test for excess errors)
> Excess errors:
> xgcc: error: unrecognized command-line option '-mprefixed-addr'; did you mean '-mprefixed'?
> 
> Fixed thusly, bootstrapped/regtested on powerpc64le-linux, committed to
> trunk as obvious.

Thanks.  I don't think that test was in the trunk when I did the bootstrap
for the -mprefixed-addr to -mprefixed option.  I was about to send a similar
patch.

PowerPC V10 Patches for -mcpu=future

2019-12-11 Michael Meissner

This set of patches is an attempt to address the issues raised in the previous
sets of patches:

The V7 patches were for important functionality
The V8 patches were for tests
The V9 patches were for the PCREL_OPT support

As I write this there are 12 patches.  There will be more patches later to
address the remaining test suite patches.  I need to look at the comments for
PCREL_OPT in detail to see what the strategy should be for those patches.

Patches V10 #1-3 are the remaining issues from V7 #1-3 to add PADDI and PLI
support for large constants.  In theory, now that the earlier reformatting has
been done and checked in, these should be simple.

Patches V10 #4-7 break up patch V7 #6 (vector extract) into 4 separate patches.

Patch V10 #8 is patch V7 #7 (turn on -mpcrel by default on 64-bit Linux targets
for -mcpu=future), changing the names of the enabling macros.

Patch V10 #9 is patch V7 #5 that was redone.  This patch adds new effective
target options for PowerPC.  I have changed this patch to look at the code
generated by the compiler to see if prefixed adddressing or PC-relative
addressing is used for -mcpu=future.  This patch needs patch V10 #8 installed
to enable the prefixed addressing and PC-relative tests.

In patch V10 #9, I did not modify the existing test
(check_effective_target_powerpc_future_ok).  As we discussed, this test should
really test whether a non-prefixed instruction is generated to allow for
targets that might support -mcpu=future but not enable prefixed addressing.
However, at present the only instructions being submitted are prefixed
instructions.  So this will have to wait until we get further down the road
with 'future' instructions.

Patch V10 #10 is a modification of patch V8 #1.  I renamed the files from
paddi-?.c to prefixed-*.c so that there isn't a false match due to the .ident
directive.

Patch V10 #11 is a slight reworking of patch V8 #2 (testing whether we generate
a prefixed instruction when the offset would be invalid for DS and DQ
instruction formats).

Patch V10 #12 is a slight reworking of patch V8 #3 (making sure we don't try to
generate the non-existent PLWZU and PSTWU pre-modify instructions).

There are 3 other patches from V8 that I will address at a later date.  Patch
V8 #4 contains the tests for using prefixed instructions for each of the types
when a large numeric offset is used.  Patch V8 #5 contains the tests for using
PC-relative load/store instructions for each of the types to reference static
values.  Patch V8 #6 is the test to make sure the -fstack-protector support
works when the stack frame is large and -mcpu=future is used.

[PATCH] V10 patch #1, Use PLI to load up large DImode constants if -mcpu=future

2019-12-11 Michael Meissner

This patch adds an alternative to use PLI to load up large DImode constants if
-mcpu=future is used.

It is a slight reworking of patch V7 #1 after reformatting the movdi_internal64
insn.  I have done bootstraps and make check on a power8 little endian system
and there were no regressions.  Can I check this patch in?
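As a rough illustration of the new num_insns_constant_gpr case, the following C sketch decides whether a 64-bit constant can be loaded in one instruction. It is a hypothetical, simplified model, not the GCC function, assuming li takes a 16-bit signed immediate, lis a 16-bit signed immediate shifted left by 16, and pli a 34-bit signed immediate when prefixed instructions are available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if VALUE fits in a BITS-wide signed field.  */
static bool
fits_signed (int64_t value, int bits)
{
  assert (bits > 0 && bits < 64);
  int64_t limit = (int64_t) 1 << (bits - 1);
  return value >= -limit && value < limit;
}

/* True if VALUE can be loaded into a GPR with a single instruction.  */
static bool
one_insn_constant_p (int64_t value, bool have_prefixed)
{
  if (fits_signed (value, 16))
    return true;                                /* li  */
  if ((value & 0xffff) == 0 && fits_signed (value, 32))
    return true;                                /* lis */
  if (have_prefixed && fits_signed (value, 34))
    return true;                                /* pli */
  return false;
}
```

In this model 0x12345 needs two instructions without prefixed support but only one with it.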

Patch V7 #1:
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01301.html

2019-12-09  Michael Meissner  

* config/rs6000/rs6000.c (num_insns_constant_gpr): Return 1 if the
constant can be loaded with PLI if -mcpu=future.
* config/rs6000/rs6000.md (movdi_internal64): Add alternative to
use PLI to load up 34-bit constants if -mcpu=future.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 279141)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5541,6 +5541,10 @@ num_insns_constant_gpr (HOST_WIDE_INT va
   && (value >> 31 == -1 || value >> 31 == 0))
 return 1;
 
+  /* PADDI can support up to 34 bit signed integers.  */
+  else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (value))
+return 1;
+
   else if (TARGET_POWERPC64)
 {
   HOST_WIDE_INT low  = ((value & 0x) ^ 0x8000) - 0x8000;
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 279141)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -8828,7 +8828,7 @@ (define_split
 })
 
 ;;GPR store   GPR loadGPR move
-;;GPR li  GPR lis GPR #
+;;GPR li  GPR lis GPR pli GPR #
 ;;FPR store   FPR loadFPR move
 ;;AVX store   AVX store   AVX loadAVX loadVSX move
 ;;P9 0P9 -1   AVX 0/-1VSX 0   VSX -1
@@ -8838,7 +8838,7 @@ (define_split
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
  "=YZ,r,  r,
-  r,  r,  r,
+  r,  r,  r,  r,
   m,  ^d, ^d,
   wY, Z,  $v, $v, ^wa,
   wa, wa, v,  wa, wa,
@@ -8847,7 +8847,7 @@ (define_insn "*movdi_internal64"
   ?r, ?wa")
(match_operand:DI 1 "input_operand"
  "r,  YZ, r,
-  I,  L,  nF,
+  I,  L,  eI, nF,
   ^d, m,  ^d,
   ^v, $v, wY, Z,  ^wa,
   Oj, wM, OjwM,   Oj, wM,
@@ -8863,6 +8863,7 @@ (define_insn "*movdi_internal64"
mr %0,%1
li %0,%1
lis %0,%v1
+   li %0,%1
#
stfd%U0%X0 %1,%0
lfd%U1%X1 %0,%1
@@ -8886,7 +8887,7 @@ (define_insn "*movdi_internal64"
mtvsrd %x0,%1"
   [(set_attr "type"
  "store,  load,   *,
-  *,  *,  *,
+  *,  *,  *,  *,
   fpstore,fpload, fpsimple,
   fpstore,fpstore,fpload, fpload, veclogical,
   vecsimple,  vecsimple,  vecsimple,  veclogical, veclogical,
@@ -8896,7 +8897,7 @@ (define_insn "*movdi_internal64"
(set_attr "size" "64")
(set_attr "length"
  "*,  *,  *,
-  *,  *,  20,
+  *,  *,  *,  20,
   *,  *,  *,
   *,  *,  *,  *,  *,
   *,  *,  *,  *,  *,
@@ -8905,7 +8906,7 @@ (define_insn "*movdi_internal64"
   *,  *")
(set_attr "isa"
  "*,  *,  *,
-  *,  *,  *,
+  *,      *,  fut,*,
   *,  *,  *,
   p9v,p7v,p9v,p7v,*,
   p9v,p9v,p7v,*,  *,

[PATCH] V10 patch #2, use PLI to load up large SImode constants if -mcpu=future

2019-12-11 Michael Meissner

This patch adds an alternative to use PLI to load up large SImode constants if
-mcpu=future is used.

It is a slight reworking of patch V7 #2 after reformatting the movsi_internal1
insn.  I have done bootstraps and make check on a power8 little endian system
and there were no regressions.  Can I check this patch in once patch V10 #1 is
checked in?

Patch V7 #2:
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01302.html

2019-12-09  Michael Meissner  

* config/rs6000/rs6000.md (movsi_internal1): Add alternative to
use PLI to load up 34-bit constants if -mcpu=future.

Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 279143)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -6892,7 +6892,7 @@ (define_split
 ;;MR  LA
 ;;LWZ LFIWZX  LXSIWZX
 ;;STW STFIWX  STXSIWX
-;;LI  LIS #
+;;LI  LIS PLI #
 ;;XXLOR   XXSPLTIB 0  XXSPLTIB -1 VSPLTISW
 ;;XXLXOR 0XXLORC -1   P9 const
 ;;MTVSRWZ MFVSRWZ
@@ -6903,7 +6903,7 @@ (define_insn "*movsi_internal1"
  "=r, r,
   r,  d,  v,
   m,  Z,  Z,
-  r,  r,  r,
+  r,  r,  r,  r,
   wa, wa, wa, v,
   wa, v,  v,
   wa, r,
@@ -6912,7 +6912,7 @@ (define_insn "*movsi_internal1"
  "r,  U,
   m,  Z,  Z,
   r,  d,  v,
-  I,  L,  n,
+  I,  L,  eI, n,
   wa, O,  wM, wB,
   O,  wM, wS,
   r,  wa,
@@ -6930,6 +6930,7 @@ (define_insn "*movsi_internal1"
stxsiwx %x1,%y0
li %0,%1
lis %0,%v1
+   li %0,%1
#
xxlor %x0,%x1,%x1
xxspltib %x0,0
@@ -6947,7 +6948,7 @@ (define_insn "*movsi_internal1"
  "*,  *,
   load,   fpload, fpload,
   store,  fpstore,fpstore,
-  *,  *,  *,
+  *,  *,  *,  *,
   veclogical, vecsimple,  vecsimple,  vecsimple,
   veclogical, veclogical, vecsimple,
   mffgpr, mftgpr,
@@ -6956,7 +6957,7 @@ (define_insn "*movsi_internal1"
  "*,  *,
   *,  *,  *,
   *,  *,  *,
-  *,  *,  8,
+  *,  *,  *,  8,
   *,  *,  *,  *,
   *,  *,  8,
   *,  *,
@@ -6965,7 +6966,7 @@ (define_insn "*movsi_internal1"
  "*,  *,
   *,  p8v,p8v,
   *,  p8v,p8v,
-  *,  *,  *,
+  *,  *,  fut,*,
   p8v,p9v,p9v,p8v,
   p9v,p8v,p9v,
   p8v,p8v,
@@ -7120,8 +7121,7 @@ (define_insn "*movsi_from_df"
 (define_split
   [(set (match_operand:SI 0 "gpc_reg_operand")
(match_operand:SI 1 "const_int_operand"))]
-  "(unsigned HOST_WIDE_INT) (INTVAL (operands[1]) + 0x8000) >= 0x1
-   && (INTVAL (operands[1]) & 0xffff) != 0"
+  "num_insns_constant (operands[1], SImode) > 1"
   [(set (match_dup 0)
(match_dup 2))
(set (match_dup 0)

[PATCH] V10 patch #3, Use PADDI to add large constants if -mcpu=future is used

2019-12-11 Thread Michael Meissner

This patch adds an alternative to use PADDI to add large SImode and DImode
constants if -mcpu=future is used.

It is a slight reworking of patch V7 #3.  I have done bootstraps and make check
on a power8 little endian system and there were no regressions.  Can I check
this patch in?
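
The extended add_operand predicate and the new fourth alternative of the add
insn amount to a three-way choice for a constant addend.  A minimal sketch,
assuming the same meanings for I, L and eI as above; the enum and function
names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical classification of a constant addend into the alternatives
   of the add insn after this patch.  */
enum add_choice { ADD_ADDI, ADD_ADDIS, ADD_PADDI, ADD_OTHER };

/* True if V fits in a signed BITS-bit immediate.  */
static int
fits_signed (int64_t v, unsigned bits)
{
  int64_t limit = (int64_t) 1 << (bits - 1);
  return v >= -limit && v < limit;
}

static enum add_choice
add_choice_for_constant (int64_t c, int have_prefixed)
{
  if (fits_signed (c, 16))                        /* I:  addi  */
    return ADD_ADDI;
  if ((c & 0xffff) == 0 && fits_signed (c, 32))   /* L:  addis */
    return ADD_ADDIS;
  if (have_prefixed && fits_signed (c, 34))       /* eI: paddi */
    return ADD_PADDI;
  return ADD_OTHER;  /* not a single add; must be split or loaded first */
}
```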

2019-12-09  Michael Meissner  

* config/rs6000/predicates.md (add_operand): Allow eI constants.
* config/rs6000/rs6000.md (add3): Add alternative to
generate PADDI for 34-bit constants if -mcpu=future.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 279141)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -839,7 +839,8 @@ (define_special_predicate "indexed_addre
 (define_predicate "add_operand"
   (if_then_else (match_code "const_int")
 (match_test "satisfies_constraint_I (op)
-|| satisfies_constraint_L (op)")
+|| satisfies_constraint_L (op)
+|| satisfies_constraint_eI (op)")
 (match_operand 0 "gpc_reg_operand")))
 
 ;; Return 1 if the operand is either a non-special register, or 0, or -1.
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 279144)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -1761,15 +1761,17 @@ (define_expand "add3"
 })
 
 (define_insn "*add3"
-  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r")
-   (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b")
- (match_operand:GPR 2 "add_operand" "r,I,L")))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r,r")
+   (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b,b")
+ (match_operand:GPR 2 "add_operand" "r,I,L,eI")))]
   ""
   "@
add %0,%1,%2
addi %0,%1,%2
-   addis %0,%1,%v2"
-  [(set_attr "type" "add")])
+   addis %0,%1,%v2
+   addi %0,%1,%2"
+  [(set_attr "type" "add")
+   (set_attr "isa" "*,*,*,fut")])
 
 (define_insn "*addsi3_high"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=b")

[PATCH] V10 patch #4, Add new prefixed/non-prefixed memory constraints

2019-12-11 Thread Michael Meissner

Add new constraints to match whether a memory is not prefixed (em constraint)
or prefixed (ep constraint).  This is one of 4 parts aimed at reworking the
vector extract code in patch V7 #6.

This patch just adds the new constraints, but these constraints will not be
used until the next patch.  Originally I had just one constraint (em) that
matched non-prefixed memory operands.  But in order to use it, I needed to make
sure the combiner did not combine vector extracts with a variable offset with a
PC-relative memory location.

I.e.:

#include 

static vector double vd;

double get (unsigned int n)
{
  return vec_extract (vd, n);
}

In addition, as I contemplate the bigger issue about the insn length attribute,
I suspect we may need to have an ep attribute as well as em.
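
The new non_prefixed_memory predicate (used by em) rejects exactly four
address forms.  A standalone mirror of that test, with a cut-down enum: the
four rejected enumerators are the names used in the patch, while the others
are illustrative placeholders (the real enum has more members).

```c
#include <stdbool.h>

/* Cut-down stand-in for the rs6000 insn_form enumeration.  */
enum insn_form
{
  INSN_FORM_BAD,
  INSN_FORM_BASE_REG,          /* illustrative */
  INSN_FORM_D,                 /* illustrative */
  INSN_FORM_X,                 /* illustrative */
  INSN_FORM_PREFIXED_NUMERIC,
  INSN_FORM_PCREL_LOCAL,
  INSN_FORM_PCREL_EXTERNAL
};

/* Mirrors the return expression of non_prefixed_memory.  */
static bool
non_prefixed_form_p (enum insn_form iform)
{
  return (iform != INSN_FORM_BAD
          && iform != INSN_FORM_PREFIXED_NUMERIC
          && iform != INSN_FORM_PCREL_LOCAL
          && iform != INSN_FORM_PCREL_EXTERNAL);
}
```

Note that INSN_FORM_BAD is rejected as well, so em is not simply "anything
that ep does not match".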

Patch V7 #6:
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01306.html

I have bootstrapped the compiler on a little endian power8 system and ran make
check and there were no regressions.  Can I check this patch in?

2019-12-10  Michael Meissner  

* config/rs6000/constraints.md (em constraint): New constraint for
non-prefixed memory operands.
(ep constraint): New constraint for prefixed memory operands.
* config/rs6000/predicates.md (non_prefixed_memory): New predicate
for non-prefixed memory operands.
* doc/md.texi (PowerPC constraints): Document em and ep constraints.

Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md (revision 279182)
+++ gcc/config/rs6000/constraints.md (working copy)
@@ -202,6 +202,16 @@ (define_constraint "H"
 
 ;; Memory constraints
 
+(define_memory_constraint "em"
+  "A memory operand that does not contain a prefixed address."
+  (and (match_code "mem")
+   (match_operand 0 "non_prefixed_memory")))
+
+(define_memory_constraint "ep"
+  "A memory operand that contains a prefixed address."
+  (and (match_code "mem")
+   (match_operand 0 "prefixed_memory")))
+
 (define_memory_constraint "es"
   "A ``stable'' memory operand; that is, one which does not include any
 automodification of the base register.  Unlike @samp{m}, this constraint
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 279151)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1846,3 +1846,17 @@ (define_predicate "prefixed_memory"
 {
   return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
 })
+
+;; Return true if the operand is a valid memory address that does not use a
+;; prefixed address.
+(define_predicate "non_prefixed_memory"
+  (match_code "mem")
+{
+  enum insn_form iform
+= address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
+
+  return (iform != INSN_FORM_BAD
+  && iform != INSN_FORM_PREFIXED_NUMERIC
+ && iform != INSN_FORM_PCREL_LOCAL
+ && iform != INSN_FORM_PCREL_EXTERNAL);
+})
Index: gcc/doc/md.texi
===
--- gcc/doc/md.texi (revision 279182)
+++ gcc/doc/md.texi (working copy)
@@ -3373,6 +3373,12 @@ asm ("st %1,%0" : "=m<>" (mem) : "r" (va
 
 is not.
 
+@item em
+A memory operand that does not contain a prefixed address.
+
+@item ep
+A memory operand that contains a prefixed address.
+
 @item es
 A ``stable'' memory operand; that is, one which does not include any
 automodification of the base register.  This used to be useful when

[PATCH] V10 patch #5, Fix codegen bug with vector extracts using a variable offset & PC-relative address

2019-12-11 Thread Michael Meissner

This patch fixes a bug with vector extracts using a PC-relative address and a
variable offset when using -mcpu=future.

Consider the code:

#include 

static vector double vd;
vector double *p = &vd;

double get (unsigned int n)
{
  return vec_extract (vd, n);
}

If you compile this code with -O2 -mcpu=future -mpcrel you get:

get:
pla 9,.LANCHOR0@pcrel
lfdx 1,9,9
blr

This is because there is only one base register temporary, and the current code
tries to first create the offset and then use the same temporary to hold the
address of the PC-relative value.

After combine the insn is:

(insn 14 9 15 2 (parallel [
(set (reg/i:DF 33 1)
(unspec:DF [
(mem/c:V2DF (symbol_ref:DI ("*.LANCHOR0") [flags 0x182]) [1 vd+0 S16 A128])
(reg:DI 123 [ n ])
] UNSPEC_VSX_EXTRACT))
(clobber (scratch:DI))
(clobber (scratch:V2DI))
]) "foo.c":9:1 1314 {vsx_extract_v2df_var}


Split2 changes this to:

(insn 20 8 21 2 (set (reg:DI 3 3 [orig:123 n ] [123])
(and:DI (reg:DI 3 3 [orig:123 n ] [123])
(const_int 1 [0x1]))) "foo.c":9:1 193 {anddi3_mask}
 (nil))
(insn 21 20 22 2 (set (reg:DI 9 9 [126])
(ashift:DI (reg:DI 3 3 [orig:123 n ] [123])
(const_int 3 [0x3]))) "foo.c":9:1 256 {ashldi3}
 (nil))
(insn 22 21 23 2 (set (reg:DI 9 9 [126])
(symbol_ref:DI ("*.LANCHOR0") [flags 0x182])) "foo.c":9:1 680 {*pcrel_local_addr}
 (nil))
(insn 23 22 15 2 (set (reg/i:DF 33 1)
(mem/c:DF (plus:DI (reg:DI 9 9 [126])
(reg:DI 9 9 [126])) [1  S8 A8])) "foo.c":9:1 512 {*movdf_hardfloat64}
 (nil))

I.e. GPR r9 is first set to the offset << 3, and then the offset is wiped out
when the address of the PC-relative structure is loaded into the same register.
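
The clobber can be reproduced with a toy register file.  This is a minimal
sketch: gpr[] stands in for the GPRs, ANCHOR is a hypothetical address for
.LANCHOR0, and each statement corresponds to one insn of the split2 output or
of the fixed assembly (the extsw of the fixed code is omitted).

```c
#include <stdint.h>

/* Hypothetical address standing in for .LANCHOR0.  */
#define ANCHOR UINT64_C(0x10000)

/* Toy register file: r3 holds n, r9/r10 are base temporaries.  */
static uint64_t gpr[32];

/* The split2 sequence: one temporary (r9) for both values.  */
static uint64_t
ea_one_temp (uint64_t n)
{
  gpr[3] = n;
  gpr[3] &= 1;               /* insn 20: element index & 1         */
  gpr[9] = gpr[3] << 3;      /* insn 21: byte offset = index * 8   */
  gpr[9] = ANCHOR;           /* insn 22: address clobbers offset   */
  return gpr[9] + gpr[9];    /* insn 23: lfdx 1,9,9                */
}

/* After the patch: a second temporary (r10) holds the address.  */
static uint64_t
ea_two_temps (uint64_t n)
{
  gpr[3] = n & 1;            /* rldicl 3,3,0,63          */
  gpr[10] = ANCHOR;          /* pla 10,.LANCHOR0@pcrel   */
  gpr[9] = gpr[3] << 3;      /* sldi 9,3,3               */
  return gpr[10] + gpr[9];   /* lfdx 1,10,9              */
}
```

With n == 1 the one-temporary sequence loads from twice the anchor address
instead of anchor + 8.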

This patch changes all of the variable extract insns and the function in
rs6000.c that processes them to have a second base register temporary only if
we have prefixed addresses.  The code generated then becomes:

get:
extsw 3,3
pla 10,.LANCHOR0@pcrel
rldicl 3,3,0,63
sldi 9,3,3
lfdx 1,10,9

I use the em and ep constraints to keep the alternatives separate.  Using em
prevents the register allocator from skipping the alternative with ep in it
just because that alternative needs an extra scratch register.

I have bootstrapped the compiler on a little endian power8 system and ran make
check without regressions.  Can I check this in once patch V10 #4 is checked in?

2019-12-10  Michael Meissner  

* config/rs6000/rs6000-protos.h (rs6000_split_vec_extract_var):
Update calling signature.
* config/rs6000/rs6000.c (rs6000_split_vec_extract_var): Add
additional tmp base register argument.  If the memory is prefixed,
put the address into the new tmp base register.
* config/rs6000/vsx.md (vsx_extract__var, VSX_D iterator):
Add new temporary for loading up the address of prefixed memory
operands.
(vsx_extract_v4sf_var): Add new temporary for loading up the
address of prefixed memory operands.
(vsx_extract__var, VSX_EXTRACT_I iterator): Add new
temporary for loading up the address of prefixed memory operands.
(vsx_extract__mode_var): Add new temporary for
loading up the address of prefixed memory operands.

Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 279182)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -59,7 +59,7 @@ extern void rs6000_expand_float128_conve
 extern void rs6000_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, int);
 extern void rs6000_expand_vector_extract (rtx, rtx, rtx);
-extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
+extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx, rtx);
 extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
 extern void altivec_expand_vec_perm_le (rtx op[4]);
 extern void rs6000_expand_extract_even (rtx, rtx, rtx);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 279182)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6861,7 +6861,7 @@ rs6000_adjust_vec_address (rtx scalar_re
 
 void
 rs6000_split_vec_extract_var (rtx dest, rtx src, rtx element, rtx tmp_gpr,
- rtx tmp_altivec)
+ rtx tmp_altivec, rtx tmp_prefixed)
 {
   machine_mode mode = GET_MODE (src);
   machine_mode scalar_mode = GET_MODE_INNER (GET_MODE (src));
@@ -6878,6 +6878,16 @@ rs6000_split_vec_ext

[PATCH] V10 patch #6, Use prefixed load/stores for vector extract with large offsets

2019-12-11 Thread Michael Meissner

This patch optimizes vector extracts where the vector is addressed with an
offset larger than 16 bits, by folding the add into the final address.

I.e.

#include 

double get (vector double *p, unsigned int h)
{
  return vec_extract (p[5], 1);
}
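
The constant-offset decision in rs6000_adjust_vec_address after this patch
can be sketched as a pure function.  This is a minimal sketch under the
assumption that only the combined offset, the scalar size and the
prefixed-address flag matter; the enum and function names are hypothetical.

```c
#include <stdint.h>

enum vec_addr_form { VEC_ADDR_D_FORM, VEC_ADDR_PREFIXED, VEC_ADDR_X_FORM };

/* Hypothetical mirror of the reg + constant case.  */
static enum vec_addr_form
classify_vec_offset (int64_t base_offset, int64_t element_offset,
                     int scalar_size, int have_prefixed)
{
  int64_t offset = base_offset + element_offset;

  /* 16-bit offset; as in the patch, 8-byte scalars also need a
     multiple of 4 (DS-form).  */
  if (offset >= -32768 && offset <= 32767
      && (scalar_size < 8 || (offset & 0x3) == 0))
    return VEC_ADDR_D_FORM;

  /* 34-bit offset if we have prefixed addresses.  */
  if (have_prefixed
      && offset >= -((int64_t) 1 << 33) && offset < ((int64_t) 1 << 33))
    return VEC_ADDR_PREFIXED;

  /* Offset overflowed: move it to base_tmp and use X-form addressing.  */
  return VEC_ADDR_X_FORM;
}
```

A misaligned but small offset for an 8-byte scalar also lands in the
prefixed case, since prefixed loads have no DS alignment requirement.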

I have bootstrapped this patch on a little endian power8 system and ran make
check with no regressions.  Can I check this patch in?

2019-12-10  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support
for the offset being 34-bits when -mcpu=future is used.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 279199)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6766,9 +6766,17 @@ rs6000_adjust_vec_address (rtx scalar_re
  HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset);
  rtx offset_rtx = GEN_INT (offset);
 
- if (IN_RANGE (offset, -32768, 32767)
+ /* 16-bit offset.  */
+ if (SIGNED_16BIT_OFFSET_P (offset)
  && (scalar_size < 8 || (offset & 0x3) == 0))
new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+ /* 34-bit offset if we have prefixed addresses.  */
+ else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (offset))
+   new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+ /* Offset overflowed, move offset to the temporary (which will likely
+be split), and do X-FORM addressing.  */
  else
{
  emit_move_insn (base_tmp, offset_rtx);
@@ -6799,6 +6807,12 @@ rs6000_adjust_vec_address (rtx scalar_re
  emit_insn (insn);
}
 
+ /* Make sure we don't overwrite the temporary if the element being
+extracted is variable, and we've put the offset into base_tmp
+previously.  */
+ else if (rtx_equal_p (base_tmp, element_offset))
+   emit_insn (gen_add2_insn (base_tmp, op1));
+
  else
{
  emit_move_insn (base_tmp, op1);

[PATCH] V10 patch #7, Improve vector_extract code of a PC-relative address with a constant offset for -mcpu=future

2019-12-11 Thread Michael Meissner

This patch improves the code of vector_extract when the vector is addressed
with a PC-relative address, and the element number is constant.

I.e.

#include 

static vector double vd[10];
vector double *p = &vd[0];

double get (void)
{
  return vec_extract (vd[4], 1);
}
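
For this example, vd[4] sits 64 bytes past the symbol.  Assuming element 1
of the V2DF lives 8 bytes into the vector, the new branch can fold both
constants into one offset of 72.  A minimal sketch of that (symbol +
constant) case; the enum and function names are hypothetical:

```c
#include <stdint.h>

enum pcrel_fold { PCREL_SYMBOL, PCREL_SYMBOL_PLUS_OFFSET, PCREL_VIA_BASE_TMP };

/* Hypothetical mirror of the (symbol + constant) branch for local
   PC-relative addresses.  *folded receives the combined offset.  */
static enum pcrel_fold
fold_pcrel_offset (int64_t symbol_offset, int64_t element_offset,
                   int64_t *folded)
{
  int64_t offset = symbol_offset + element_offset;

  *folded = offset;
  if (offset == 0)
    return PCREL_SYMBOL;                 /* just the symbol               */
  if (offset >= -((int64_t) 1 << 33) && offset < ((int64_t) 1 << 33))
    return PCREL_SYMBOL_PLUS_OFFSET;     /* (const (plus sym offset))     */
  return PCREL_VIA_BASE_TMP;             /* address into base_tmp, + elt  */
}
```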

I have bootstrapped this code on a little endian power8 system and ran make
check, and there were no regressions.  Can I check this into the trunk?

2019-12-10  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_reg_to_addr_mask): New helper
function.
(rs6000_adjust_vec_address): Add support for folding a constant
offset of a vector extract of a vector accessed with PC-relative
addressing into the offset of the load.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 279200)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6698,6 +6698,30 @@ rs6000_expand_vector_extract (rtx target
 }
 }
 
+/* Helper function to return an address mask based on a physical register.  */
+
+static addr_mask_type
+rs6000_reg_to_addr_mask (rtx reg, machine_mode mode)
+{
+  unsigned int r = reg_or_subregno (reg);
+  addr_mask_type addr_mask;
+
+  gcc_assert (HARD_REGISTER_NUM_P (r));
+  if (INT_REGNO_P (r))
+addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_GPR];
+
+  else if (FP_REGNO_P (r))
+addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_FPR];
+
+  else if (ALTIVEC_REGNO_P (r))
+addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_VMX];
+
+  else
+gcc_unreachable ();
+
+  return addr_mask;
+}
+
 /* Adjust a memory address (MEM) of a vector type to point to a scalar field
within the vector (ELEMENT) with a mode (SCALAR_MODE).  Use a base register
temporary (BASE_TMP) to fixup the address.  Return the new memory address
@@ -6823,8 +6847,57 @@ rs6000_adjust_vec_address (rtx scalar_re
}
 }
 
+  /* For references to local static variables, try to fold a constant offset
+ into the address.  */
+  else if (pcrel_local_address (addr, Pmode) && CONST_INT_P (element_offset))
+{
+  if (GET_CODE (addr) == CONST)
+   addr = XEXP (addr, 0);
+
+  if (GET_CODE (addr) == PLUS)
+   {
+ rtx op0 = XEXP (addr, 0);
+ rtx op1 = XEXP (addr, 1);
+ if (CONST_INT_P (op1))
+   {
+ HOST_WIDE_INT offset
+   = INTVAL (XEXP (addr, 1)) + INTVAL (element_offset);
+
+ if (offset == 0)
+   new_addr = op0;
+
+ else if (SIGNED_34BIT_OFFSET_P (offset))
+   {
+ rtx plus = gen_rtx_PLUS (Pmode, op0, GEN_INT (offset));
+ new_addr = gen_rtx_CONST (Pmode, plus);
+   }
+
+ else
+   {
+ emit_move_insn (base_tmp, addr);
+ new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
+   }
+   }
+ else
+   {
+ emit_move_insn (base_tmp, addr);
+ new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
+   }
+   }
+
+  else
+   {
+ rtx plus = gen_rtx_PLUS (Pmode, addr, element_offset);
+ new_addr = gen_rtx_CONST (Pmode, plus);
+   }
+}
+
   else
 {
+  /* Make sure we don't overwrite the temporary if the vector extract
+offset was variable.  */
+  gcc_assert (!rtx_equal_p (base_tmp, element_offset));
+
   emit_move_insn (base_tmp, addr);
   new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
 }
@@ -6834,21 +6907,8 @@ rs6000_adjust_vec_address (rtx scalar_re
   if (GET_CODE (new_addr) == PLUS)
 {
   rtx op1 = XEXP (new_addr, 1);
-  addr_mask_type addr_mask;
-  unsigned int scalar_regno = reg_or_subregno (scalar_reg);
-
-  gcc_assert (HARD_REGISTER_NUM_P (scalar_regno));
-  if (INT_REGNO_P (scalar_regno))
-   addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_GPR];
-
-  else if (FP_REGNO_P (scalar_regno))
-   addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_FPR];
-
-  else if (ALTIVEC_REGNO_P (scalar_regno))
-   addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_VMX];
-
-  else
-   gcc_unreachable ();
+  addr_mask_type addr_mask
+   = rs6000_reg_to_addr_mask (scalar_reg, scalar_mode);
 
   if (REG_P (op1) || SUBREG_P (op1))
valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0;
@@ -6856,9 +6916,21 @@ rs6000_adjust_vec_address (rtx scalar_re
valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0;
 }
 
+  /* An address that is a single register is always valid for either indexed or
+ offsettable loads.  */
   else if (REG_P (new_addr) || SUBREG_P (new_addr))
 valid_addr_p = true;
 
+  /* If we have a PC-relative address, check if offsettable loads are
+ allowed.  */
+  else if (pcrel_local

[PATCH] V10 patch #8, Enable -mpcrel and -mprefixed-addr for -mcpu=future on 64-bit little endian Linux systems

2019-12-11 Thread Michael Meissner

This patch enables -mpcrel and -mprefixed-addr when -mcpu=future is used on a
64-bit little endian Linux system, but it does not enable those options on
other systems.  It is a slight reworking of patch V7 #7, taking into account
the comments you made.

In particular, I changed the macros used by the target tm.h file to be:
PREFIXED_ADDR_SUPPORTED_BY_OS
PCREL_SUPPORTED_BY_OS

Patch V7 #7:
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01307.html

I have bootstrapped the compiler on a little endian power8 system, and ran make
check with no regressions.  I also tested the code with -mpcrel and
-mprefixed-addr not turned on for 64-bit little endian Linux, and inspected the
generated code to verify that the appropriate code was generated.

In terms of your comment:

| ... and I don't understand this code.  If you use -mpcrel but you do not
| have the medium model, you _do_ get prefixed but you do _not_ get pcrel?
| And this all quietly?

You do not get this quietly.  You will get an error if you use -mpcrel and
-mcmodel=large options together.
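
The default-enabling logic described in the ChangeLog below can be sketched
as a pure function of the target configuration.  This is a minimal sketch:
the SKETCH_* masks and struct are hypothetical stand-ins for
OPTION_MASK_PREFIXED_ADDR, OPTION_MASK_PCREL and the real target state,
explicit -mpcrel/-mprefixed-addr handling and diagnostics are omitted, and
the rule that PC-relative needs prefixed addressing is an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; bit values are illustrative only.  */
#define SKETCH_MASK_PREFIXED_ADDR 0x1u
#define SKETCH_MASK_PCREL         0x2u
#define SKETCH_ADDRESSING_MASKS \
  (SKETCH_MASK_PREFIXED_ADDR | SKETCH_MASK_PCREL)

struct sketch_target
{
  bool cpu_future;     /* -mcpu=future                   */
  bool is_64bit;
  bool elf_v2;
  bool os_prefixed;    /* PREFIXED_ADDR_SUPPORTED_BY_OS  */
  bool os_pcrel;       /* PCREL_SUPPORTED_BY_OS          */
};

/* Default addressing flags after the sub-target option handling.  */
static unsigned
default_addressing_flags (const struct sketch_target *t)
{
  unsigned flags = 0;

  if (t->cpu_future && t->os_prefixed)
    flags |= SKETCH_MASK_PREFIXED_ADDR;
  if (t->cpu_future && t->os_pcrel)
    flags |= SKETCH_MASK_PCREL;

  /* No prefixed addressing on 32-bit or non-ELF v2 targets.  */
  if (!t->is_64bit || !t->elf_v2)
    flags &= ~SKETCH_ADDRESSING_MASKS;

  /* Assumption: PC-relative loads are prefixed instructions.  */
  if (!(flags & SKETCH_MASK_PREFIXED_ADDR))
    flags &= ~SKETCH_MASK_PCREL;

  return flags;
}
```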

2019-12-10  Michael Meissner  

* config/rs6000/linux64.h (PREFIXED_ADDR_SUPPORTED_BY_OS): Set to
1 to enable prefixed addressing if -mcpu=future.
(PCREL_SUPPORTED_BY_OS): Set to 1 to enable PC-relative addressing
if -mcpu=future.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Do not
enable -mprefixed-addr or -mpcrel by default.
(ADDRESSING_FUTURE_MASKS): New macro.
(OTHER_FUTURE_MASKS): Use ADDRESSING_FUTURE_MASKS.
* config/rs6000/rs6000.c (PREFIXED_ADDR_SUPPORTED_BY_OS): Disable
prefixed addressing unless the target OS tm.h says we should
enable it.
(PCREL_SUPPORTED_BY_OS): Disable PC-relative addressing unless the
target OS tm.h says we should enable it.
(rs6000_debug_reg_global): Print whether prefixed addressing and
PC-relative addressing are enabled by default if -mcpu=future.
(rs6000_option_override_internal): Move setting prefixed
addressing and PC-relative addressing after the sub-target option
handling is done.  Only enable prefixed addressing or PC-relative
addressing on -mcpu=future systems if the target OS says to enable
it.  Disallow prefixed addressing on 32-bit systems or if the
target object file is not ELF v2.

Index: gcc/config/rs6000/linux64.h
===
--- gcc/config/rs6000/linux64.h (revision 279141)
+++ gcc/config/rs6000/linux64.h (working copy)
@@ -640,3 +640,11 @@ extern int dot_symbols;
enabling the __float128 keyword.  */
 #undef TARGET_FLOAT128_ENABLE_TYPE
 #define TARGET_FLOAT128_ENABLE_TYPE 1
+
+/* Enable support for pc-relative and numeric prefixed addressing on the
+   'future' system.  */
+#undef  PREFIXED_ADDR_SUPPORTED_BY_OS
+#define PREFIXED_ADDR_SUPPORTED_BY_OS  1
+
+#undef  PCREL_SUPPORTED_BY_OS
+#define PCREL_SUPPORTED_BY_OS  1
Index: gcc/config/rs6000/rs6000-cpus.def
===
--- gcc/config/rs6000/rs6000-cpus.def   (revision 279141)
+++ gcc/config/rs6000/rs6000-cpus.def   (working copy)
@@ -75,15 +75,22 @@
 | OPTION_MASK_P8_VECTOR\
 | OPTION_MASK_P9_VECTOR)
 
-/* Support for a future processor's features.  Do not enable -mpcrel until it
-   is fully functional.  */
+/* Support for a future processor's features.  The prefixed and pc-relative
+   addressing bits are not added here.  Instead, they are added if the target
+   OS tm.h says that it supports the addressing modes by default when
+   -mcpu=future is used.  */
 #define ISA_FUTURE_MASKS_SERVER (ISA_3_0_MASKS_SERVER \
-| OPTION_MASK_FUTURE   \
+| OPTION_MASK_FUTURE)
+
+/* Addressing related flags on a future processor.  These are options that need
+   to be cleared if the target OS is not capable of supporting prefixed
+   addressing at all (such as 32-bit mode or if the object file format is not
+   ELF v2).  */
+#define ADDRESSING_FUTURE_MASKS (OPTION_MASK_PCREL \
 | OPTION_MASK_PREFIXED_ADDR)
 
 /* Flags that need to be turned off if -mno-future.  */
-#define OTHER_FUTURE_MASKS (OPTION_MASK_PCREL  \
-| OPTION_MASK_PREFIXED_ADDR)
+#define OTHER_FUTURE_MASKS ADDRESSING_FUTURE_MASKS
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS  (OPTION_MASK_FLOAT128_HW\
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 279202)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -98,6 +98,16 @@
 #endif
 #endif
 
+/

[PATCH] V10 patch #9, Add new effective targets for the testsuite

2019-12-11 Thread Michael Meissner

Patch V10 #9 is a reworking of patch V7 #5.  This patch adds new effective
target options for PowerPC.  I have changed this patch to look at the code
generated by the compiler to see if prefixed addressing or PC-relative
addressing is used for -mcpu=future.  This patch needs patch V10 #8 installed
to enable the prefixed addressing and PC-relative tests.

In patch V10 #9, I did not modify the existing test
(check_effective_target_powerpc_future_ok).  As we discussed, this test should
really test whether a non-prefixed instruction is generated to allow for
targets that might support -mcpu=future but not enable prefixed addressing.
However, at present the only instructions being submitted are prefixed
instructions.  So this will have to wait until we get further down the road
with 'future' instructions.

I have bootstrapped a little endian power8 compiler and ran make check with no
regressions.  In addition, with this patch installed, the new tests run as
expected.  Can I check this in?  (It needs patch V10 #8 to be installed to
enable the tests.)
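
The powerpc_prefixed_addr probe relies on p[0x12345] being out of reach of a
non-prefixed load.  The arithmetic, as a minimal sketch assuming an 8-byte
long (LP64, as on 64-bit Linux); the helper names are hypothetical:

```c
#include <stdint.h>

/* Byte offset of p[index] for elements of ELEM_SIZE bytes.  */
static int64_t
element_byte_offset (int64_t index, int64_t elem_size)
{
  return index * elem_size;
}

/* Does OFFSET fit the signed 16-bit displacement of a non-prefixed load?  */
static int
fits_16bit (int64_t offset)
{
  return offset >= -32768 && offset <= 32767;
}

/* Does OFFSET fit the signed 34-bit displacement of a prefixed load?  */
static int
fits_34bit (int64_t offset)
{
  return offset >= -((int64_t) 1 << 33) && offset < ((int64_t) 1 << 33);
}
```

Since 0x12345 * 8 = 0x91A28 (596520) does not fit in 16 bits but does fit in
34 bits, a single pld is only possible when prefixed addressing is enabled.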

2019-12-11  Michael Meissner  

* lib/target-supports.exp (check_effective_target_powerpc_pcrel):
New target for PowerPC -mcpu=future support.
(check_effective_target_powerpc_prefixed_addr): New target for
PowerPC -mcpu=future support.

Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   (revision 279141)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -2161,6 +2161,23 @@ proc check_p9modulo_hw_available { } {
 }]
 }
 
+# Return 1 if the target generates PC-relative instructions automatically
+proc check_effective_target_powerpc_pcrel { } {
+return [check_no_messages_and_pattern powerpc_pcrel \
+   {\mpld\M.*[@]pcrel} assembly {
+   static long s;
+   long *p = &s;
+   long foo (void) { return s; }
+   } {-O2 -mcpu=future}]
+}
+
+# Return 1 if the target generates prefixed instructions automatically
+proc check_effective_target_powerpc_prefixed_addr { } {
+return [check_no_messages_and_pattern powerpc_prefixed_addr \
+   {\mpld\M} assembly {
+   long foo (long *p) { return p[0x12345]; }
+   } {-O2 -mcpu=future}]
+}
 
 # Return 1 if the target supports executing FUTURE instructions, 0 otherwise.
 # Cache the result.  It is assumed that if a simulator does not support the

[PATCH] V10 patch #10, Add PADDI/PLI tests for -mcpu=future

2019-12-11 Thread Michael Meissner

Patch V10 #10 is a modification of patch V8 #1.  I renamed the files from
paddi-?.c to prefix-*.c so that there isn't a false match due to the .ident
directive.

These tests pass when I do a make check.  Once patch V10 #9 is checked in, can
I commit this patch?

2019-12-11  Michael Meissner  

* gcc.target/powerpc/prefix-add.c: New test for -mcpu=future
generating PADDI for large constant adds.
* gcc.target/powerpc/prefix-di-constant.c: New test for
-mcpu=future generating PLI to load up large DImode constants.
* gcc.target/powerpc/prefix-si-constant.c: New test for
-mcpu=future generating PLI to load up large SImode constants.

Index: gcc/testsuite/gcc.target/powerpc/prefix-add.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-add.c   (revision 279252)
+++ gcc/testsuite/gcc.target/powerpc/prefix-add.c   (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PADDI is generated to add a large constant.  */
+unsigned long
+add (unsigned long a)
+{
+  return a + 0x12345678UL;
+}
+
+/* { dg-final { scan-assembler {\mpaddi\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c   (revision 279252)
+++ gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c   (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PLI (PADDI) is generated to load a large constant.  */
+unsigned long
+large (void)
+{
+  return 0x12345678UL;
+}
+
+/* { dg-final { scan-assembler {\mpli\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c   (revision 279252)
+++ gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c   (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PLI (PADDI) is generated to load a large constant for SImode.  */
+void
+large_si (unsigned int *p)
+{
+  *p = 0x12345U;
+}
+
+/* { dg-final { scan-assembler {\mpli\M} } } */

[PATCH] V10 patch #11, Add test for generating prefixed load/store when the offset is not valid for DS/DQ instructions

2019-12-11 Thread Michael Meissner

Patch V10 #11 is a slight reworking of patch V8 #2 (testing whether we generate
a prefixed instruction when the offset would be invalid for DS and DQ
instruction formats).
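
Why offset 1 forces a prefixed instruction only for some of the test cases:
D-form instructions (lbz, lhz, lha, lwz, lfs, lfd and their stores) accept
any signed 16-bit displacement, DS-form (ld, std, lwa) also need a multiple
of 4, and DQ-form (lxv, stxv) a multiple of 16.  A minimal sketch of that
check; the enum and function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

enum disp_form { DFORM, DSFORM, DQFORM };

/* Can a non-prefixed instruction of displacement form FORM encode OFFSET?  */
static bool
offset_ok (enum disp_form form, int64_t offset)
{
  if (offset < -32768 || offset > 32767)
    return false;

  switch (form)
    {
    case DFORM:  return true;                  /* any 16-bit offset   */
    case DSFORM: return (offset & 3) == 0;     /* multiple of 4       */
    case DQFORM: return (offset & 15) == 0;    /* multiple of 16      */
    }
  return false;
}
```

For offset 1 only the D-form cases remain non-prefixed, which matches the
plwa, pld, pstd, plxv and pstxv counts in the test.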

This test passes when I run make check.  Can I check this in when patch V10 #9
is checked in?

2019-12-11  Michael Meissner  

* gcc.target/powerpc/prefix-ds-dq.c: New test to verify that we
generate the prefixed load/store instructions for traditional
instructions with an offset that doesn't match DS/DQ
requirements.

Index: gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c (revision 279256)
+++ gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c (working copy)
@@ -0,0 +1,156 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests whether we generate a prefixed load/store operation for addresses that
+   don't meet DS/DQ offset constraints.  */
+
+unsigned long
+load_uc_offset1 (unsigned char *p)
+{
+  return p[1]; /* should generate LBZ.  */
+}
+
+long
+load_sc_offset1 (signed char *p)
+{
+  return p[1]; /* should generate LBZ + EXTSB.  */
+}
+
+unsigned long
+load_us_offset1 (unsigned char *p)
+{
+  return *(unsigned short *)(p + 1);   /* should generate LHZ.  */
+}
+
+long
+load_ss_offset1 (unsigned char *p)
+{
+  return *(short *)(p + 1);/* should generate LHA.  */
+}
+
+unsigned long
+load_ui_offset1 (unsigned char *p)
+{
+  return *(unsigned int *)(p + 1); /* should generate LWZ.  */
+}
+
+long
+load_si_offset1 (unsigned char *p)
+{
+  return *(int *)(p + 1);  /* should generate PLWA.  */
+}
+
+unsigned long
+load_ul_offset1 (unsigned char *p)
+{
+  return *(unsigned long *)(p + 1);/* should generate PLD.  */
+}
+
+long
+load_sl_offset1 (unsigned char *p)
+{
+  return *(long *)(p + 1); /* should generate PLD.  */
+}
+
+float
+load_float_offset1 (unsigned char *p)
+{
+  return *(float *)(p + 1);/* should generate LFS.  */
+}
+
+double
+load_double_offset1 (unsigned char *p)
+{
+  return *(double *)(p + 1);   /* should generate LFD.  */
+}
+
+__float128
+load_float128_offset1 (unsigned char *p)
+{
+  return *(__float128 *)(p + 1);   /* should generate PLXV.  */
+}
+
+void
+store_uc_offset1 (unsigned char uc, unsigned char *p)
+{
+  p[1] = uc;   /* should generate STB.  */
+}
+
+void
+store_sc_offset1 (signed char sc, signed char *p)
+{
+  p[1] = sc;   /* should generate STB.  */
+}
+
+void
+store_us_offset1 (unsigned short us, unsigned char *p)
+{
+  *(unsigned short *)(p + 1) = us; /* should generate STH.  */
+}
+
+void
+store_ss_offset1 (signed short ss, unsigned char *p)
+{
+  *(signed short *)(p + 1) = ss;   /* should generate STH.  */
+}
+
+void
+store_ui_offset1 (unsigned int ui, unsigned char *p)
+{
+  *(unsigned int *)(p + 1) = ui;   /* should generate STW.  */
+}
+
+void
+store_si_offset1 (signed int si, unsigned char *p)
+{
+  *(signed int *)(p + 1) = si; /* should generate STW.  */
+}
+
+void
+store_ul_offset1 (unsigned long ul, unsigned char *p)
+{
+  *(unsigned long *)(p + 1) = ul;  /* should generate PSTD.  */
+}
+
+void
+store_sl_offset1 (signed long sl, unsigned char *p)
+{
+  *(signed long *)(p + 1) = sl;/* should generate PSTD.  */
+}
+
+void
+store_float_offset1 (float f, unsigned char *p)
+{
+  *(float *)(p + 1) = f;   /* should generate STFS.  */
+}
+
+void
+store_double_offset1 (double d, unsigned char *p)
+{
+  *(double *)(p + 1) = d;  /* should generate STFD.  */
+}
+
+void
+store_float128_offset1 (__float128 f128, unsigned char *p)
+{
+  *(__float128 *)(p + 1) = f128;   /* should generate PSTXV.  */
+}
+
+/* { dg-final { scan-assembler-times {\mextsb\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlbz\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mlfd\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlfs\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlha\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlhz\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlwz\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mpld\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mplwa\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mplxv\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstxv\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstb\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mstfd\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mstfs\M}  1 } } */
+/* { dg-final { scan-assembler-times {\msth\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mstw\M}   2 } } */

[PATCH] V10 patch #12, Test to make sure we don't generate prefixed pre-modify load/stores for -mcpu=future

2019-12-11 Thread Michael Meissner

Patch V10 #12 is a slight reworking of patch V8 #3 (making sure we don't try to
generate the non-existent PLWZU and PSTWU pre-modify instructions).

This test passes when I run make check.  Can I check this in when patch V10 #9
is installed?

2019-12-11  Michael Meissner  

* gcc.target/powerpc/prefix-no-premodify.c: Make sure we do not
generate the non-existent PLWZU instruction if -mcpu=future.

Index: gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c  (revision 279259)
+++ gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c  (working copy)
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Make sure that we don't generate a prefixed form of the load and store with
+   update instructions (i.e. instead of generating LWZU we have to generate
+   PLWZ plus a PADDI).  */
+
+#ifndef SIZE
+#define SIZE 5
+#endif
+
+struct foo {
+  unsigned int field;
+  char pad[SIZE];
+};
+
+struct foo *inc_load (struct foo *p, unsigned int *q)
+{
+  *q = (++p)->field;   /* PLWZ, PADDI, STW.  */
+  return p;
+}
+
+struct foo *dec_load (struct foo *p, unsigned int *q)
+{
+  *q = (--p)->field;   /* PLWZ, PADDI, STW.  */
+  return p;
+}
+
+struct foo *inc_store (struct foo *p, unsigned int *q)
+{
+  (++p)->field = *q;   /* LWZ, PADDI, PSTW.  */
+  return p;
+}
+
+struct foo *dec_store (struct foo *p, unsigned int *q)
+{
+  (--p)->field = *q;   /* LWZ, PADDI, PSTW.  */
+  return p;
+}
+
+/* { dg-final { scan-assembler-times {\mlwz\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mstw\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mpaddi\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mplwz\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mpstw\M}   2 } } */
+/* { dg-final { scan-assembler-not   {\mplwzu\M}} } */
+/* { dg-final { scan-assembler-not   {\mpstwu\M}} } */
+/* { dg-final { scan-assembler-not   {\maddis\M}} } */
+/* { dg-final { scan-assembler-not   {\maddi\M} } } */

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH] V10 patch #4, Add new prefixed/non-prefixed memory constraints

2019-12-17 Thread Michael Meissner

On Tue, Dec 17, 2019 at 11:15:29AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Dec 11, 2019 at 07:29:05PM -0500, Michael Meissner wrote:
> > +(define_memory_constraint "em"
> > +  "A memory operand that does not contain a prefixed address."
> > +  (and (match_code "mem")
> > +   (match_operand 0 "non_prefixed_memory")))
> > +
> > +(define_memory_constraint "ep"
> > +  "A memory operand that does contains a prefixed address."
> > +  (and (match_code "mem")
> > +   (match_operand 0 "prefixed_memory")))
> 
> "does contain".  Or maybe just say "with a non-prefixed address" and
> "with a prefixed address"?

Ok.

> > +;; Return true if the operand is a valid memory address that does not use a
> > +;; prefixed address.
> > +(define_predicate "non_prefixed_memory"
> > +  (match_code "mem")
> > +{
> > +  enum insn_form iform
> > += address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> > +
> > +  return (iform != INSN_FORM_BAD
> > +  && iform != INSN_FORM_PREFIXED_NUMERIC
> > + && iform != INSN_FORM_PCREL_LOCAL
> > + && iform != INSN_FORM_PCREL_EXTERNAL);
> > +})
> 
> Why can this not use just !address_is_prefixed?  Why is an
> INSN_FORM_PCREL_EXTERNAL address neither prefixed nor non-prefixed?  What
> does "BAD" mean, really?  Should that ever happen, should that not ICE?

You can't just use !address_is_prefixed, because it would also return true for
things that are not valid memory addresses.

So we could just do:

{
  /* If the operand is not a valid memory operand even if it is not prefixed,
     do not return true.  */
  if (!memory_operand (op, mode))
    return false;

  return !address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
}

It is important that the predicate not return true if the operand is NOT a
valid memory address.  If you allow non-valid memory addresses, the register
allocator will create things like:

(mem:MODE (plus:DI (reg:DI x)
                   (plus:DI (reg:DI y)
                            (const_int z))))

Or some such -- I forget the exact sequence it created.  A later pass would
then choke on the bad insn.

INSN_FORM_BAD just means that the operand is not valid as a memory address.

> It is very confusing if any valid memory is neither "prefixed_memory" nor
> "non_prefixed_memory"!

The point was to make sure the memory is valid.  Once it is a valid memory
address, then just a simple !address_is_prefixed will work.

> > --- gcc/doc/md.texi (revision 279182)
> > +++ gcc/doc/md.texi (working copy)
> > @@ -3373,6 +3373,12 @@ asm ("st %1,%0" : "=m<>" (mem) : "r" (va
> >  
> >  is not.
> >  
> > +@item em
> > +A memory operand that does not contain a prefixed address.
> > +
> > +@item ep
> > +A memory operand that does contains a prefixed address.
> 
> Same comments as above.

Ok.


Re: [PATCH] V10 patch #4, Add new prefixed/non-prefixed memory constraints

2019-12-17 Thread Michael Meissner

On Tue, Dec 17, 2019 at 05:35:24PM -0600, Segher Boessenkool wrote:
> On Tue, Dec 17, 2019 at 05:29:44PM -0500, Michael Meissner wrote:
> > On Tue, Dec 17, 2019 at 11:15:29AM -0600, Segher Boessenkool wrote:
> > > > +;; Return true if the operand is a valid memory address that does not use a
> > > > +;; prefixed address.
> > > > +(define_predicate "non_prefixed_memory"
> > > > +  (match_code "mem")
> > > > +{
> > > > +  enum insn_form iform
> > > > += address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> > > > +
> > > > +  return (iform != INSN_FORM_BAD
> > > > +  && iform != INSN_FORM_PREFIXED_NUMERIC
> > > > + && iform != INSN_FORM_PCREL_LOCAL
> > > > + && iform != INSN_FORM_PCREL_EXTERNAL);
> > > > +})
> > > 
> > > Why can this not use just !address_is_prefixed?  Why is an
> > > INSN_FORM_PCREL_EXTERNAL address neither prefixed nor non-prefixed?  What
> > > does "BAD" mean, really?  Should that ever happen, should that not ICE?
> > 
> > You can't just invert !address_is_prefixed, because it would all things that
> > may not be valid memory addresses.
> 
> Yes, so test that *explicitly*, in the "prefixed_memory" predicate as
> well please.  Make the two predicates as much the same as possible.
> 
> And what is with the INSN_FORM_PCREL_EXTERNAL?

INSN_FORM_PCREL_EXTERNAL says that the operand is a reference to an external
symbol.  It cannot appear in actual memory insns in normal usage, but it
needs to be handled in several places:

1) pcrel_extern_addr needs to be able to load an external address into a GPR
register.

2) The prefixed insn attribute (and prefixed_paddi_p which it calls) needs to
recognize pcrel_extern_addr and note that it is prefixed.

3) The PCREL_OPT support will need to support it.  If you do the PCREL_OPT
support via combine and flow control passes, you will need to be able to handle
external references as addresses.

The function address_is_prefixed specifically does not return true for
external symbols, because you can't use them in a normal memory context.
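A small C model may make the distinction easier to see. This is a sketch, not GCC code: the enumerator names follow those quoted in this thread, the `model_*` functions are hypothetical, and treating an external PC-relative reference as not a legitimate memory address is an assumption drawn from the statement that it cannot appear in a normal memory insn. Doing the validity check first in both predicates means every legitimate address satisfies exactly one of them.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the rs6000 insn_form classification; the names
   follow the ones quoted in the review, but this enum is illustrative.  */
enum insn_form {
  INSN_FORM_BAD,              /* not a valid address at all */
  INSN_FORM_BASE_REG,
  INSN_FORM_D,
  INSN_FORM_DS,
  INSN_FORM_DQ,
  INSN_FORM_X,
  INSN_FORM_UPDATE,
  INSN_FORM_LO_SUM,
  INSN_FORM_PREFIXED_NUMERIC, /* reg + 34-bit offset */
  INSN_FORM_PCREL_LOCAL,      /* PC-relative local symbol */
  INSN_FORM_PCREL_EXTERNAL    /* PC-relative external symbol */
};

/* Like address_is_prefixed as described above: external PC-relative
   references are deliberately not reported as prefixed.  */
static bool
model_address_is_prefixed (enum insn_form iform)
{
  return iform == INSN_FORM_PREFIXED_NUMERIC || iform == INSN_FORM_PCREL_LOCAL;
}

/* Plays the role of the memory_operand () check.  Assumption of this
   model: BAD and external PC-relative forms are not legitimate memory
   addresses.  */
static bool
model_valid_memory (enum insn_form iform)
{
  return iform != INSN_FORM_BAD && iform != INSN_FORM_PCREL_EXTERNAL;
}

/* Both predicates are written the same way: validity first, then the
   prefixed test, so any valid address satisfies exactly one of them.  */
static bool
model_prefixed_memory (enum insn_form iform)
{
  return model_valid_memory (iform) && model_address_is_prefixed (iform);
}

static bool
model_non_prefixed_memory (enum insn_form iform)
{
  return model_valid_memory (iform) && !model_address_is_prefixed (iform);
}
```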

In the context of the patch (vector extract), it needs to decide whether the
address is prefixed or not, in order to decide whether it needs a second base
register temporary.


Re: [PATCH] V10 patch #5, Fix codegen bug with vector extracts using a variable offset & PC-relative address

2019-12-18 Thread Michael Meissner

On Tue, Dec 17, 2019 at 12:02:46PM -0600, Segher Boessenkool wrote:
> >  ;; Variable V2DI/V2DF extract
> >  (define_insn_and_split "vsx_extract__var"
> > -  [(set (match_operand: 0 "gpc_reg_operand" "=v,wa,r")
> > -   (unspec: [(match_operand:VSX_D 1 "input_operand" "v,m,m")
> > -(match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
> > -   UNSPEC_VSX_EXTRACT))
> > -   (clobber (match_scratch:DI 3 "=r,&b,&b"))
> > -   (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
> > +  [(set (match_operand: 0 "gpc_reg_operand" "=v,wa,r,wa,r")
> > +   (unspec:
> > +[(match_operand:VSX_D 1 "input_operand" "v,em,em,ep,ep")
> > + (match_operand:DI 2 "gpc_reg_operand" "r,r,r,r,r")]
> > +UNSPEC_VSX_EXTRACT))
> > +   (clobber (match_scratch:DI 3 "=r,&b,&b,&b,&b"))
> > +   (clobber (match_scratch:V2DI 4 "=&v,X,X,X,X"))
> > +   (clobber (match_scratch:DI 5 "=X,X,X,&b,&b"))]
> >"VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
> >"#"
> >"&& reload_completed"
> >[(const_int 0)]
> >  {
> >rs6000_split_vec_extract_var (operands[0], operands[1], operands[2],
> > -   operands[3], operands[4]);
> > +   operands[3], operands[4], operands[5]);
> 
> This writes to operands[2], which does not match its constraint.
> 
> Same in the other splitters.

Right.  Good catch.


[PATCH] PowerPC, Rename SIGNED_BIT_OFFSET_P to SIGNED_INTEGER_BIT_P

2019-12-18 Thread Michael Meissner

In the patch:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01201.html

Segher Boessenkool asked me to submit a patch to rename the macros used to see
if a number is a valid signed 16 or 34-bit value:

> Please follow up with a patch to not call random numbers "OFFSET".

This patch does this, renaming:

SIGNED_34BIT_OFFSET_P   -> SIGNED_INTEGER_34BIT_P
SIGNED_16BIT_OFFSET_P   -> SIGNED_INTEGER_16BIT_P

I did not change the secondary macros (SIGNED_34BIT_OFFSET_EXTRA_P and
SIGNED_16BIT_OFFSET_EXTRA_P), since those are exclusively used for offset
calculations.  But I can if you prefer it that way.

I also converted a use in num_insns_constant_gpr to use the macro (it had
been in previous patches, but I dropped it in the last patch just to get the
minimal change in).

I've bootstrapped compilers with these patches and there was no regression in
the test suite.  Can I check this into the trunk?

Some of the remaining patches in the V10 series will need to be modified as
well.  I will submit those patches (after I rework the vector extract stuff) in
a new series.

2019-12-17   Michael Meissner  

* config/rs6000/predicates.md (cint34_operand): Use
SIGNED_INTEGER_34BIT_P macro.
* config/rs6000/rs6000.c (num_insns_constant_gpr): Use the
SIGNED_INTEGER_16BIT_P and SIGNED_INTEGER_34BIT_P macros.
(address_to_insn_form): Use the SIGNED_INTEGER_16BIT_P and
SIGNED_INTEGER_34BIT_P macros.
* config/rs6000/rs6000.h (SIGNED_INTEGER_NBIT_P): New macro.
(SIGNED_INTEGER_16BIT_P): Rename SIGNED_16BIT_OFFSET_P to be
SIGNED_INTEGER_16BIT_P.
(SIGNED_INTEGER_34BIT_P): Rename SIGNED_34BIT_OFFSET_P to be
SIGNED_INTEGER_34BIT_P.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 279478)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -309,7 +309,7 @@ (define_predicate "cint34_operand"
   if (!TARGET_PREFIXED_ADDR)
 return 0;
 
-  return SIGNED_34BIT_OFFSET_P (INTVAL (op));
+  return SIGNED_INTEGER_34BIT_P (INTVAL (op));
 })
 
 ;; Return 1 if op is a register that is not special.
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 279478)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5557,7 +5557,7 @@ static int
 num_insns_constant_gpr (HOST_WIDE_INT value)
 {
   /* signed constant loadable with addi */
-  if (((unsigned HOST_WIDE_INT) value + 0x8000) < 0x10000)
+  if (SIGNED_INTEGER_16BIT_P (value))
 return 1;
 
   /* constant loadable with addis */
@@ -5566,7 +5566,7 @@ num_insns_constant_gpr (HOST_WIDE_INT va
 return 1;
 
   /* PADDI can support up to 34 bit signed integers.  */
-  else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (value))
+  else if (TARGET_PREFIXED_ADDR && SIGNED_INTEGER_34BIT_P (value))
 return 1;
 
   else if (TARGET_POWERPC64)
@@ -24770,7 +24770,7 @@ address_to_insn_form (rtx addr,
 return INSN_FORM_BAD;
 
   HOST_WIDE_INT offset = INTVAL (op1);
-  if (!SIGNED_34BIT_OFFSET_P (offset))
+  if (!SIGNED_INTEGER_34BIT_P (offset))
 return INSN_FORM_BAD;
 
   /* Check for local and external PC-relative addresses.  Labels are always
@@ -24789,7 +24789,7 @@ address_to_insn_form (rtx addr,
 return INSN_FORM_BAD;
 
   /* Large offsets must be prefixed.  */
-  if (!SIGNED_16BIT_OFFSET_P (offset))
+  if (!SIGNED_INTEGER_16BIT_P (offset))
 {
   if (TARGET_PREFIXED_ADDR)
return INSN_FORM_PREFIXED_NUMERIC;
Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h  (revision 279478)
+++ gcc/config/rs6000/rs6000.h  (working copy)
@@ -2529,18 +2529,16 @@ typedef struct GTY(()) machine_function
 #pragma GCC poison TARGET_FLOAT128 OPTION_MASK_FLOAT128 MASK_FLOAT128
 #endif
 
-/* Whether a given VALUE is a valid 16 or 34-bit signed offset.  */
-#define SIGNED_16BIT_OFFSET_P(VALUE)   \
+/* Whether a given VALUE is a valid 16 or 34-bit signed integer.  */
+#define SIGNED_INTEGER_NBIT_P(VALUE, N)   \
   IN_RANGE ((VALUE),   \
-   -(HOST_WIDE_INT_1 << 15),   \
-   (HOST_WIDE_INT_1 << 15) - 1)
+   -(HOST_WIDE_INT_1 << ((N)-1)),  \
+   (HOST_WIDE_INT_1 << ((N)-1)) - 1)
 
-#define SIGNED_34BIT_OFFSET_P(VALUE)   \
-  IN_RANGE ((VALUE),   \
-   -(HOST_WIDE_INT_1 << 33),   \
-   (HOST_WIDE_INT_1 << 33) - 1)
+#define SIGNED_INTEGER_16BIT_P(VALUE)  SIGNED_INTEGER_NBIT_P (VALUE,

PowerPC -mcpu=future patches, V11

2019-12-20 Thread Michael Meissner

This set of patches reworks the vector extract issues in the V10 patches.

If you recall, in V10, you pointed out that for vector extract, the existing
code overwrote an input argument, and that is fixed in these patches.

In V10, I added two new constraints (ep and em) to categorize whether a memory
is prefixed or not prefixed, and we had some discussion about how to write the
predicates.

However, yesterday I realized that for the case that needed the new constraints
(vector extract with a variable element number, where the vector is in memory,
and we are optimizing the load to just load up the element being extracted),
what we want is just the address of the vector in a base register.

This is because in order to access the element where the element number is
variable, we eventually will need to do an X-FORM load, with the vector address
in one register, and the byte offset in another.
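The address arithmetic can be sketched in plain C. This is an illustrative model under assumptions (a power-of-two element count, elements numbered in memory order, and hypothetical helper names); it shows the element number being masked into range, scaled to a byte offset, and added to the base address, which is what an X-FORM load does with its two registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mask ELEMENT into range and scale it to a byte offset.  Assumes a
   power-of-two element count, as for V2DI/V4SF/V16QI-style vectors.  */
static size_t
element_byte_offset (uint64_t element, unsigned num_elements,
                     unsigned elem_size)
{
  assert (num_elements != 0 && (num_elements & (num_elements - 1)) == 0);
  uint64_t masked = element & (num_elements - 1);  /* keep in bounds */
  return (size_t) (masked * elem_size);
}

/* Extract element ELEMENT (memory order) from a two-element vector of
   64-bit integers at BASE: base "register" plus index "register", then
   a single scalar load.  */
static int64_t
extract_v2di (const unsigned char *base, uint64_t element)
{
  int64_t result;
  size_t index = element_byte_offset (element, 2, sizeof (int64_t));
  memcpy (&result, base + index, sizeof result);
  return result;
}
```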

Instead of adding new alternatives and new scratch registers, I could just
simplify the code and use the 'Q' constraint that says use a single register as
the address.  The register allocator will do the necessary work to load up the
address during register allocation.

I did notice that the documentation for 'Q' was wrong, so one of the patches
updates the documentation.

In addition, after committing the first 3 patches from V10 that added PADDI and
PLI support for -mcpu=future, Segher asked me to do a patch to rename two of
the macros.  That patch is now checked in, and some of these patches include
changes due to the macro renaming.

After the vector extract patch rework, I included the remaining patch to the
compiler (make -mpcrel default on Linux 64-bit for -mcpu=future).  I included
the tests after doing the -mpcrel default changes.  In addition to the tests in
V10, I added some new tests for the vector extract code.


[PATCH] V11 patch #1 of 15, Fix bug in vec_extract

2019-12-20 Thread Michael Meissner

This patch fixes the bug pointed out in the V10 patch review that the code
modified an input argument to vector extract with a variable element number.

I also added two gcc_asserts to the vector extract address code to signal an
internal error if the temporary base register was used for two different
purposes.  This shows up if you have a vector whose address is a PC-relative
address and the element number was variable.
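To see why the assertion matters, here is a toy model of the two-step sequence (move the address into base_tmp, then add the offset register). The register file and helper names are hypothetical, and the reg_mentioned_p check is simplified to register-number equality; the point is only that when base_tmp and the offset register are the same, the offset is overwritten and the address is doubled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file; register numbers are hypothetical.  */
static uint64_t regs[8];

/* Models "move ADDR into BASE_TMP, then add OFFSET_REG to it", the shape of
   the emit_move_insn / gen_add2_insn pair in rs6000_adjust_vec_address.  */
static uint64_t
form_address (int base_tmp, uint64_t addr, int offset_reg)
{
  regs[base_tmp] = addr;                /* clobbers offset_reg if aliased */
  regs[base_tmp] += regs[offset_reg];   /* adds the byte offset */
  return regs[base_tmp];
}

/* Same sequence, guarded the way the new gcc_assert calls guard it.  */
static uint64_t
form_address_checked (int base_tmp, uint64_t addr, int offset_reg)
{
  assert (base_tmp != offset_reg);
  return form_address (base_tmp, addr, offset_reg);
}
```

With distinct registers the result is the address plus the offset; with the same register the offset is lost and the result is twice the address.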

Later patches will fix the case that I know of that generates the bad code, but
it is still important to make sure the same case doesn't happen in the future.

With this patch applied, the compiler will signal an error.  FWIW, I did build
all of Spec 2017 and Spec 2006 with this patch applied, but not the others, and
we did not get an assertion failure.

I have bootstrapped the compiler and there were no regression test failures on
a little endian Power8 system.

2019-12-20  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add
assertion to make sure that we don't load an address into a
temporary that is already used.
(rs6000_split_vec_extract_var): Do not overwrite the element when
masking it.  Use the base register temporary instead.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 279549)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6757,6 +6757,8 @@ rs6000_adjust_vec_address (rtx scalar_re
 
   else
{
+ /* If we are called from rs6000_split_vec_extract_var, base_tmp may
+be the same as element.  */
  if (TARGET_POWERPC64)
emit_insn (gen_ashldi3 (base_tmp, element, GEN_INT (byte_shift)));
  else
@@ -6825,6 +6827,11 @@ rs6000_adjust_vec_address (rtx scalar_re
 
  else
{
+ /* Make sure base_tmp is not the same as element_offset.  This
+can happen if the element number is variable and the address
+is not a simple address.  Otherwise we lose the offset, and
+double the address.  */
+ gcc_assert (!reg_mentioned_p (base_tmp, element_offset));
  emit_move_insn (base_tmp, op1);
  emit_insn (gen_add2_insn (base_tmp, element_offset));
}
@@ -6835,6 +6842,10 @@ rs6000_adjust_vec_address (rtx scalar_re
 
   else
 {
+  /* Make sure base_tmp is not the same as element_offset.  This can happen
+if the element number is variable and the address is not a simple
+address.  Otherwise we lose the offset, and double the address.  */
+  gcc_assert (!reg_mentioned_p (base_tmp, element_offset));
   emit_move_insn (base_tmp, addr);
   new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
 }
@@ -6902,9 +6913,10 @@ rs6000_split_vec_extract_var (rtx dest,
   int num_elements = GET_MODE_NUNITS (mode);
   rtx num_ele_m1 = GEN_INT (num_elements - 1);
 
-  emit_insn (gen_anddi3 (element, element, num_ele_m1));
+  /* Make sure the element number is in bounds.  */
   gcc_assert (REG_P (tmp_gpr));
-  emit_move_insn (dest, rs6000_adjust_vec_address (dest, src, element,
+  emit_insn (gen_anddi3 (tmp_gpr, element, num_ele_m1));
+  emit_move_insn (dest, rs6000_adjust_vec_address (dest, src, tmp_gpr,
   tmp_gpr, scalar_mode));
   return;
     }


[PATCH] V11 patch #2 of 15, Use prefixed load for vector extract with large offset

2019-12-20 Thread Michael Meissner

This patch uses large offsets for -mcpu=future when we optimize a vector
extract from memory and the memory address had previously been a prefixed
address with a large offset.

The current code would load the constant into a temporary and then do an
indexed load.  Successive passes would eventually optimize that back into the
form we want (the base register plus a large offset), but it is better to
generate the optimal code sooner.
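The decision the patch adds can be sketched as a small classifier. The enum and function names are hypothetical; the three cases follow the patch: a 16-bit offset (with the low two bits clear for 8-byte scalars), a 34-bit offset when prefixed addressing is enabled, and otherwise loading the offset into a temporary for X-FORM addressing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the three outcomes in the patch.  */
enum offset_kind { OFFSET_D_FORM, OFFSET_PREFIXED, OFFSET_X_FORM };

/* True if V fits in an N-bit signed field.  */
static bool
signed_nbit_p (int64_t v, int n)
{
  int64_t lim = (int64_t) 1 << (n - 1);
  return v >= -lim && v <= lim - 1;
}

static enum offset_kind
classify_offset (int64_t base_offset, int64_t element_offset,
                 unsigned scalar_size, bool prefixed_addr)
{
  assert (scalar_size > 0);
  int64_t offset = base_offset + element_offset;

  /* 16-bit offset; 8-byte scalars use DS-form, so the low 2 bits must be 0.  */
  if (signed_nbit_p (offset, 16) && (scalar_size < 8 || (offset & 0x3) == 0))
    return OFFSET_D_FORM;

  /* 34-bit offset if we have prefixed addresses.  */
  if (prefixed_addr && signed_nbit_p (offset, 34))
    return OFFSET_PREFIXED;

  /* Otherwise move the offset to a temporary and use reg + reg.  */
  return OFFSET_X_FORM;
}
```

Note that a small but misaligned offset for an 8-byte scalar can still use the prefixed form, since the prefixed loads do not carry the DS-form alignment restriction.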

I have bootstrapped this change on a little endian power8 system and there were
no regressions.  Can I check this into the trunk?

2019-12-20  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support
for the offset being 34-bits when -mcpu=future is used.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 279553)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6792,9 +6792,17 @@ rs6000_adjust_vec_address (rtx scalar_re
  HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset);
  rtx offset_rtx = GEN_INT (offset);
 
- if (IN_RANGE (offset, -32768, 32767)
+ /* 16-bit offset.  */
+ if (SIGNED_INTEGER_16BIT_P (offset)
  && (scalar_size < 8 || (offset & 0x3) == 0))
new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+ /* 34-bit offset if we have prefixed addresses.  */
+ else if (TARGET_PREFIXED_ADDR && SIGNED_INTEGER_34BIT_P (offset))
+   new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+ /* Offset overflowed, move offset to the temporary (which will likely
+be split), and do X-FORM addressing.  */
  else
{
  emit_move_insn (base_tmp, offset_rtx);
@@ -6825,6 +6833,12 @@ rs6000_adjust_vec_address (rtx scalar_re
  emit_insn (insn);
}
 
+ /* Make sure we don't overwrite the temporary if the element being
+extracted is variable, and we've put the offset into base_tmp
+previously.  */
+ else if (rtx_equal_p (base_tmp, element_offset))
+   emit_insn (gen_add2_insn (base_tmp, op1));
+
  else
{
  /* Make sure base_tmp is not the same as element_offset.  This


[PATCH] V11 patch #3 of 15, Use 'Q' constraint for variable vector extract from memory

2019-12-20 Thread Michael Meissner

As I mentioned in the intro, for the case where we are optimizing the extract
of a variable element from a vector in memory, the current code takes a regular
address, and the temporary that holds the byte offset, and tries to generate a
new address.  In particular, it failed when the vector had a PC-relative
address, because there were not enough temporary registers, and it reused the
temporary holding the byte offset to hold the address.

Initially in doing these patches, I reworked the constraints for prefixed and
non-prefixed memory so we could identify when we needed a second temporary.
Then I realized that eventually we will want to generate an X-FORM (register +
register) address, and it was just simpler to use the 'Q' constraint, and have
the register allocator put the address into a register.

I have verified that the bug is indeed fixed (patch #15 will include the new
tests for this).  I have also bootstrapped the compiler on a little endian
power8 machine and there were no regressions in the test suite.  Can I check
this patch into the trunk?

2019-12-20  Michael Meissner  

* config/rs6000/vsx.md (vsx_extract__var, VSX_D iterator):
Use 'Q' for memory constraints because we need to do an X-FORM
load with the variable index.
(vsx_extract_v4sf_var): Use 'Q' for memory constraints because we
need to do an X-FORM load with the variable index.
(vsx_extract__var, VSX_EXTRACT_I iterator): Use 'Q' for
memory constraints because we need to do an X-FORM load with the
variable index.
(vsx_extract__mode_var): Use 'Q' for memory
constraints because we need to do an X-FORM load with the variable
index.

Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 279597)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -3245,10 +3245,11 @@ (define_insn "vsx_vslo_"
   "vslo %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-;; Variable V2DI/V2DF extract
+;; Variable V2DI/V2DF extract.  Use 'Q' for the memory because we will
+;; ultimately have to convert the address into base + index.
 (define_insn_and_split "vsx_extract__var"
   [(set (match_operand: 0 "gpc_reg_operand" "=v,wa,r")
-   (unspec: [(match_operand:VSX_D 1 "input_operand" "v,m,m")
+   (unspec: [(match_operand:VSX_D 1 "input_operand" "v,Q,Q")
 (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
UNSPEC_VSX_EXTRACT))
(clobber (match_scratch:DI 3 "=r,&b,&b"))
@@ -3318,7 +3319,7 @@ (define_insn_and_split "*vsx_extract_v4s
 ;; Variable V4SF extract
 (define_insn_and_split "vsx_extract_v4sf_var"
   [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,wa,?r")
-   (unspec:SF [(match_operand:V4SF 1 "input_operand" "v,m,m")
+   (unspec:SF [(match_operand:V4SF 1 "input_operand" "v,Q,Q")
(match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
   UNSPEC_VSX_EXTRACT))
(clobber (match_scratch:DI 3 "=r,&b,&b"))
@@ -3681,7 +3682,7 @@ (define_insn_and_split "*vsx_extract__var"
   [(set (match_operand: 0 "gpc_reg_operand" "=r,r,r")
(unspec:
-[(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+[(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q")
  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
 UNSPEC_VSX_EXTRACT))
(clobber (match_scratch:DI 3 "=r,r,&b"))
@@ -3701,7 +3702,7 @@ (define_insn_and_split "*vsx_extract_ 0 "gpc_reg_operand" "=r,r,r")
(zero_extend:
 (unspec:
- [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+ [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q")
   (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
  UNSPEC_VSX_EXTRACT)))
(clobber (match_scratch:DI 3 "=r,r,&b"))


[PATCH] V11 patch #4 of 15, Update 'Q' constraint documentation.

2019-12-20 Thread Michael Meissner

In doing V11 patch #3, I noticed that the documentation for the 'Q' constraint was
misleading.  This patch updates the documentation.  Can I check this patch into
the trunk?

2019-12-20  Michael Meissner  

* config/rs6000/constraints.md (Q constraint): Update
documentation.
* doc/md.texi (PowerPC constraints): Update 'Q' constraint
documentation.

Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md(revision 279547)
+++ gcc/config/rs6000/constraints.md(working copy)
@@ -211,8 +211,7 @@ several times, or that might not access
(match_test "GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) != RTX_AUTOINC")))
 
 (define_memory_constraint "Q"
-  "Memory operand that is an offset from a register (it is usually better
-to use @samp{m} or @samp{es} in @code{asm} statements)"
+  "A memory operand whose address uses a single register with no offset."
   (and (match_code "mem")
(match_test "REG_P (XEXP (op, 0))")))
 
Index: gcc/doc/md.texi
===
--- gcc/doc/md.texi (revision 279547)
+++ gcc/doc/md.texi (working copy)
@@ -3381,8 +3381,7 @@ allowed when @samp{<} or @samp{>} is use
 as @samp{m} without @samp{<} and @samp{>}.
 
 @item Q
-Memory operand that is an offset from a register (it is usually better
-to use @samp{m} or @samp{es} in @code{asm} statements)
+A memory operand whose address uses a single register with no offset.
 
 @item Z
 Memory operand that is an indexed or indirect from a register (it is


[PATCH] V11 patch #5 of 15, Optimize vec_extract of a vector in memory with a PC-relative address

2019-12-20 Thread Michael Meissner

This patch recognizes when we are doing the optimization of vector extract with
a constant element number when the vector is in memory and the vector's address
is PC-relative, to directly re-form the address using a PC-relative load,
instead of loading the address into a temporary register, and then doing an
indirect load.
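A rough model of the folding step, with a struct standing in for (const (plus (symbol_ref) (const_int))). The names are hypothetical and the sketch only covers the constant-element case: the combined offset is kept in the PC-relative address when it fits in 34 signed bits, and otherwise the caller falls back to loading the address into a temporary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a local PC-relative address: SYMBOL + OFFSET.  */
struct pcrel_addr {
  const char *symbol;
  int64_t offset;
};

static bool
fits_34bit (int64_t v)
{
  int64_t lim = (int64_t) 1 << 33;
  return v >= -lim && v < lim;
}

/* Returns true and fills *OUT if the element offset can be folded into the
   address; returns false when the caller would instead load the address
   into a temporary register and add the element offset.  */
static bool
fold_element_offset (struct pcrel_addr addr, int64_t element_offset,
                     struct pcrel_addr *out)
{
  assert (out != NULL);
  int64_t offset = addr.offset + element_offset;
  if (!fits_34bit (offset))
    return false;
  out->symbol = addr.symbol;
  out->offset = offset;   /* offset == 0 means the bare symbol */
  return true;
}
```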

I have bootstrapped a compiler on a little endian power8 machine and ran the
testsuite with no regressions.  Can I check this into the trunk?

2019-12-20  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_reg_to_addr_mask): New helper
function to identify the address mask of a hard register.
(rs6000_adjust_vec_address): If we have a PC-relative address and
a constant vector element number, fold the element number into the
PC-relative address.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 279597)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6722,6 +6722,30 @@ rs6000_expand_vector_extract (rtx target
 }
 }
 
+/* Helper function to return an address mask based on a physical register.  */
+
+static addr_mask_type
+rs6000_reg_to_addr_mask (rtx reg, machine_mode mode)
+{
+  unsigned int r = reg_or_subregno (reg);
+  addr_mask_type addr_mask;
+
+  gcc_assert (HARD_REGISTER_NUM_P (r));
+  if (INT_REGNO_P (r))
+addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_GPR];
+
+  else if (FP_REGNO_P (r))
+addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_FPR];
+
+  else if (ALTIVEC_REGNO_P (r))
+addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_VMX];
+
+  else
+gcc_unreachable ();
+
+  return addr_mask;
+}
+
 /* Adjust a memory address (MEM) of a vector type to point to a scalar field
within the vector (ELEMENT) with a mode (SCALAR_MODE).  Use a base register
temporary (BASE_TMP) to fixup the address.  Return the new memory address
@@ -6854,6 +6878,51 @@ rs6000_adjust_vec_address (rtx scalar_re
}
 }
 
+  /* For references to local static variables, try to fold a constant offset
+ into the address.  */
+  else if (pcrel_local_address (addr, Pmode) && CONST_INT_P (element_offset))
+{
+  if (GET_CODE (addr) == CONST)
+   addr = XEXP (addr, 0);
+
+  if (GET_CODE (addr) == PLUS)
+   {
+ rtx op0 = XEXP (addr, 0);
+ rtx op1 = XEXP (addr, 1);
+ if (CONST_INT_P (op1))
+   {
+ HOST_WIDE_INT offset
+   = INTVAL (XEXP (addr, 1)) + INTVAL (element_offset);
+
+ if (offset == 0)
+   new_addr = op0;
+
+ else if (SIGNED_INTEGER_34BIT_P (offset))
+   {
+ rtx plus = gen_rtx_PLUS (Pmode, op0, GEN_INT (offset));
+ new_addr = gen_rtx_CONST (Pmode, plus);
+   }
+
+ else
+   {
+ emit_move_insn (base_tmp, addr);
+ new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
+   }
+   }
+ else
+   {
+ emit_move_insn (base_tmp, addr);
+ new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
+   }
+   }
+
+  else
+   {
+ rtx plus = gen_rtx_PLUS (Pmode, addr, element_offset);
+ new_addr = gen_rtx_CONST (Pmode, plus);
+   }
+}
+
   else
 {
   /* Make sure base_tmp is not the same as element_offset.  This can happen
@@ -6869,21 +6938,8 @@ rs6000_adjust_vec_address (rtx scalar_re
   if (GET_CODE (new_addr) == PLUS)
 {
   rtx op1 = XEXP (new_addr, 1);
-  addr_mask_type addr_mask;
-  unsigned int scalar_regno = reg_or_subregno (scalar_reg);
-
-  gcc_assert (HARD_REGISTER_NUM_P (scalar_regno));
-  if (INT_REGNO_P (scalar_regno))
-   addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_GPR];
-
-  else if (FP_REGNO_P (scalar_regno))
-   addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_FPR];
-
-  else if (ALTIVEC_REGNO_P (scalar_regno))
-   addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_VMX];
-
-  else
-   gcc_unreachable ();
+  addr_mask_type addr_mask
+   = rs6000_reg_to_addr_mask (scalar_reg, scalar_mode);
 
   if (REG_P (op1) || SUBREG_P (op1))
valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0;
@@ -6891,9 +6947,21 @@ rs6000_adjust_vec_address (rtx scalar_re
valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0;
 }
 
+  /* An address that is a single register is always valid for either indexed or
+ offsettable loads.  */
   else if (REG_P (new_addr) || SUBREG_P (new_addr))
 valid_addr_p = true;
 
+  /* If we have a PC-relative address, check if offsettable loads are
+ allowed.  */
+  else if (pcrel_local_address (new_addr, Pmode))
+{
+  addr_mask_type addr_mask
+   = rs6000_reg_to_addr_mask (scalar_reg, scalar_mode);
+
+  valid_addr_p = (addr_mask & RELOAD_REG_OFFSE

[PATCH] V11 patch #6 of 15, Make -mpcrel the default for -mcpu=future on Linux 64-bit

2019-12-20 Thread Michael Meissner

This is the same as V10 patch #8.  Once the vector extract patches are
committed, this patch flips the default to use PC-relative addressing on 64-bit
Linux systems when the user uses -mcpu=future.
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00841.html

I have bootstrapped the compiler on a little endian power8 system and ran the
testsuite with no regressions.  Once the preceding V11 patches have been
checked in, can I check this patch into the trunk?

2019-12-20  Michael Meissner  

* config/rs6000/linux64.h (PREFIXED_ADDR_SUPPORTED_BY_OS): Set to
1 to enable prefixed addressing if -mcpu=future.
(PCREL_SUPPORTED_BY_OS): Set to 1 to enable PC-relative addressing
if -mcpu=future.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Do not
enable -mprefixed-addr or -mpcrel by default.
(ADDRESSING_FUTURE_MASKS): New macro.
(OTHER_FUTURE_MASKS): Use ADDRESSING_FUTURE_MASKS.
* config/rs6000/rs6000.c (PREFIXED_ADDR_SUPPORTED_BY_OS): Disable
prefixed addressing unless the target OS tm.h says we should
enable it.
(PCREL_SUPPORTED_BY_OS): Disable PC-relative addressing unless the
target OS tm.h says we should enable it.
(rs6000_debug_reg_global): Print whether prefixed addressing and
PC-relative addressing are enabled by default if -mcpu=future.
(rs6000_option_override_internal): Move setting prefixed
addressing and PC-relative addressing after the sub-target option
handling is done.  Only enable prefixed addressing or PC-relative
addressing on -mcpu=future systems if the target OS says to enable
it.  Disallow prefixed addressing on 32-bit systems or if the
target object file is not ELF v2.
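
The default selection described in this ChangeLog can be sketched as follows. This is a simplified, hypothetical model, not the code in rs6000_option_override_internal: MASK_PCREL and MASK_PREFIXED_ADDR stand in for the OPTION_MASK_* bits, and diagnostics and -mno-* handling are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits standing in for OPTION_MASK_PCREL and
   OPTION_MASK_PREFIXED_ADDR.  */
#define MASK_PCREL          0x1u
#define MASK_PREFIXED_ADDR  0x2u

/* Same fallback as the #ifndef blocks in rs6000.c: an OS header such as
   linux64.h may define these to 1 beforehand.  */
#ifndef PREFIXED_ADDR_SUPPORTED_BY_OS
#define PREFIXED_ADDR_SUPPORTED_BY_OS 0
#endif

#ifndef PCREL_SUPPORTED_BY_OS
#define PCREL_SUPPORTED_BY_OS 0
#endif

/* Simplified model of the -mcpu=future addressing defaults.  EXPLICIT_ON
   holds the addressing bits requested on the command line.  */
static unsigned
future_addressing_flags (bool is_64bit_elfv2, unsigned explicit_on)
{
  /* Only the two addressing bits are modeled.  */
  assert ((explicit_on & ~(MASK_PCREL | MASK_PREFIXED_ADDR)) == 0);

  /* No prefixed addressing at all in 32-bit mode or without ELF v2.  */
  if (!is_64bit_elfv2)
    return 0;

  unsigned flags = explicit_on;
  if (PREFIXED_ADDR_SUPPORTED_BY_OS)
    flags |= MASK_PREFIXED_ADDR;
  if (PCREL_SUPPORTED_BY_OS)
    flags |= MASK_PCREL;
  return flags;
}
```

With both macros left at their fallback of 0, nothing is enabled unless requested; with both defined to 1, as the linux64.h hunk in this patch does, future_addressing_flags (true, 0) returns both bits.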

Index: gcc/config/rs6000/linux64.h
===
--- gcc/config/rs6000/linux64.h (revision 279141)
+++ gcc/config/rs6000/linux64.h (working copy)
@@ -640,3 +640,11 @@ extern int dot_symbols;
enabling the __float128 keyword.  */
 #undef TARGET_FLOAT128_ENABLE_TYPE
 #define TARGET_FLOAT128_ENABLE_TYPE 1
+
+/* Enable support for pc-relative and numeric prefixed addressing on the
+   'future' system.  */
+#undef  PREFIXED_ADDR_SUPPORTED_BY_OS
+#define PREFIXED_ADDR_SUPPORTED_BY_OS  1
+
+#undef  PCREL_SUPPORTED_BY_OS
+#define PCREL_SUPPORTED_BY_OS  1
Index: gcc/config/rs6000/rs6000-cpus.def
===
--- gcc/config/rs6000/rs6000-cpus.def   (revision 279141)
+++ gcc/config/rs6000/rs6000-cpus.def   (working copy)
@@ -75,15 +75,22 @@
 | OPTION_MASK_P8_VECTOR\
 | OPTION_MASK_P9_VECTOR)
 
-/* Support for a future processor's features.  Do not enable -mpcrel until it
-   is fully functional.  */
+/* Support for a future processor's features.  The prefixed and pc-relative
+   addressing bits are not added here.  Instead, they are added if the target
+   OS tm.h says that it supports the addressing modes by default when
+   -mcpu=future is used.  */
 #define ISA_FUTURE_MASKS_SERVER (ISA_3_0_MASKS_SERVER                \
-| OPTION_MASK_FUTURE   \
+| OPTION_MASK_FUTURE)
+
+/* Addressing related flags on a future processor.  These are options that need
+   to be cleared if the target OS is not capable of supporting prefixed
+   addressing at all (such as 32-bit mode or if the object file format is not
+   ELF v2).  */
+#define ADDRESSING_FUTURE_MASKS (OPTION_MASK_PCREL                   \
 | OPTION_MASK_PREFIXED_ADDR)
 
 /* Flags that need to be turned off if -mno-future.  */
-#define OTHER_FUTURE_MASKS (OPTION_MASK_PCREL  \
-| OPTION_MASK_PREFIXED_ADDR)
+#define OTHER_FUTURE_MASKS ADDRESSING_FUTURE_MASKS
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS  (OPTION_MASK_FLOAT128_HW\
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 279202)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -98,6 +98,16 @@
 #endif
 #endif
 
+/* Set up the defaults for whether prefixed addressing is used, and if it is
+   used, whether we want to turn on pc-relative support by default.  */
+#ifndef PREFIXED_ADDR_SUPPORTED_BY_OS
+#define PREFIXED_ADDR_SUPPORTED_BY_OS  0
+#endif
+
+#ifndef PCREL_SUPPORTED_BY_OS
+#define PCREL_SUPPORTED_BY_OS  0
+#endif
+
 /* Support targetm.vectorize.builtin_mask_for_load.  */
 GTY(()) tree altivec_builtin_mask_for_load;
 
@@ -2535,6 +2545,14 @@ rs6000_debug_reg_global (void)
   if (TARGET_DIRECT_MOVE_128)
 fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit mfvsrld el

[PATCH] V11 patch #7 of 15, Add new target_supports cases for -mcpu=future tests.

2019-12-20 Thread Michael Meissner

This is V10 patch #9.  It adds new target_supports tests for the new patches:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00842.html

All of the new tests work with these target supports.  Can I check it into the
trunk?

2019-12-20  Michael Meissner  

* lib/target-supports.exp (check_effective_target_powerpc_pcrel):
New target for PowerPC -mcpu=future support.
(check_effective_target_powerpc_prefixed_addr): New target for
PowerPC -mcpu=future support.

Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   (revision 279547)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -2161,6 +2161,23 @@ proc check_p9modulo_hw_available { } {
 }]
 }
 
+# Return 1 if the target generates PC-relative instructions automatically
+proc check_effective_target_powerpc_pcrel { } {
+return [check_no_messages_and_pattern powerpc_pcrel \
+   {\mpld\M.*[@]pcrel} assembly {
+   static long s;
+   long *p = &s;
+   long foo (void) { return s; }
+   } {-O2 -mcpu=future}]
+}
+
+# Return 1 if the target generates prefixed instructions automatically
+proc check_effective_target_powerpc_prefixed_addr { } {
+return [check_no_messages_and_pattern powerpc_prefixed_addr \
+   {\mpld\M} assembly {
+   long foo (long *p) { return p[0x12345]; }
+   } {-O2 -mcpu=future}]
+}
 
 # Return 1 if the target supports executing FUTURE instructions, 0 otherwise.
 # Cache the result.  It is assumed that if a simulator does not support the

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

[PATCH] V11 patch #8 of 15, Add new tests for using PADDI and PLI with -mcpu=future

2019-12-20 Thread Michael Meissner

This is V10 patch #10. It adds 3 new tests to verify that we generate PADDI/PLI
for large constants when -mcpu=future is used.
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00843.html

These tests pass when the preceding patches are applied.  Can I check this in?

2019-12-20  Michael Meissner  

* gcc.target/powerpc/prefix-add.c: New test for -mcpu=future
generating PADDI for large constant adds.
* gcc.target/powerpc/prefix-di-constant.c: New test for
-mcpu=future generating PLI to load up large DImode constants.
* gcc.target/powerpc/prefix-si-constant.c: New test for
-mcpu=future generating PLI to load up large SImode constants.
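
These constants need PADDI/PLI because of the immediate width: the non-prefixed ADDI/LI (D-form) carry a 16-bit signed immediate, while the prefixed PADDI (and its PLI mnemonic) carry a 34-bit signed immediate. A minimal C sketch of that range check follows; fits_signed and needs_prefixed_immediate are hypothetical helpers for illustration, not GCC functions, and the compiler's real cost decisions are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if VALUE fits in a BITS-wide two's-complement field.  */
static bool
fits_signed (int64_t value, unsigned bits)
{
  assert (bits >= 1 && bits <= 63);
  int64_t min = -((int64_t) 1 << (bits - 1));
  int64_t max = ((int64_t) 1 << (bits - 1)) - 1;
  return value >= min && value <= max;
}

/* True if VALUE is too wide for a 16-bit D-form immediate but still fits
   the 34-bit immediate of a single prefixed PADDI/PLI.  */
static bool
needs_prefixed_immediate (int64_t value)
{
  return !fits_signed (value, 16) && fits_signed (value, 34);
}
```

For example, 0x12345678 from prefix-add.c and prefix-di-constant.c fails the 16-bit check but fits in 34 bits, so a single prefixed instruction can materialize it.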

Index: gcc/testsuite/gcc.target/powerpc/prefix-add.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-add.c   (revision 279252)
+++ gcc/testsuite/gcc.target/powerpc/prefix-add.c   (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PADDI is generated to add a large constant.  */
+unsigned long
+add (unsigned long a)
+{
+  return a + 0x12345678UL;
+}
+
+/* { dg-final { scan-assembler {\mpaddi\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c   (revision 279252)
+++ gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c   (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PLI (PADDI) is generated to load a large constant.  */
+unsigned long
+large (void)
+{
+  return 0x12345678UL;
+}
+
+/* { dg-final { scan-assembler {\mpli\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c   (revision 279252)
+++ gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c   (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PLI (PADDI) is generated to load a large constant for SImode.  */
+void
+large_si (unsigned int *p)
+{
+  *p = 0x12345U;
+}
+
+/* { dg-final { scan-assembler {\mpli\M} } } */

[PATCH] V11 patch #9 of 15, Add test to validate generating prefixed memory when the offset is invalid for DS/DQ insns

2019-12-20 Thread Michael Meissner

This is V10 patch #11.  This adds a new test to verify that, with -mcpu=future,
we generate a prefixed load/store when the offset would be invalid for a
non-prefixed DS-form or DQ-form instruction.
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00845.html

This test passes when I run the testsuite.  Can I check it in?

2019-12-20  Michael Meissner  

* gcc.target/powerpc/prefix-ds-dq.c: New test to verify that we
generate the prefixed load/store instructions for traditional
instructions with an offset that doesn't match DS/DQ
requirements.
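
The test relies on the displacement rules for traditional loads and stores: D-form instructions accept any 16-bit signed offset, DS-form instructions (ld, std, lwa) need a multiple of 4, and DQ-form instructions (lxv, stxv) need a multiple of 16. The sketch below models that check with a hypothetical offset_fits_form helper; it is an illustration, not GCC's predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Displacement encodings of traditional (non-prefixed) loads/stores.  */
enum insn_form
{
  FORM_D,   /* 16-bit signed offset, any value (lbz, lwz, lfd, ...).  */
  FORM_DS,  /* 16-bit signed offset, multiple of 4 (ld, std, lwa).  */
  FORM_DQ   /* 16-bit signed offset, multiple of 16 (lxv, stxv).  */
};

/* Return true if OFFSET can be encoded directly in FORM.  */
static bool
offset_fits_form (int64_t offset, enum insn_form form)
{
  if (offset < -32768 || offset > 32767)
    return false;
  switch (form)
    {
    case FORM_D:
      return true;
    case FORM_DS:
      return offset % 4 == 0;
    case FORM_DQ:
      return offset % 16 == 0;
    }
  assert (!"unknown instruction form");
  return false;
}
```

With the offset of 1 used throughout the test, lbz, lhz, lha, lwz, lfs and lfd stay non-prefixed, while the DS/DQ users fall back to plwa, pld, pstd, plxv and pstxv, which is what the scan-assembler-times lines count.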

Index: gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c (revision 279256)
+++ gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c (working copy)
@@ -0,0 +1,156 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests whether we generate a prefixed load/store operation for addresses that
+   don't meet DS/DQ offset constraints.  */
+
+unsigned long
+load_uc_offset1 (unsigned char *p)
+{
+  return p[1]; /* should generate LBZ.  */
+}
+
+long
+load_sc_offset1 (signed char *p)
+{
+  return p[1]; /* should generate LBZ + EXTSB.  */
+}
+
+unsigned long
+load_us_offset1 (unsigned char *p)
+{
+  return *(unsigned short *)(p + 1);   /* should generate LHZ.  */
+}
+
+long
+load_ss_offset1 (unsigned char *p)
+{
+  return *(short *)(p + 1);/* should generate LHA.  */
+}
+
+unsigned long
+load_ui_offset1 (unsigned char *p)
+{
+  return *(unsigned int *)(p + 1); /* should generate LWZ.  */
+}
+
+long
+load_si_offset1 (unsigned char *p)
+{
+  return *(int *)(p + 1);  /* should generate PLWA.  */
+}
+
+unsigned long
+load_ul_offset1 (unsigned char *p)
+{
+  return *(unsigned long *)(p + 1);/* should generate PLD.  */
+}
+
+long
+load_sl_offset1 (unsigned char *p)
+{
+  return *(long *)(p + 1); /* should generate PLD.  */
+}
+
+float
+load_float_offset1 (unsigned char *p)
+{
+  return *(float *)(p + 1);/* should generate LFS.  */
+}
+
+double
+load_double_offset1 (unsigned char *p)
+{
+  return *(double *)(p + 1);   /* should generate LFD.  */
+}
+
+__float128
+load_float128_offset1 (unsigned char *p)
+{
+  return *(__float128 *)(p + 1);   /* should generate PLXV.  */
+}
+
+void
+store_uc_offset1 (unsigned char uc, unsigned char *p)
+{
+  p[1] = uc;   /* should generate STB.  */
+}
+
+void
+store_sc_offset1 (signed char sc, signed char *p)
+{
+  p[1] = sc;   /* should generate STB.  */
+}
+
+void
+store_us_offset1 (unsigned short us, unsigned char *p)
+{
+  *(unsigned short *)(p + 1) = us; /* should generate STH.  */
+}
+
+void
+store_ss_offset1 (signed short ss, unsigned char *p)
+{
+  *(signed short *)(p + 1) = ss;   /* should generate STH.  */
+}
+
+void
+store_ui_offset1 (unsigned int ui, unsigned char *p)
+{
+  *(unsigned int *)(p + 1) = ui;   /* should generate STW.  */
+}
+
+void
+store_si_offset1 (signed int si, unsigned char *p)
+{
+  *(signed int *)(p + 1) = si; /* should generate STW.  */
+}
+
+void
+store_ul_offset1 (unsigned long ul, unsigned char *p)
+{
+  *(unsigned long *)(p + 1) = ul;  /* should generate PSTD.  */
+}
+
+void
+store_sl_offset1 (signed long sl, unsigned char *p)
+{
+  *(signed long *)(p + 1) = sl;/* should generate PSTD.  */
+}
+
+void
+store_float_offset1 (float f, unsigned char *p)
+{
+  *(float *)(p + 1) = f;   /* should generate STF.  */
+}
+
+void
+store_double_offset1 (double d, unsigned char *p)
+{
+  *(double *)(p + 1) = d;  /* should generate STD.  */
+}
+
+void
+store_float128_offset1 (__float128 f128, unsigned char *p)
+{
+  *(__float128 *)(p + 1) = f128;   /* should generate PSTXV.  */
+}
+
+/* { dg-final { scan-assembler-times {\mextsb\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlbz\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mlfd\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlfs\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlha\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlhz\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlwz\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mpld\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mplwa\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mplxv\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstxv\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstb\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mstfd\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mstfs\M}  1 } } */
+/* { dg-final { scan-assembler-times {\msth\M}   2 } } */
+/* { dg-final { scan-assembler

[PATCH] V11 patch #10 of 15, Make sure we don't generate pre-modify prefixed insns with -mcpu=future

2019-12-20 Thread Michael Meissner

This is V10 patch #12.  It adds a test to make sure we don't generate a
prefixed instruction with PRE_INC, PRE_DEC, or PRE_MODIFY.
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00846.html

This test passes when I run it.  Can I check this into the trunk?

2019-12-20  Michael Meissner  

* gcc.target/powerpc/prefix-no-premodify.c: Make sure we do not
generate the non-existent PLWZU instruction if -mcpu=future.

Index: gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c  (revision 279259)
+++ gcc/testsuite/gcc.target/powerpc/prefix-no-premodify.c  (working copy)
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Make sure that we don't generate a prefixed form of the load and store with
+   update instructions (i.e. instead of generating LWZU we have to generate
+   PLWZ plus a PADDI).  */
+
+#ifndef SIZE
+#define SIZE 5
+#endif
+
+struct foo {
+  unsigned int field;
+  char pad[SIZE];
+};
+
+struct foo *inc_load (struct foo *p, unsigned int *q)
+{
+  *q = (++p)->field;   /* PLWZ, PADDI, STW.  */
+  return p;
+}
+
+struct foo *dec_load (struct foo *p, unsigned int *q)
+{
+  *q = (--p)->field;   /* PLWZ, PADDI, STW.  */
+  return p;
+}
+
+struct foo *inc_store (struct foo *p, unsigned int *q)
+{
+  (++p)->field = *q;   /* LWZ, PADDI, PSTW.  */
+  return p;
+}
+
+struct foo *dec_store (struct foo *p, unsigned int *q)
+{
+  (--p)->field = *q;   /* LWZ, PADDI, PSTW.  */
+  return p;
+}
+
+/* { dg-final { scan-assembler-times {\mlwz\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mstw\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mpaddi\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mplwz\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mpstw\M}   2 } } */
+/* { dg-final { scan-assembler-not   {\mplwzu\M}} } */
+/* { dg-final { scan-assembler-not   {\mpstwu\M}} } */
+/* { dg-final { scan-assembler-not   {\maddis\M}} } */
+/* { dg-final { scan-assembler-not   {\maddi\M} } } */

[PATCH] V11 patch #11 of 15, Add new tests for generating prefixed loads/stores on -mcpu=future with large offsets

2019-12-20 Thread Michael Meissner

This is a reworking of the tests I submitted previously in V8 #4.  It generates
a set of loads and stores for various types using large offsets, and
verifies that the number of prefixed loads and stores is correct.
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00084.html

This patch works when I run the testsuite.  Can I check it in?

2019-12-20  Michael Meissner  

* gcc.target/powerpc/prefix-large.h: New set of tests to test
prefixed addressing on 'future' system with large numeric offsets
for various types.
* gcc.target/powerpc/prefix-large-dd.c: New test for prefixed
loads/stores with large offsets for the _Decimal64 type.
* gcc.target/powerpc/prefix-large-df.c: New test for prefixed
loads/stores with large offsets for the double type.
* gcc.target/powerpc/prefix-large-di.c: New test for prefixed
loads/stores with large offsets for the long type.
* gcc.target/powerpc/prefix-large-hi.c: New test for prefixed
loads/stores with large offsets for the short type.
* gcc.target/powerpc/prefix-large-kf.c: New test for prefixed
loads/stores with large offsets for the __float128 type.
* gcc.target/powerpc/prefix-large-qi.c: New test for prefixed
loads/stores with large offsets for the signed char type.
* gcc.target/powerpc/prefix-large-sd.c: New test for prefixed
loads/stores with large offsets for the _Decimal32 type.
* gcc.target/powerpc/prefix-large-sf.c: New test for prefixed
loads/stores with large offsets for the float type.
* gcc.target/powerpc/prefix-large-si.c: New test for prefixed
loads/stores with large offsets for the int type.
* gcc.target/powerpc/prefix-large-udi.c: New test for prefixed
loads/stores with large offsets for the unsigned long type.
* gcc.target/powerpc/prefix-large-uhi.c: New test for prefixed
loads/stores with large offsets for the unsigned short type.
* gcc.target/powerpc/prefix-large-uqi.c: New test for prefixed
loads/stores with large offsets for the unsigned char type.
* gcc.target/powerpc/prefix-large-usi.c: New test for prefixed
loads/stores with large offsets for the unsigned int type.
* gcc.target/powerpc/prefix-large-v2df.c: New test for prefixed
loads/stores with large offsets for the vector double type.
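
prefix-large.h indexes p[CONSTANT] with CONSTANT = 0x123450, so the byte displacement is CONSTANT times the element size. For every element size from 1 byte (char) to 16 bytes (__float128, vector double), that displacement is far beyond a 16-bit field but well within the 34-bit prefixed displacement. A small sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest displacement of a 16-bit D-form field and of a 34-bit prefixed
   displacement field.  */
#define D_MAX       ((int64_t) 0x7fff)
#define PREFIX_MAX  (((int64_t) 1 << 33) - 1)

/* Byte displacement of p[INDEX] for elements of ELEM_SIZE bytes.  */
static int64_t
byte_offset (int64_t index, int64_t elem_size)
{
  assert (index >= 0 && elem_size > 0);
  return index * elem_size;
}

/* True if p[INDEX] is out of reach of a 16-bit displacement but reachable
   with one prefixed load/store.  */
static bool
needs_single_prefixed_access (int64_t index, int64_t elem_size)
{
  int64_t off = byte_offset (index, elem_size);
  return off > D_MAX && off <= PREFIX_MAX;
}
```

For a 16-byte type the displacement is 0x1234500, still comfortably inside the 34-bit range.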

Index: gcc/testsuite/gcc.target/powerpc/prefix-large.h
===
--- gcc/testsuite/gcc.target/powerpc/prefix-large.h (revision 279319)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large.h (working copy)
@@ -0,0 +1,59 @@
+/* Common tests for prefixed instructions testing whether we can generate a
+   34-bit offset using 1 instruction.  */
+
+typedef signed char    schar;
+typedef unsigned char  uchar;
+typedef unsigned short ushort;
+typedef unsigned int   uint;
+typedef unsigned long  ulong;
+typedef long double    ldouble;
+typedef vector double  v2df;
+typedef vector long    v2di;
+typedef vector float   v4sf;
+typedef vector int v4si;
+
+#ifndef TYPE
+#define TYPE ulong
+#endif
+
+#ifndef ITYPE
+#define ITYPE TYPE
+#endif
+
+#ifndef OTYPE
+#define OTYPE TYPE
+#endif
+
+#if !defined(DO_ADD) && !defined(DO_VALUE) && !defined(DO_SET)
+#define DO_ADD 1
+#define DO_VALUE   1
+#define DO_SET 1
+#endif
+
+#ifndef CONSTANT
+#define CONSTANT   0x123450UL
+#endif
+
+#if DO_ADD
+void
+add (TYPE *p, TYPE a)
+{
+  p[CONSTANT] += a;
+}
+#endif
+
+#if DO_VALUE
+OTYPE
+value (TYPE *p)
+{
+  return p[CONSTANT];
+}
+#endif
+
+#if DO_SET
+void
+set (TYPE *p, ITYPE a)
+{
+  p[CONSTANT] = a;
+}
+#endif
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c  (revision 279319)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset for _Decimal64 objects.  */
+
+#define TYPE _Decimal64
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-df.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-large-df.c  (revision 279319)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-df.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed

[PATCH] V11 patch #12 of 15, Add new PC-relative tests for -mcpu=future

2019-12-20 Thread Michael Meissner

This is a reworking of patch V8 #5.  It adds a bunch of PC-relative tests for
the -mcpu=future target.
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00085.html

This test passes when I run it.  Can I check it in?

2019-12-20  Michael Meissner  

* gcc.target/powerpc/prefix-pcrel.h: New set of tests to test
prefixed addressing on 'future' system with PC-relative addresses
for various types.
* gcc.target/powerpc/prefix-pcrel-dd.c: New test for prefixed
loads/stores with PC-relative addresses for the _Decimal64 type.
* gcc.target/powerpc/prefix-pcrel-df.c: New test for prefixed
loads/stores with PC-relative addresses for the double type.
* gcc.target/powerpc/prefix-pcrel-di.c: New test for prefixed
loads/stores with PC-relative addresses for the long type.
* gcc.target/powerpc/prefix-pcrel-hi.c: New test for prefixed
loads/stores with PC-relative addresses for the short type.
* gcc.target/powerpc/prefix-pcrel-kf.c: New test for prefixed
loads/stores with PC-relative addresses for the __float128 type.
* gcc.target/powerpc/prefix-pcrel-qi.c: New test for prefixed
loads/stores with PC-relative addresses for the signed char type.
* gcc.target/powerpc/prefix-pcrel-sd.c: New test for prefixed
loads/stores with PC-relative addresses for the _Decimal32 type.
* gcc.target/powerpc/prefix-pcrel-sf.c: New test for prefixed
loads/stores with PC-relative addresses for the float type.
* gcc.target/powerpc/prefix-pcrel-si.c: New test for prefixed
loads/stores with PC-relative addresses for the int type.
* gcc.target/powerpc/prefix-pcrel-udi.c: New test for prefixed
loads/stores with PC-relative addresses for the unsigned long
type.
* gcc.target/powerpc/prefix-pcrel-uhi.c: New test for prefixed
loads/stores with PC-relative addresses for the unsigned short
type.
* gcc.target/powerpc/prefix-pcrel-uqi.c: New test for prefixed
loads/stores with PC-relative addresses for the unsigned char
type.
* gcc.target/powerpc/prefix-pcrel-usi.c: New test for prefixed
loads/stores with PC-relative addresses for the unsigned int
type.
* gcc.target/powerpc/prefix-pcrel-v2df.c: New test for prefixed
loads/stores with PC-relative addresses for the vector double
type.
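
For the PC-relative forms, the effective address is the address of the prefixed instruction plus a sign-extended 34-bit displacement, which the assembler and linker fill in from the sym@pcrel reference. A hypothetical helper that computes that displacement and checks its range:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the displacement from the prefixed instruction at INSN_ADDR to
   TARGET_ADDR.  Return false if it does not fit in a signed 34-bit field,
   i.e. the target is not reachable with a single PC-relative access.  */
static bool
pcrel_displacement (uint64_t insn_addr, uint64_t target_addr, int64_t *disp)
{
  assert (disp != NULL);
  int64_t d = (int64_t) (target_addr - insn_addr);
  if (d < -((int64_t) 1 << 33) || d > ((int64_t) 1 << 33) - 1)
    return false;
  *disp = d;
  return true;
}
```

Because the displacement is relative to the instruction itself, no TOC pointer is needed to reach the static variable in prefix-pcrel.h.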

Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h
===
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h (revision 279322)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h (working copy)
@@ -0,0 +1,58 @@
+/* Common tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for each type.  */
+
+typedef signed char    schar;
+typedef unsigned char  uchar;
+typedef unsigned short ushort;
+typedef unsigned int   uint;
+typedef unsigned long  ulong;
+typedef long double    ldouble;
+typedef vector double  v2df;
+typedef vector long    v2di;
+typedef vector float   v4sf;
+typedef vector int v4si;
+
+#ifndef TYPE
+#define TYPE ulong
+#endif
+
+#ifndef ITYPE
+#define ITYPE TYPE
+#endif
+
+#ifndef OTYPE
+#define OTYPE TYPE
+#endif
+
+static TYPE a;
+TYPE *p = &a;
+
+#if !defined(DO_ADD) && !defined(DO_VALUE) && !defined(DO_SET)
+#define DO_ADD 1
+#define DO_VALUE   1
+#define DO_SET 1
+#endif
+
+#if DO_ADD
+void
+add (TYPE b)
+{
+  a += b;
+}
+#endif
+
+#if DO_VALUE
+OTYPE
+value (void)
+{
+  return (OTYPE)a;
+}
+#endif
+
+#if DO_SET
+void
+set (ITYPE b)
+{
+  a = (TYPE)b;
+}
+#endif
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c  (revision 279322)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c  (working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for the _Decimal64 type.  */
+
+#define TYPE _Decimal64
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {[@]pcrel}  4 } } */
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c  (revision 279322)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c  (working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructio

[PATCH] V11 patch #13 of 15, Add test for -mcpu=future -fstack-protector-strong with large stacks

2019-12-20 Thread Michael Meissner

This is patch V8 #6.  It makes sure the stack protect insns work when
-mcpu=future and -fstack-protector-strong are used together.  We discovered
this failure when we attempted to build GLIBC using -mcpu=future.
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00089.html

This test now passes when I run it as part of the testsuite.  Can I check it
into the trunk?

2019-12-20  Michael Meissner  

* gcc.target/powerpc/prefix-stack-protect.c: New test to make sure
-fstack-protector-strong works with prefixed addressing.

Index: gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c (revision 279324)
+++ gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future -fstack-protector-strong" } */
+
+/* Test that we can handle large stack frames with -fstack-protector-strong and
+   prefixed addressing.  This was originally discovered in trying to build
+   glibc with -mcpu=future, and vfwprintf.c failed because it used
+   -fstack-protector-strong.  */
+
+extern long foo (char *);
+
+long
+bar (void)
+{
+  char buffer[0x2];
+  return foo (buffer) + 1;
+}
+
+/* { dg-final { scan-assembler {\mpld\M}  } } */
+/* { dg-final { scan-assembler {\mpstd\M} } } */

[PATCH] V11 patch #14 of 15, Add tests for vec_extract from memory with PC-relative address

2019-12-20 Thread Michael Meissner

These tests are new.  They check that vec_extract from a vector in memory
with a PC-relative address works correctly for both constant and variable
element numbers.

These tests pass with all of the previous patches applied.  Can I check these
patches into the trunk?

2019-12-20  Michael Meissner  

* gcc.target/powerpc/vec-extract-pcrel-si.c: New test for
vec_extract from a PC-relative address.
* gcc.target/powerpc/vec-extract-pcrel-di.c: New test for
vec_extract from a PC-relative address.
* gcc.target/powerpc/vec-extract-pcrel-sf.c: New test for
vec_extract from a PC-relative address.
* gcc.target/powerpc/vec-extract-pcrel-df.c: New test for
vec_extract from a PC-relative address.
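
For the constant element numbers the element offset can be folded into the PC-relative displacement (hence the two prefixed loads), while for getn the tests expect the vector's address to be formed with PLA and the element selected at run time. The C model below sketches the run-time part under two assumptions: the element number is reduced modulo the number of elements, and elements are laid out in array order in memory. v2df_element_offset and v2df_extract are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element N of a 2-element double vector, with N reduced
   modulo the element count.  */
static size_t
v2df_element_offset (unsigned long n)
{
  return (size_t) (n % 2) * sizeof (double);
}

/* Load element N of the vector stored at V: base address (formed by PLA in
   the compiled test) plus the offset computed at run time.  */
static double
v2df_extract (const double v[2], unsigned long n)
{
  assert (v != NULL);
  return *(const double *) ((const char *) v + v2df_element_offset (n));
}
```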

Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-df.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-df.c (revision 279615)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-df.c (working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we can support vec_extract on V2DF vectors with a PC-relative
+   address.  */
+
+#include <altivec.h>
+
+#ifndef TYPE
+#define TYPE double
+#endif
+
+static vector TYPE v;
+vector TYPE *p = &v;
+
+TYPE
+get0 (void)
+{
+  return vec_extract (v, 0);
+}
+
+TYPE
+get1 (void)
+{
+  return vec_extract (v, 1);
+}
+
+TYPE
+getn (unsigned long n)
+{
+  return vec_extract (v, n);
+}
+
+/* { dg-final { scan-assembler-times {[@]pcrel}  3 } } */
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpla\M}   1 } } */
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-di.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-di.c (revision 279615)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-di.c (working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we can support vec_extract on V2DI vectors with a PC-relative
+   address.  */
+
+#include <altivec.h>
+
+#ifndef TYPE
+#define TYPE unsigned long
+#endif
+
+static vector TYPE v;
+vector TYPE *p = &v;
+
+TYPE
+get0 (void)
+{
+  return vec_extract (v, 0);
+}
+
+TYPE
+get1 (void)
+{
+  return vec_extract (v, 1);
+}
+
+TYPE
+getn (unsigned long n)
+{
+  return vec_extract (v, n);
+}
+
+/* { dg-final { scan-assembler-times {[@]pcrel}  3 } } */
+/* { dg-final { scan-assembler-times {\mpld\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mpla\M}   1 } } */
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-sf.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-sf.c (revision 279615)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-sf.c (working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we can support vec_extract on V4SF vectors with a PC-relative
+   address.  */
+
+#include <altivec.h>
+
+#ifndef TYPE
+#define TYPE float
+#endif
+
+static vector TYPE v;
+vector TYPE *p = &v;
+
+TYPE
+get0 (void)
+{
+  return vec_extract (v, 0);
+}
+
+TYPE
+get1 (void)
+{
+  return vec_extract (v, 1);
+}
+
+TYPE
+getn (unsigned long n)
+{
+  return vec_extract (v, n);
+}
+
+/* { dg-final { scan-assembler-times {[@]pcrel}  3 } } */
+/* { dg-final { scan-assembler-times {\mplfs\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpla\M}   1 } } */
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-si.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-si.c (revision 279615)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-pcrel-si.c (working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we can support vec_extract on V4SI vectors with a PC-relative
+   address.  */
+
+#include <altivec.h>
+
+#ifndef TYPE
+#define TYPE unsigned int
+#endif
+
+static vector TYPE v;
+vector TYPE *p = &v;
+
+TYPE
+get0 (void)
+{
+  return vec_extract (v, 0);
+}
+
+TYPE
+get1 (void)
+{
+  return vec_extract (v, 1);
+}
+
+TYPE
+getn (unsigned long n)
+{
+  return vec_extract (v, n);
+}
+
+/* { dg-final { scan-assembler-times {[@]pcrel}  3 } } */
+/* { dg-final { scan-assembler-times {\mplwz\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpla\M}   1 } } */

[PATCH] V11 patch #15 of 15, Add tests for -mcpu=future vec_extract from memory with a large offset

2019-12-20 Thread Michael Meissner

These are new tests.  They verify that, with -mcpu=future, a vec_extract of a
vector in memory generates a prefixed load instruction when the vector's
address contains a large offset and the element number is constant.

Once all of the other V11 patches are checked in, can I check this patch into
the trunk?

2019-12-20  Michael Meissner  

* gcc.target/powerpc/vec-extract-large-si.c: New test for
vec_extract from a vector unsigned int in memory with a large
offset.
* gcc.target/powerpc/vec-extract-large-di.c: New test for
vec_extract from a vector long in memory with a large offset.
* gcc.target/powerpc/vec-extract-large-sf.c: New test for
vec_extract from a vector float in memory with a large offset.
* gcc.target/powerpc/vec-extract-large-df.c: New test for
vec_extract from a vector double in memory with a large offset.
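
With a constant element number, the element offset is added to the vector's own offset, and the combined displacement goes into a single prefixed load. A sketch of that arithmetic, assuming 16-byte vectors with elements in array order; vec_element_disp is a hypothetical helper, and the LARGE value in the usage note is an arbitrary example.

```c
#include <assert.h>
#include <stdint.h>

/* Byte displacement of element IDX of p[LARGE], where each vector is 16
   bytes and each element is ELEM_SIZE bytes.  */
static int64_t
vec_element_disp (int64_t large, int64_t idx, int64_t elem_size)
{
  assert (elem_size > 0 && idx >= 0 && idx * elem_size < 16);
  return large * 16 + idx * elem_size;
}
```

For example, with LARGE = 0x50000, element 1 of a vector double lives at displacement 0x500008, which needs the 34-bit prefixed form.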

Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-df.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-large-df.c (revision 279691)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-df.c (working copy)
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we generate prefixed loads for vec_extract of a vector double in
+   memory, and the memory address has a large offset.  */
+
+#include <altivec.h>
+
+#ifndef TYPE
+#define TYPE double
+#endif
+
+#ifndef LARGE
+#define LARGE 0x5
+#endif
+
+TYPE
+get0 (vector TYPE *p)
+{
+  return vec_extract (p[LARGE], 0);/* PLFD.  */
+}
+
+TYPE
+get1 (vector TYPE *p)
+{
+  return vec_extract (p[LARGE], 1);/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-di.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-large-di.c (revision 279691)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-di.c (working copy)
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we generate prefixed loads for vec_extract of a vector unsigned long
+   in memory, and the memory address has a large offset.  */
+
+#include 
+
+#ifndef TYPE
+#define TYPE unsigned long
+#endif
+
+#ifndef LARGE
+#define LARGE 0x5
+#endif
+
+TYPE
+get0 (vector TYPE *p)
+{
+  return vec_extract (p[LARGE], 0);/* PLD.  */
+}
+
+TYPE
+get1 (vector TYPE *p)
+{
+  return vec_extract (p[LARGE], 1);/* PLD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mpld\M}  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-sf.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-large-sf.c (revision 279691)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-sf.c (working copy)
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we generate prefixed loads for vec_extract of a vector float in
+   memory, and the memory address has a large offset.  */
+
+#include 
+
+#ifndef TYPE
+#define TYPE float
+#endif
+
+#ifndef LARGE
+#define LARGE 0x5
+#endif
+
+TYPE
+get0 (vector TYPE *p)
+{
+  return vec_extract (p[LARGE], 0);/* PLFS.  */
+}
+
+TYPE
+get1 (vector TYPE *p)
+{
+  return vec_extract (p[LARGE], 1);/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mplfs\M}  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-large-si.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-large-si.c (revision 279691)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-large-si.c (working copy)
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test if we generate prefixed loads for vec_extract of a vector unsigned int
+   in memory, and the memory address has a large offset.  */
+
+#include 
+
+#ifndef TYPE
+#define TYPE unsigned int
+#endif
+
+#ifndef LARGE
+#define LARGE 0x5
+#endif
+
+TYPE
+get0 (vector TYPE *p)
+{
+  return vec_extract (p[LARGE], 0);/* PLWZ.  */
+}
+
+TYPE
+get1 (vector TYPE *p)
+{
+  return vec_extract (p[LARGE], 1);/* PLWZ.  */
+}
+
+/* { dg-final { scan-assembler-times {\mplwz\M}  2 } } */

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH] V11 patch #5 of 15, Optimize vec_extract of a vector in memory with a PC-relative address

2020-01-06 Thread Michael Meissner

On Tue, Dec 24, 2019 at 10:24:55AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Dec 20, 2019 at 06:55:53PM -0500, Michael Meissner wrote:
> > * config/rs6000/rs6000.c (rs6000_reg_to_addr_mask): New helper
> > function to identify the address mask of a hard register.
> 
> Do this as a separate patch please.  That refactoring is pre-approved.
> Please explain in the function comment what an "address mask" is.  Or
> better yet, don't call it a "mask", it isn't a mask?

It is called a mask because everywhere else in rs6000.c uses 'addr_mask' or
just 'mask'.  It is a mask of valid bits.

> Also various of the names here still have "reload" in it, which doesn't
> really make much sense.

When these functions were written, it was in the context of supporting the
secondary reload functions, and so reload was in the name.

I will make a refactoring patch that uses the current names.  If we want to
change all of the uses, we can do that in a future patch.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH] V11 patch #5 of 15, Optimize vec_extract of a vector in memory with a PC-relative address

2020-01-06 Thread Michael Meissner

On Tue, Dec 24, 2019 at 10:24:55AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Dec 20, 2019 at 06:55:53PM -0500, Michael Meissner wrote:
> > * config/rs6000/rs6000.c (rs6000_reg_to_addr_mask): New helper
> > function to identify the address mask of a hard register.
> 
> Do this as a separate patch please.  That refactoring is pre-approved.
> Please explain in the function comment what an "address mask" is.  Or
> better yet, don't call it a "mask", it isn't a mask?
> 
> Also various of the names here still have "reload" in it, which doesn't
> really make much sense.
> 
> rs6000_mode_to_addressing_flags?  And a reg_to for this new one?
> Something like that.

Note, rs6000_mode_to_addressing_flags also does not fit the usage.  The key is
that determining the mask of valid addressing options requires both a hard
register and a mode.  The mode by itself is not enough, since loading SImode
into vector registers requires X_FORM, while the same mode in GPR registers can
of course use both D_FORM and X_FORM addressing.
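To illustrate why both inputs are needed, here is a toy C model of a lookup keyed on register kind and mode. It is heavily simplified and hypothetical. The real backend uses reg_addr[mode].addr_mask[RELOAD_REG_*] with many more modes, register classes and flag bits. This model covers only the SImode case described above.

```c
#include <assert.h>

/* Hypothetical, simplified model: two register kinds, one mode.  */
enum reg_kind { REG_KIND_GPR, REG_KIND_VECTOR, N_REG_KINDS };
enum mode_kind { MODE_SI, N_MODES };

enum
{
  ADDR_D_FORM = 1 << 0,		/* base register + immediate displacement */
  ADDR_X_FORM = 1 << 1		/* base register + index register */
};

/* SImode in GPRs allows both forms; in vector registers only X-form.  */
static const unsigned addr_mask_table[N_MODES][N_REG_KINDS] = {
  [MODE_SI] = {
    [REG_KIND_GPR] = ADDR_D_FORM | ADDR_X_FORM,
    [REG_KIND_VECTOR] = ADDR_X_FORM,
  },
};

/* The answer depends on both inputs: the mode alone cannot tell whether
   D-form addressing is available.  */
static unsigned
reg_and_mode_to_addr_mask (enum reg_kind kind, enum mode_kind mode)
{
  assert (kind < N_REG_KINDS && mode < N_MODES);
  return addr_mask_table[mode][kind];
}
```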

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH, committed] V11 patch #2 of 15, Use prefixed load for vector extract with large offset

2020-01-06 Thread Michael Meissner

On Sun, Dec 22, 2019 at 11:10:09AM -0600, Segher Boessenkool wrote:
> The patch is okay for trunk (with the comment moved, and the rtx_equal_p
> fixed).  Thanks!

Here is the patch I committed (subversion id 279937):

2020-01-06  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support
for 34-bit offsets when -mcpu=future is used.
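The change tries three encodings in order:

1. A 16-bit displacement. For 8-byte scalars this must also be a multiple of 4, presumably because 8-byte GPR loads such as ld use the DS instruction form.
2. A 34-bit displacement, when prefixed addresses are available.
3. Otherwise, an X-FORM base + index address with the offset in a register.

Below is a small standalone C model of that decision. signed_fits_16, signed_fits_34 and classify_offset are hypothetical names. The first two stand in for the SIGNED_INTEGER_16BIT_P and SIGNED_INTEGER_34BIT_P checks but are not their real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does OFFSET fit in a signed 16-bit field?  */
static bool
signed_fits_16 (int64_t offset)
{
  return offset >= -32768 && offset <= 32767;
}

/* Does OFFSET fit in a signed 34-bit field, i.e. [-2**33, 2**33)?  */
static bool
signed_fits_34 (int64_t offset)
{
  const int64_t limit = INT64_C (1) << 33;
  return offset >= -limit && offset < limit;
}

enum addr_form { FORM_D, FORM_PREFIXED, FORM_X };

/* Mirror the order of the tests in the committed hunk.  */
static enum addr_form
classify_offset (int64_t offset, int scalar_size, bool have_prefixed)
{
  assert (scalar_size > 0);

  /* 16-bit displacement; 8-byte scalars also need a multiple of 4.  */
  if (signed_fits_16 (offset) && (scalar_size < 8 || (offset & 0x3) == 0))
    return FORM_D;

  /* 34-bit displacement, only with prefixed instructions.  */
  if (have_prefixed && signed_fits_34 (offset))
    return FORM_PREFIXED;

  /* Offset overflowed: put it in a register and use base + index.  */
  return FORM_X;
}
```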

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 279910)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6797,11 +6797,19 @@ rs6000_adjust_vec_address (rtx scalar_re
  HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset);
  rtx offset_rtx = GEN_INT (offset);
 
- if (IN_RANGE (offset, -32768, 32767)
+ /* 16-bit offset.  */
+ if (SIGNED_INTEGER_16BIT_P (offset)
  && (scalar_size < 8 || (offset & 0x3) == 0))
new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+ /* 34-bit offset if we have prefixed addresses.  */
+ else if (TARGET_PREFIXED_ADDR && SIGNED_INTEGER_34BIT_P (offset))
+   new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
  else
{
+ /* Offset overflowed, move offset to the temporary (which will
+likely be split), and do X-FORM addressing.  */
  emit_move_insn (base_tmp, offset_rtx);
  new_addr = gen_rtx_PLUS (Pmode, op0, base_tmp);
}
@@ -6830,6 +6838,12 @@ rs6000_adjust_vec_address (rtx scalar_re
  emit_insn (insn);
}
 
+ /* Make sure we don't overwrite the temporary if the element being
+extracted is variable, and we've put the offset into base_tmp
+previously.  */
+ else if (reg_mentioned_p (base_tmp, element_offset))
+   emit_insn (gen_add2_insn (base_tmp, op1));
+
  else
{
  emit_move_insn (base_tmp, op1);

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH, committed] V11 patch #3 of 15, Use 'Q' constraint for variable vector extract from memory

2020-01-06 Thread Michael Meissner

On Sun, Dec 22, 2019 at 11:24:51AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Dec 20, 2019 at 06:47:28PM -0500, Michael Meissner wrote:
> > Then I realized that eventually we will want to generate an X-FORM (register +
> > register) address, and it was just simpler to use the 'Q' constraint, and have
> > the register allocator put the address into a register.
> 
> Yep, good call.
> 
> > * config/rs6000/vsx.md (vsx_extract__var, VSX_D iterator):
> > Use 'Q' for memory constraints because we need to do an X-FORM
> > load with the variable index.
> > (vsx_extract_v4sf_var): Use 'Q' for memory constraints because we
> > need to do an X-FORM load with the variable index.
> 
> This comment is a headscratcher -- but you shouldn't say "why" in
> changelogs at all, so that is an easy fix ;-)
> 
> > (vsx_extract__var, VSX_EXTRACT_I iterator):Use 'Q' for
> 
> (missing space)
> 
> > memory constraints because we need to do an X-FORM load with the
> > variable index.
> > (vsx_extract__mode_var): Use 'Q' for memory
> > constraints because we need to do an X-FORM load with the variable
> > index.
> 
> (and more)
> 
> > -;; Variable V2DI/V2DF extract
> > +;; Variable V2DI/V2DF extract.  Use 'Q' for the memory because we will
> > +;; ultimately have to convert the address into base + index.
> 
> Maybe just don't write anything at all, since it is hard to explain in a
> few words?  It is clear that "Q" is not a usual constraint, anyway :-)
> 
> Okay for trunk like that.  Thanks!

This is the patch I committed (subversion id 279938):

2020-01-06  Michael Meissner  

* config/rs6000/vsx.md (vsx_extract__var, VSX_D iterator):
Use 'Q' for doing vector extract from memory.
(vsx_extract_v4sf_var): Use 'Q' for doing vector extract from
memory.
(vsx_extract__var, VSX_EXTRACT_I iterator): Use 'Q' for
doing vector extract from memory.
(vsx_extract__mode_var): Use 'Q' for doing vector
extract from memory.
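With the 'Q' constraint, the vector's address ends up in a single base register. The element offset goes into a second register, which is what an X-FORM (register + register) load needs. Below is a minimal C sketch of that address arithmetic. It assumes the element count is a power of two and that the variable index is reduced modulo the element count, as vec_extract is specified to do. variable_element_address is a hypothetical helper, not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Compute BASE + (INDEX mod NUM_ELEMENTS) * ELEM_SIZE.  In the generated
   code, BASE is the 'Q' base register and the scaled index lives in a
   second register, giving an X-form (base + index) load.  */
static uintptr_t
variable_element_address (uintptr_t base, uint64_t index,
                          unsigned num_elements, unsigned elem_size)
{
  /* Masking only implements "mod" for a power-of-two element count.  */
  assert (num_elements != 0 && (num_elements & (num_elements - 1)) == 0);
  uint64_t offset = (index & (num_elements - 1)) * elem_size;
  return base + (uintptr_t) offset;
}
```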

Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 279910)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -3248,7 +3248,7 @@ (define_insn "vsx_vslo_"
 ;; Variable V2DI/V2DF extract
 (define_insn_and_split "vsx_extract__var"
   [(set (match_operand: 0 "gpc_reg_operand" "=v,wa,r")
-   (unspec: [(match_operand:VSX_D 1 "input_operand" "v,m,m")
+   (unspec: [(match_operand:VSX_D 1 "input_operand" "v,Q,Q")
 (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
UNSPEC_VSX_EXTRACT))
(clobber (match_scratch:DI 3 "=r,&b,&b"))
@@ -3318,7 +3318,7 @@ (define_insn_and_split "*vsx_extract_v4s
 ;; Variable V4SF extract
 (define_insn_and_split "vsx_extract_v4sf_var"
   [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,wa,?r")
-   (unspec:SF [(match_operand:V4SF 1 "input_operand" "v,m,m")
+   (unspec:SF [(match_operand:V4SF 1 "input_operand" "v,Q,Q")
(match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
   UNSPEC_VSX_EXTRACT))
(clobber (match_scratch:DI 3 "=r,&b,&b"))
@@ -3681,7 +3681,7 @@ (define_insn_and_split "*vsx_extract__var"
   [(set (match_operand: 0 "gpc_reg_operand" "=r,r,r")
(unspec:
-[(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+[(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q")
  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
 UNSPEC_VSX_EXTRACT))
(clobber (match_scratch:DI 3 "=r,r,&b"))
@@ -3701,7 +3701,7 @@ (define_insn_and_split "*vsx_extract_ 0 "gpc_reg_operand" "=r,r,r")
(zero_extend:
 (unspec:
- [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+ [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q")
   (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
  UNSPEC_VSX_EXTRACT)))
(clobber (match_scratch:DI 3 "=r,r,&b"))

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH, committed] V11 patch #4 of 15, Update 'Q' constraint documentation.

2020-01-06 Thread Michael Meissner

On Sun, Dec 22, 2019 at 11:49:19AM -0600, Segher Boessenkool wrote:
> On Fri, Dec 20, 2019 at 06:49:30PM -0500, Michael Meissner wrote:
> > In doing V11 patch #3, I noticed that the documentation for the 'Q' was
> > misleading.
> 
> It originally was used just for lswi/stswi, which can access up to the
> first 32 bytes of storage pointed to by the register.  But yes, the
> current comment is confusing.
> 
> > * config/rs6000/constraints.md (Q constraint): Update
> > documentation.
> > * doc/md.tet (PowerPC constraints): Update 'Q' constraint
> > documentation.
> 
> "md.tet"?  That's an interesting typo :-)
> 
> >  (define_memory_constraint "Q"
> > -  "Memory operand that is an offset from a register (it is usually better
> > -to use @samp{m} or @samp{es} in @code{asm} statements)"
> > +  "A memory operand whose address which uses a single register with no offset."
> 
> Arm has
> 
> (define_memory_constraint "Q"
>  "@internal
>   An address that is a single base register."
>  (and (match_code "mem")
>   (match_test "REG_P (XEXP (op, 0))")))
> 
> which is more correct for us (the register cannot be r0!)
> 
> But it is not an address.
> 
> Maybe "A memory operand addressed by just a base register." ?
> 
> Okay for trunk like that.  Thanks!
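Segher's remark that the register cannot be r0 follows from how Power loads and stores decode their base-register field. In the ISA's "(RA|0)" notation, an RA field of 0 means the constant 0 rather than the contents of r0. A memory operand based on r0 would therefore address location 0 plus the displacement. Below is a minimal C model of that rule for a D-form load. d_form_effective_address and the register-file array are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the "(RA|0)" rule for a D-form access such as lwz RT,D(RA):
   an RA field of 0 selects the value 0, not the contents of r0.  */
static uint64_t
d_form_effective_address (const uint64_t gpr[32], unsigned ra, int16_t disp)
{
  assert (ra < 32);
  uint64_t base = (ra == 0) ? 0 : gpr[ra];
  /* Sign-extend the displacement, then add modulo 2**64.  */
  return base + (uint64_t) (int64_t) disp;
}
```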

This is the patch I committed (subversion ids 279939 and 279940).

2020-01-06  Michael Meissner  

* config/rs6000/constraints.md (Q constraint): Update
documentation.
* doc/md.texi (RS/6000 constraints): Update 'Q' constraint
documentation.

Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md(revision 279910)
+++ gcc/config/rs6000/constraints.md(working copy)
@@ -211,8 +211,7 @@ several times, or that might not access
(match_test "GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) != RTX_AUTOINC")))
 
 (define_memory_constraint "Q"
-  "Memory operand that is an offset from a register (it is usually better
-to use @samp{m} or @samp{es} in @code{asm} statements)"
+  "A memory operand addressed by just a base register."
   (and (match_code "mem")
(match_test "REG_P (XEXP (op, 0))")))
 
Index: gcc/doc/md.texi
===
--- gcc/doc/md.texi (revision 279910)
+++ gcc/doc/md.texi (working copy)
@@ -3381,8 +3381,7 @@ allowed when @samp{<} or @samp{>} is use
 as @samp{m} without @samp{<} and @samp{>}.
 
 @item Q
-Memory operand that is an offset from a register (it is usually better
-to use @samp{m} or @samp{es} in @code{asm} statements)
+A memory operand addressed by just a base register.
 
 @item Z
 Memory operand that is an indexed or indirect from a register (it is

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH, committed] V11 patch #5 of 15, Optimize vec_extract of a vector in memory with a PC-relative address

2020-01-06 Thread Michael Meissner

On Tue, Dec 24, 2019 at 10:24:55AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Dec 20, 2019 at 06:55:53PM -0500, Michael Meissner wrote:
> > * config/rs6000/rs6000.c (rs6000_reg_to_addr_mask): New helper
> > function to identify the address mask of a hard register.
> 
> Do this as a separate patch please.  That refactoring is pre-approved.
> Please explain in the function comment what an "address mask" is.  Or
> better yet, don't call it a "mask", it isn't a mask?

I committed this patch for the refactoring (subversion id 279941).  I will
submit the other pieces later.

2020-01-06  Michael Meissner  

* config/rs6000/rs6000.c (hard_reg_and_mode_to_addr_mask): New
helper function to return the valid addressing formats for a given
hard register and mode.
(rs6000_adjust_vec_address): Call hard_reg_and_mode_to_addr_mask.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 279912)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6729,6 +6729,30 @@ rs6000_expand_vector_extract (rtx target
 }
 }
 
+/* Helper function to return an address mask based on a physical register.  */
+
+static addr_mask_type
+hard_reg_and_mode_to_addr_mask (rtx reg, machine_mode mode)
+{
+  unsigned int r = reg_or_subregno (reg);
+  addr_mask_type addr_mask;
+
+  gcc_assert (HARD_REGISTER_NUM_P (r));
+  if (INT_REGNO_P (r))
+addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_GPR];
+
+  else if (FP_REGNO_P (r))
+addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_FPR];
+
+  else if (ALTIVEC_REGNO_P (r))
+addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_VMX];
+
+  else
+gcc_unreachable ();
+
+  return addr_mask;
+}
+
 /* Adjust a memory address (MEM) of a vector type to point to a scalar field
within the vector (ELEMENT) with a mode (SCALAR_MODE).  Use a base register
temporary (BASE_TMP) to fixup the address.  Return the new memory address
@@ -6865,21 +6889,8 @@ rs6000_adjust_vec_address (rtx scalar_re
   if (GET_CODE (new_addr) == PLUS)
 {
   rtx op1 = XEXP (new_addr, 1);
-  addr_mask_type addr_mask;
-  unsigned int scalar_regno = reg_or_subregno (scalar_reg);
-
-  gcc_assert (HARD_REGISTER_NUM_P (scalar_regno));
-  if (INT_REGNO_P (scalar_regno))
-   addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_GPR];
-
-  else if (FP_REGNO_P (scalar_regno))
-   addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_FPR];
-
-  else if (ALTIVEC_REGNO_P (scalar_regno))
-   addr_mask = reg_addr[scalar_mode].addr_mask[RELOAD_REG_VMX];
-
-  else
-   gcc_unreachable ();
+  addr_mask_type addr_mask
+   = hard_reg_and_mode_to_addr_mask (scalar_reg, scalar_mode);
 
   if (REG_P (op1) || SUBREG_P (op1))
valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0;

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

[PATCH 0/3] Add support for -mcpu=power11

2024-03-19 Thread Michael Meissner

These three patches add support for -mcpu=power11 to the PowerPC GCC compiler.

There are 3 patches in the set.  I would like to check these patches into GCC
15 ASAP, and back port the patches into GCC 14 after GCC 14.1 ships.  I hope to
also back port these patches to other active branches after the code goes into
GCC 15 and then GCC 14.

Patch #1: This patch adds the basic support for power11.

*   This patch adds the -mcpu=power11 option.
*   This patch adds a power11 processor type.
*   This patch adds a bit to the isa_flags for power11 support.
*   This patch defines _ARCH_PWR11 if -mcpu=power11 is used.
*   This patch uses .machine power11 if -mcpu=power11 is used.
*   This patch passes -mpower11 or -mpwr11 to the assembler.
*   This patch uses the power10 defaults for power11.
*   This patch adds AUXV support for power11.

Patch #2: This patch adds tuning support for power11, treating power11 like
power10 at the current time.

Patch #3: This patch adds tests that are run if the assembler supports either
-mpower11 (under Linux) or -mpwr11 (under AIX).

These patches have been tested with bootstrap builds on a little endian power10
and a big endian power9 system.  When the GCC 15 tree opens up for general
patches, can I apply this series?

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

[PATCH 2/3] Add tuning support for -mcpu=power11

2024-03-19 Thread Michael Meissner

This patch makes -mtune=power11 use the same tuning decisions as
-mtune=power10.
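Each reservation in power10.md now names both CPUs in a single eq_attr test. A comma-separated value string in eq_attr is read as a set of alternatives: the test holds when the attribute equals any listed value. Below is a minimal C model of that reading. attr_value_in_list is a hypothetical helper, not the code genattrtab actually emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if VALUE equals one of the comma-separated entries in LIST,
   e.g. "power11" against "power10,power11".  */
static bool
attr_value_in_list (const char *value, const char *list)
{
  assert (value != NULL && list != NULL);

  size_t len = strlen (value);
  const char *p = list;

  for (;;)
    {
      const char *comma = strchr (p, ',');
      size_t item_len = comma ? (size_t) (comma - p) : strlen (p);

      /* Compare whole entries only, so "power1" does not match "power10".  */
      if (item_len == len && strncmp (p, value, len) == 0)
        return true;
      if (comma == NULL)
        return false;
      p = comma + 1;
    }
}
```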

I have tested this patch on a little endian power10 system and a big endian
power9 system.  There were no regressions.  Can I check this into GCC 15 when
it is open for general patches?

2024-03-18  Michael Meissner  

gcc/

* config/rs6000/power10.md (all reservations): Add power11 as an
alternative to power10.
---
 gcc/config/rs6000/power10.md | 144 +--
 1 file changed, 72 insertions(+), 72 deletions(-)

diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md
index fcc2199ab29..90312643858 100644
--- a/gcc/config/rs6000/power10.md
+++ b/gcc/config/rs6000/power10.md
@@ -1,4 +1,4 @@
-;; Scheduling description for the IBM POWER10 processor.
+;; Scheduling description for the IBM POWER10 and POWER11 processors.
 ;; Copyright (C) 2020-2024 Free Software Foundation, Inc.
 ;;
 ;; Contributed by Pat Haugen (pthau...@us.ibm.com).
@@ -97,12 +97,12 @@ (define_insn_reservation "power10-load" 4
(eq_attr "update" "no")
(eq_attr "size" "!128")
(eq_attr "prefixed" "no")
-   (eq_attr "cpu" "power10"))
+   (eq_attr "cpu" "power10,power11"))
   "DU_any_power10,LU_power10")
 
 (define_insn_reservation "power10-fused-load" 4
   (and (eq_attr "type" "fused_load_cmpi,fused_addis_load,fused_load_load")
-   (eq_attr "cpu" "power10"))
+   (eq_attr "cpu" "power10,power11"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-prefixed-load" 4
@@ -110,13 +110,13 @@ (define_insn_reservation "power10-prefixed-load" 4
(eq_attr "update" "no")
(eq_attr "size" "!128")
(eq_attr "prefixed" "yes")
-   (eq_attr "cpu" "power10"))
+   (eq_attr "cpu" "power10,power11"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-load-update" 4
   (and (eq_attr "type" "load")
(eq_attr "update" "yes")
-   (eq_attr "cpu" "power10"))
+   (eq_attr "cpu" "power10,power11"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 (define_insn_reservation "power10-fpload-double" 4
@@ -124,7 +124,7 @@ (define_insn_reservation "power10-fpload-double" 4
(eq_attr "update" "no")
(eq_attr "size" "64")
(eq_attr "prefixed" "no")
-   (eq_attr "cpu" "power10"))
+   (eq_attr "cpu" "power10,power11"))
   "DU_any_power10,LU_power10")
 
 (define_insn_reservation "power10-prefixed-fpload-double" 4
@@ -132,14 +132,14 @@ (define_insn_reservation "power10-prefixed-fpload-double" 4
(eq_attr "update" "no")
(eq_attr "size" "64")
(eq_attr "prefixed" "yes")
-   (eq_attr "cpu" "power10"))
+   (eq_attr "cpu" "power10,power11"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-fpload-update-double" 4
   (and (eq_attr "type" "fpload")
(eq_attr "update" "yes")
(eq_attr "size" "64")
-   (eq_attr "cpu" "power10"))
+   (eq_attr "cpu" "power10,power11"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 ; SFmode loads are cracked and have additional 3 cycles over DFmode
@@ -148,27 +148,27 @@ (define_insn_reservation "power10-fpload-single" 7
   (and (eq_attr "type" "fpload")
(eq_attr "update" "no")
(eq_attr "size" "32")
-   (eq_attr "cpu" "power10"))
+   (eq_attr "cpu" "power10,power11"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-fpload-update-single" 7
   (and (eq_attr "type" "fpload")
(eq_attr "update" "yes")
(eq_attr "size" "32")
-   (eq_attr "cpu" "power10"))
+   (eq_attr "cpu" "power10,power11"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 (define_insn_reservation "power10-vecload" 4
   (and (eq_attr "type" "vecload")
(eq_attr "size" "!256")
-   (eq_attr "cpu" "power10"))
+   (eq_attr "cpu" "power10,power11")

[PATCH 1/3] Add basic support for -mcpu=power11

2024-03-19 Thread Michael Meissner

This patch adds the power11 option to the -mcpu= and -mtune= switches.

This patch treats power11 like power10 in terms of costs and reassociation
width.

This patch issues a ".machine power11" to the assembly file if you use
-mcpu=power11.

This patch defines _ARCH_PWR11 if the user uses -mcpu=power11.

This patch allows GCC to be configured with the --with-cpu=power11 and
--with-tune=power11 options.

This patch passes -mpwr11 to the assembler if the user uses -mcpu=power11.

This patch adds support for using "power11" in the __builtin_cpu_is built-in
function.

I have tested this patch with a bootstrap build on a little endian power10
system and a bootstrap build on a big endian power9 system.  There were no
regressions.  Can I apply this patch when GCC 15 opens up for general patches?

2024-03-18  Michael Meissner  

gcc/

* config.gcc (rs6000*-*-*, powerpc*-*-*): Add support for power11.
* config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=power11.
* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/driver-rs6000.cc (asm_names): Likewise.
* config/rs6000/ppc-auxv.h (PPC_PLATFORM_POWER11): New define.
* config/rs6000/rs6000-builtin.cc (cpu_is_info): Add power11.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
_ARCH_PWR11 if -mcpu=power11.
* config/rs6000/rs6000-cpus.def (ISA_POWER11_MASKS_SERVER): New define.
(POWERPC_MASKS): Add power11 isa bit.
(power11 cpu): Add power11 definition.
* config/rs6000/rs6000-opts.h (PROCESSOR_POWER11): Add power11 processor.
* config/rs6000/rs6000-string.cc (expand_compare_loop): Likewise.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Add power11
support.
(rs6000_machine_from_flags): Likewise.
(rs6000_reassociation_width): Likewise.
(rs6000_adjust_cost): Likewise.
(rs6000_issue_rate): Likewise.
(rs6000_sched_reorder): Likewise.
(rs6000_sched_reorder2): Likewise.
(rs6000_register_move_cost): Likewise.
(rs6000_opt_masks): Likewise.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/rs6000.md (cpu attribute): Add power11.
* config/rs6000/rs6000.opt (-mpower11): Add internal power11 ISA flag.
* doc/invoke.texi (RS/6000 and PowerPC Options): Document -mcpu=power11.
---
 gcc/config.gcc  |  6 --
 gcc/config/rs6000/aix71.h   |  1 +
 gcc/config/rs6000/aix72.h   |  1 +
 gcc/config/rs6000/aix73.h   |  1 +
 gcc/config/rs6000/driver-rs6000.cc  |  2 ++
 gcc/config/rs6000/ppc-auxv.h|  3 +--
 gcc/config/rs6000/rs6000-builtin.cc |  1 +
 gcc/config/rs6000/rs6000-c.cc   |  2 ++
 gcc/config/rs6000/rs6000-cpus.def   |  5 +
 gcc/config/rs6000/rs6000-opts.h |  3 ++-
 gcc/config/rs6000/rs6000-string.cc  |  1 +
 gcc/config/rs6000/rs6000-tables.opt |  3 +++
 gcc/config/rs6000/rs6000.cc | 32 +
 gcc/config/rs6000/rs6000.h  |  1 +
 gcc/config/rs6000/rs6000.md |  2 +-
 gcc/config/rs6000/rs6000.opt|  3 +++
 gcc/doc/invoke.texi |  5 +++--
 17 files changed, 56 insertions(+), 16 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 040afabd9ec..f8036b6476e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -531,7 +531,9 @@ powerpc*-*-*)
extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h"
extra_headers="${extra_headers} amo.h"
case x$with_cpu in
-   xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower10|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
+   xpowerpc64 | xdefault64 | x6[23]0 | x970 | xG5 | xpower[3456789] \
+   | xpower1[01] | xpower6x | xrs64a | xcell | xa2 | xe500mc64 \
+   | xe5500 | xe6500)
cpu_is_64bit=yes
;;
esac
@@ -5566,7 +5568,7 @@ case "${target}" in
eval "with_$which=405"
;;
"" | common | native \
-   | power[3456789] | power10 | power5+ | power6x \
+   | power[3456789] | power1[01] | power5+ | power6x \
| powerpc | powerpc64 | powerpc64le \
| rs64 \
| 401 | 403 | 405 | 405fp | 440 | 440fp | 464 | 464fp \
diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 24bc301e37d..41037b3852d 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -79,6 +79,7 @@ do {  \
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
 "%{mcpu=native

[PATCH 3/3] Add -mcpu=power11 tests

2024-03-19 Thread Michael Meissner

This patch adds some simple tests for -mcpu=power11 support.  In order to run
these tests, you need an assembler that supports the appropriate option for
supporting the Power11 processor (-mpower11 under Linux or -mpwr11 under AIX).

I have tested this patch on a little endian power10 system and a big endian
power9 system using the latest binutils which includes support for power11.
There were no regressions, and the 3 power11 tests added ran on both systems.
Can I check this patch into GCC 15 when it opens up for general patches?

2024-03-18  Michael Meissner  

gcc/testsuite/

* gcc.target/powerpc/power11-1.c: New test.
* gcc.target/powerpc/power11-2.c: Likewise.
* gcc.target/powerpc/power11-3.c: Likewise.
* lib/target-supports.exp (check_effective_target_power11_ok): Add new
effective target.
---
 gcc/testsuite/gcc.target/powerpc/power11-1.c | 13 +
 gcc/testsuite/gcc.target/powerpc/power11-2.c | 20 
 gcc/testsuite/gcc.target/powerpc/power11-3.c | 10 ++
 gcc/testsuite/lib/target-supports.exp| 17 +
 4 files changed, 60 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/power11-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/power11-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/power11-3.c

diff --git a/gcc/testsuite/gcc.target/powerpc/power11-1.c b/gcc/testsuite/gcc.target/powerpc/power11-1.c
new file mode 100644
index 000..6a2e802eedf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/power11-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-require-effective-target power11_ok } */
+/* { dg-options "-mdejagnu-cpu=power11 -O2" } */
+
+/* Basic check to see if the compiler supports -mcpu=power11.  */
+
+#ifndef _ARCH_PWR11
+#error "-mcpu=power11 is not supported"
+#endif
+
+void foo (void)
+{
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/power11-2.c b/gcc/testsuite/gcc.target/powerpc/power11-2.c
new file mode 100644
index 000..7b9904c1d29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/power11-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-require-effective-target power11_ok } */
+/* { dg-options "-O2" } */
+
+/* Check if we can set the power11 target via a target attribute.  */
+
+__attribute__((__target__("cpu=power9")))
+void foo_p9 (void)
+{
+}
+
+__attribute__((__target__("cpu=power10")))
+void foo_p10 (void)
+{
+}
+
+__attribute__((__target__("cpu=power11")))
+void foo_p11 (void)
+{
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/power11-3.c b/gcc/testsuite/gcc.target/powerpc/power11-3.c
new file mode 100644
index 000..9b2d643cc0f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/power11-3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target powerpc*-*-* } }  */
+/* { dg-require-effective-target power11_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2" }  */
+
+/* Check if we can set the power11 target via a target_clones attribute.  */
+
+__attribute__((__target_clones__("cpu=power11,cpu=power9,default")))
+void foo (void)
+{
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 467b539b20d..be80494be80 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7104,6 +7104,23 @@ proc check_effective_target_power10_ok { } {
 }
 }
 
+# Return 1 if this is a PowerPC target supporting -mcpu=power11.
+
+proc check_effective_target_power11_ok { } {
+if { ([istarget powerpc*-*-*]) } {
+   return [check_no_compiler_messages power11_ok object {
+   int main (void) {
+   #ifndef _ARCH_PWR11
+   #error "-mcpu=power11 is not supported"
+   #endif
+   return 0;
+   }
+   } "-mcpu=power11"]
+} else {
+   return 0
+}
+}
+
 # Return 1 if this is a PowerPC target supporting -mfloat128 via either
 # software emulation on power7/power8 systems or hardware support on power9.
 
-- 
2.44.0


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
