[PATCH] PR tree-optimization/101403: Incorrect folding of ((T)bswap(x))>>C

2021-07-11 Thread Roger Sayle

My sincere apologies for the breakage.  My recent patch to fold
bswapN(x)>>C where the constant C was large enough that the result
only contains bits from the low byte, and can therefore avoid
the byte swap contains a minor logic error.  The pattern contains
a convert? allowing an extension to occur between the bswap and
the shift.  The logic is correct if there's no extension, or the
extension has the same sign as the shift, but I'd mistakenly
convinced myself that these couldn't have different signedness.

(T)bswap16(x)>>12 is (T)((unsigned char)x>>4) or (T)((signed char)x>>4).
The bug is that for zero-extensions to signed type T, we need to use
the unsigned char variant [the signedness of the byte shift is not
(always) the same as the signedness of T and the original shift].

Then because I'm now paranoid, I've also added a clause to handle
the hypothetical (but in practice impossible) sign-extension to an
unsigned type T, which can implemented as (T)(x<<8)>>12.

This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and "make -k check" with no new failures, and a new
testcase to confirm it fixes the regression.

Ok for mainline?

2021-07-11  Roger Sayle  

gcc/ChangeLog
PR tree-optimization/101403
* gcc/match.pd ((T)bswap(X)>>C): Correctly handle cases where
signedness of the shift is not the same as the signedness of
the type extension.

gcc/testsuite/ChangeLog
PR tree-optimization/101403
* gcc.dg/pr101403.c: New test case.


Sorry again,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/match.pd b/gcc/match.pd
index 30680d4..beb8d27 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3659,19 +3659,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  {
   unsigned HOST_WIDE_INT prec = TYPE_PRECISION (TREE_TYPE (@2));
   unsigned HOST_WIDE_INT bits = tree_to_uhwi (@1);
+  /* If the bswap was extended before the original shift, this
+byte (shift) has the sign of the extension, not the sign of
+the original shift.  */
+  tree st = TYPE_PRECISION (type) > prec ? TREE_TYPE (@2) : type;
  }
- (if (bits + 8 == prec)
-  (if (TYPE_UNSIGNED (type))
-   (convert (convert:unsigned_char_type_node @0))
-   (convert (convert:signed_char_type_node @0)))
-  (if (bits < prec && bits + 8 > prec)
-   (with 
-   {
-tree nst = build_int_cst (integer_type_node, bits & 7);
-tree bt = TYPE_UNSIGNED (type) ? unsigned_char_type_node
-   : signed_char_type_node;
-   }
-   (convert (rshift:bt (convert:bt @0) {nst;}
+ /* Special case: logical right shift of sign-extended bswap.
+   (unsigned)(short)bswap16(x)>>12 is (unsigned)((short)x<<8)>>12. */
+ (if (TYPE_PRECISION (type) > prec
+ && !TYPE_UNSIGNED (TREE_TYPE (@2))
+ && TYPE_UNSIGNED (type)
+ && bits < prec && bits + 8 >= prec)
+  (with { tree nst = build_int_cst (integer_type_node, prec - 8); }
+   (rshift (convert (lshift:st (convert:st @0) {nst;})) @1))
+  (if (bits + 8 == prec)
+   (if (TYPE_UNSIGNED (st))
+   (convert (convert:unsigned_char_type_node @0))
+   (convert (convert:signed_char_type_node @0)))
+   (if (bits < prec && bits + 8 > prec)
+   (with 
+{
+ tree nst = build_int_cst (integer_type_node, bits & 7);
+ tree bt = TYPE_UNSIGNED (st) ? unsigned_char_type_node
+  : signed_char_type_node;
+}
+(convert (rshift:bt (convert:bt @0) {nst;})
  /* bswap(x) & C1 can sometimes be simplified to (x >> C2) & C1.  */
  (simplify
   (bit_and (convert? (bswap@2 @0)) INTEGER_CST@1)
/* { dg-do run } */
/* { dg-options "-O2" } */
unsigned int foo (unsigned int a)
{
  unsigned int u;
  unsigned short b = __builtin_bswap16 (a);
  return b >> (u, 12);
}

int main (void)
{
  unsigned int x = foo (0x80);
  if (x != 0x0008)
__builtin_abort ();
  return 0;
}



[PATCH V2] coroutines: Adjust outlined function names [PR95520].

2021-07-11 Thread Iain Sandoe
Hi Jason,

> On 9 Jul 2021, at 22:40, Jason Merrill  wrote:
> 
> On 7/9/21 2:18 PM, Iain Sandoe wrote:

> How about handling this in write_encoding, along the lines of the 
> devel/c++-contracts branch?

OK, so I took a look at this and implemented as below. 

 Some small differences from your contracts impl described here. 

recalling

the original function becomes the ramp - it is called directly by the user-code.
the resumer (actor) contains the outlined code wrapped in synthesized logic as 
dictated by the std
the destroy function effectively calls the actor with a flag that says “take 
the DTOR path” (since the DTOR path has to be available in the case of resume 
too).

this means that is is possible for the actor to be partially (or completely for 
a generator-style coro) inlined into either the ramp or the destroyer.

1. using DECL_ABSTRACT_ORIGIN didn’t work with optimisation and debug since the 
inlining of the outlining confuses the issue (the actor/destory helpers are not 
real clones).

 - there hasn’t been any specific reason to know “which” coroutine function was 
being lowered in the middle or back ends to date - so I had to add some 
book-keeping to allow that to be queried from write_encoding.

2. I had to cater for lambda coroutines; that meant recognising that we have a 
lambda coro helper and picking up the base mangling for the ramp (original 
lambda)

3. I made a minor adjustment to the string handling so that it can account for 
targets that don’t support ‘.’ or ‘$’ in symbols.

> Speaking of which, I wonder if you also want to do something similar to what 
> I did there to put the ramp/actor/destroyer functions into into the same 
> comdat group.

I looked through your code and agree that it should be possible to be more 
restrictive about the interfaces presented by the actor and destroy functions 
in coros.  The ramp obviously has to keep the visiblity with which the user 
wrote it.

As for comdat groups, I’d need to look harder.

please could these things be TODOs - the fix for 95520 doesn’t make them any 
worse (or better), and there are other bugs that are higher priority.

tested on x86_64-Linux,Darwin and powerpc64-linux, also with cppcoro (but i 
would plan to test it on folly too before pushing)

OK for master / backports?

=

The mechanism used to date for uniquing the coroutine helper
functions (actor, destroy) was over-complicating things and
leading to the noted PR and also difficulties in setting
breakpoints on these functions (so this will help PR99215 as
well).

This implementation delegates the adjustment to the mangling
to write_encoding() which necessitates some book-keeping so
that it is possible to determine which of the coroutine
helper names is to be mangled.

Signed-off-by: Iain Sandoe 

PR c++/95520 - [coroutines] __builtin_FUNCTION() returns mangled .actor instead 
of original function name

PR c++/95520

gcc/cp/ChangeLog:

* coroutines.cc (struct coroutine_info): Add fields for
actor and destroy function decls.
(to_ramp): New.
(coro_get_ramp_function): New.
(coro_get_actor_function): New.
(coro_get_destroy_function): New.
(act_des_fn): Set up mapping between ramp, actor and
destroy functions.
(morph_fn_to_coro): Adjust interface to the builder for
helper function decls.
* cp-tree.h (DECL_ACTOR_FN, DECL_DESTROY_FN, DECL_RAMP_FN
DECL_IS_CORO_ACTOR_P, DECL_IS_CORO_DESTROY_P JOIN_STR): New.
* mangle.c (write_encoding): Handle coroutine helpers.
(write_unqualified_name): Handle lambda coroutine helpers.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr95520.C: New test.
---
 gcc/cp/coroutines.cc  | 87 +++
 gcc/cp/cp-tree.h  | 30 
 gcc/cp/mangle.c   | 18 -
 gcc/testsuite/g++.dg/coroutines/pr95520.C | 29 
 4 files changed, 150 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr95520.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 54ffdc8d062..a75f55427cb 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -82,11 +82,13 @@ static bool coro_promise_type_found_p (tree, location_t);
 struct GTY((for_user)) coroutine_info
 {
   tree function_decl; /* The original function decl.  */
-  tree promise_type; /* The cached promise type for this function.  */
-  tree handle_type;  /* The cached coroutine handle for this function.  */
-  tree self_h_proxy; /* A handle instance that is used as the proxy for the
-   one that will eventually be allocated in the coroutine
-   frame.  */
+  tree actor_decl;/* The synthesized actor function.  */
+  tree destroy_decl;  /* The synthesized destroy function.  */
+  tree promise_type;  /* The cached promise type for this function.  */
+  tree handle_type;   /* The cached coroutine handle for

[PATCH] x86: Don't enable UINTR in 32-bit mode

2021-07-11 Thread H.J. Lu via Gcc-patches
UINTR is available only in 64-bit mode.  Since the codegen target is
unknown when the the gcc driver is processing -march=native, to properly
handle UINTR for -march=native:

1. Add an undocumented option, -muintr-native.
2. Update the gcc driver to pass -muintr-native with -march=native if
UINTR is available.
3. Change ix86_option_override_internal to
   a. Turn on UINTR in 64-bit mode for -muintr-native.
   b. Enable UINTR only in 64-bit mode for -march=CPU when PTA_CPU
   includes PTA_UINTR.

gcc/

PR target/101395
* config/i386/driver-i386.c (host_detect_local_cpu): Add
-muintr-native for FEATURE_UINTR.
* config/i386/i386-options.c (ix86_option_override_internal):
Move -muintr check after -march processing and turn on UINTR in
64-bit mode for -muintr-native if it has not been disabled
explicitly.
(DEF_PTA): Skip PTA_UINTR if not in 64-bit mode.
* config/i386/i386.opt (muintr-native): New.  Undocumented to
enable -muintr only in 64-bit mode with -march=native.

gcc/testsuite/

PR target/101395
* gcc.target/i386/pr101395-1.c: New test.
* gcc.target/i386/pr101395-2.c: Likewise.
* gcc.target/i386/pr101395-3.c: Likewise.
---
 gcc/config/i386/driver-i386.c  | 10 --
 gcc/config/i386/i386-options.c | 14 +++---
 gcc/config/i386/i386.opt   |  4 
 gcc/testsuite/gcc.target/i386/pr101395-1.c | 12 
 gcc/testsuite/gcc.target/i386/pr101395-2.c | 22 ++
 gcc/testsuite/gcc.target/i386/pr101395-3.c |  6 ++
 6 files changed, 63 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101395-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101395-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101395-3.c

diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index dd9236616b4..7ade90a088d 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -804,8 +804,14 @@ const char *host_detect_local_cpu (int argc, const char 
**argv)
if (isa_names_table[i].option)
  {
if (has_feature (isa_names_table[i].feature))
- options = concat (options, " ",
-   isa_names_table[i].option, NULL);
+ {
+   const char *option;
+   if (isa_names_table[i].feature == FEATURE_UINTR)
+ option = "-muintr-native";
+   else
+ option = isa_names_table[i].option;
+   options = concat (options, " ", option, NULL);
+ }
else
  options = concat (options, neg_option,
isa_names_table[i].option + 2, NULL);
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 7a35c468da3..6b12f030f0c 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -1920,9 +1920,6 @@ ix86_option_override_internal (bool main_args_p,
   opts->x_ix86_stringop_alg = no_stringop;
 }
 
-  if (TARGET_UINTR && !TARGET_64BIT)
-error ("%<-muintr%> not supported for 32-bit code");
-
   if (!opts->x_ix86_arch_string)
 opts->x_ix86_arch_string
   = TARGET_64BIT_P (opts->x_ix86_isa_flags)
@@ -2109,6 +2106,7 @@ ix86_option_override_internal (bool main_args_p,
 #define DEF_PTA(NAME) \
if (((processor_alias_table[i].flags & PTA_ ## NAME) != 0) \
&& PTA_ ## NAME != PTA_64BIT \
+   && (TARGET_64BIT || PTA_ ## NAME != PTA_UINTR) \
&& !TARGET_EXPLICIT_ ## NAME ## _P (opts)) \
  SET_TARGET_ ## NAME (opts);
 #include "i386-isa.def"
@@ -2184,6 +2182,16 @@ ix86_option_override_internal (bool main_args_p,
   XDELETEVEC (s);
 }
 
+  if (TARGET_64BIT)
+{
+  /* Turn on UINTR in 64-bit mode for -muintr-native if it has not
+been set explicitly.  */
+  if (ix86_uintr_native && !TARGET_EXPLICIT_UINTR_P (opts))
+   SET_TARGET_UINTR (opts);
+}
+  else if (TARGET_UINTR)
+error ("%<-muintr%> not supported for 32-bit code");
+
   ix86_arch_mask = HOST_WIDE_INT_1U << ix86_arch;
   for (i = 0; i < X86_ARCH_LAST; ++i)
 ix86_arch_features[i] = !!(initial_ix86_arch_features[i] & ix86_arch_mask);
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 7b8547bb1c3..5923bbfb40e 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -802,6 +802,10 @@ muintr
 Target Mask(ISA2_UINTR) Var(ix86_isa_flags2) Save
 Support UINTR built-in functions and code generation.
 
+; Used to enable -muintr only in 64-bit mode with -march=native.
+muintr-native
+Target Undocumented RejectNegative Var(ix86_uintr_native)
+
 msgx
 Target Mask(ISA2_SGX) Var(ix86_isa_flags2) Save
 Support SGX built-in functions and code generation.
diff --git a/gcc/testsuite/gcc.target/i386/pr101395-1.c 
b/gcc/testsuite/gcc.target/i386/pr101395-1.c
new file mode 1006

Re: [PATCH 1/4] rs6000: Add support for SSE4.1 "test" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:

2021-06-29  Paul A. Clarke  

gcc/ChangeLog:
 * config/rs6000/smmintrin.h (_mm_testz_si128, _mm_testc_si128,
_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
_mm_test_mix_ones_zeros): New.
---
  gcc/config/rs6000/smmintrin.h | 50 +++
  1 file changed, 50 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index bdf6eb365d88..1b8cad135ed0 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,4 +116,54 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
  }

+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))

Line too long, please fix here and below.  (Existing cases can be left.)

+_mm_testz_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */


This is reasonable; thanks for documenting.

LGTM; I can't approve, but recommend approval with line lengths fixed.  
Thanks!

Bill


+  const __v16qu __zero = {0};
+  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __B), __zero);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_testc_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  const __v16qu __zero = {0};
+  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
+  return vec_all_eq (vec_and ((__v16qu) __notA, (__v16qu) __B), __zero);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_testnzc_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  return _mm_testz_si128 (__A, __B) == 0 && _mm_testc_si128 (__A, __B) == 0;
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_test_all_zeros (__m128i __A, __m128i __mask)
+{
+  const __v16qu __zero = {0};
+  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __mask), __zero);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_test_all_ones (__m128i __A)
+{
+  const __v16qu __ones = vec_splats ((unsigned char) 0xff);
+  return vec_all_eq ((__v16qu) __A, __ones);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
+{
+  const __v16qu __zero = {0};
+  const __v16qu __Amasked = vec_and ((__v16qu) __A, (__v16qu) __mask);
+  const int any_ones = vec_any_ne (__Amasked, __zero);
+  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
+  const __v16qu __notAmasked = vec_and ((__v16qu) __notA, (__v16qu) __mask);
+  const int any_zeros = vec_any_ne (__notAmasked, __zero);
+  return any_ones * any_zeros;
+}
+
  #endif


Re: [PATCH 2/4] rs6000: Add tests for SSE4.1 "test" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

LGTM.  I can't approve, but recommend approval as is.

Thanks,
Bill

On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:

Copy the test for _mm_testz_si128, _mm_testc_si128,
_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
_mm_test_mix_ones_zeros from gcc/testsuite/gcc.target/i386.

2021-06-29  Paul A. Clarke  

gcc/testsuite/ChangeLog:
 * gcc.target/powerpc/sse4_1-ptest.c: Copy from
gcc/testsuite/gcc.target/i386.
---
  .../gcc.target/powerpc/sse4_1-ptest-1.c   | 117 ++
  1 file changed, 117 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c
new file mode 100644
index ..69d13d57770d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c
@@ -0,0 +1,117 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include 
+
+static int
+make_ptestz (__m128i m, __m128i v)
+{
+  union
+{
+  __m128i x;
+  unsigned char c[16];
+} val, mask;
+  int i, z;
+
+  mask.x = m;
+  val.x = v;
+
+  z = 1;
+  for (i = 0; i < 16; i++)
+if ((mask.c[i] & val.c[i]))
+  {
+   z = 0;
+   break;
+  }
+  return z;
+}
+
+static int
+make_ptestc (__m128i m, __m128i v)
+{
+  union
+{
+  __m128i x;
+  unsigned char c[16];
+} val, mask;
+  int i, c;
+
+  mask.x = m;
+  val.x = v;
+
+  c = 1;
+  for (i = 0; i < 16; i++)
+if ((val.c[i] & ~mask.c[i]))
+  {
+   c = 0;
+   break;
+  }
+  return c;
+}
+
+static void
+TEST (void)
+{
+  union
+{
+  __m128i x;
+  unsigned int i[4];
+} val[4];
+  int i, j, l;
+  int res[32];
+
+  val[0].i[0] = 0x;
+  val[0].i[1] = 0x;
+  val[0].i[2] = 0x;
+  val[0].i[3] = 0x;
+
+  val[1].i[0] = 0x;
+  val[1].i[1] = 0x;
+  val[1].i[2] = 0x;
+  val[1].i[3] = 0x;
+
+  val[2].i[0] = 0;
+  val[2].i[1] = 0;
+  val[2].i[2] = 0;
+  val[2].i[3] = 0;
+
+  val[3].i[0] = 0x;
+  val[3].i[1] = 0x;
+  val[3].i[2] = 0x;
+  val[3].i[3] = 0x;
+
+  l = 0;
+  for(i = 0; i < 4; i++)
+for(j = 0; j < 4; j++)
+  {
+   res[l++] = _mm_testz_si128 (val[j].x, val[i].x);
+   res[l++] = _mm_testc_si128 (val[j].x, val[i].x);
+  }
+
+  l = 0;
+  for(i = 0; i < 4; i++)
+for(j = 0; j < 4; j++)
+  {
+   if (res[l++] != make_ptestz (val[j].x, val[i].x))
+ abort ();
+   if (res[l++] != make_ptestc (val[j].x, val[i].x))
+ abort ();
+  }
+
+  if (res[2] != _mm_testz_si128 (val[1].x, val[0].x))
+abort ();
+
+  if (res[3] != _mm_testc_si128 (val[1].x, val[0].x))
+abort ();
+}


Re: [PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:

_mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
Add these four to complete the set.

2021-06-29  Paul A. Clarke  

gcc/ChangeLog:
* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
_mm_blend_ps, _mm_blendv_ps): New.
---
  gcc/config/rs6000/smmintrin.h | 46 +++
  1 file changed, 46 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 1b8cad135ed0..fa17a8b2f478 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,6 +116,52 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
  }

+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))

Usual line length complaint. :)  Here and below...

+_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
+{
+  const signed char __tmp = (__imm8 & 0b10) * 0b0000 |
+   (__imm8 & 0b01) * 0b;
+  __v16qi __charmask = vec_splats ((signed char) __tmp);
+  __charmask = vec_gb (__charmask);
+  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);
+  #ifdef __BIG_ENDIAN__
+  __shortmask = vec_reve (__shortmask);
+  #endif
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __shortmask);


This seems way too complex, and needs commentary to explain what you're 
doing.  Doesn't this instruction just translate into some form of 
xxpermdi?  Different ones for BE and LE, but still just xxpermdi, I think.



+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
+{
+  const __v2di __zero = {0};
+  const vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, 
__zero);
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
+}


Okay.


+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
+{
+  const signed char __mask = (__imm8 & 0b1000) * 0b00011000 |
+(__imm8 & 0b0100) * 0b1100 |
+(__imm8 & 0b0010) * 0b0110 |
+(__imm8 & 0b0001) * 0b0011;
+  __v16qi __charmask = vec_splats ( __mask);
+  __charmask = vec_gb (__charmask);
+  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);


This is a good trick, but you need comments to explain what you're 
doing, including how you build __mask.  I recommend you include 
alternate code for P10, where you can just use vec_genwm to expand from 
__mask to a mask of word elements.


I don't understand how you're getting away with a v8hu mask for word 
elements.  This seems wrong to me.  Adequate testing?



+  #ifdef __BIG_ENDIAN__
+  __shortmask = vec_reve (__shortmask);
+  #endif
+  return (__m128) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
+{
+  const __v4si __zero = {0};
+  const vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
+  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
+}
+


Okay.

Please have a look at the above issues and resubmit.  Thanks!
Bill


  extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  _mm_testz_si128 (__m128i __A, __m128i __B)
  {


Re: [PATCH 4/4] rs6000: Add tests for SSE4.1 "blend" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

Please resubmit this when you resubmit 3/4, in case any adjustments are 
needed.


Thanks!
Bill

On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:

Copy the tests for _mm_blend_pd, _mm_blendv_pd, _mm_blend_ps,
_mm_blendv_ps from gcc/testsuite/gcc.target/i386.

2021-06-29  Paul A. Clarke  

gcc/testsuite/ChangeLog:
* gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c: Copy
from gcc/testsuite/gcc.target/i386.
* gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c: Likewise.
---
  .../gcc.target/powerpc/sse4_1-blendpd.c   | 89 ++
  .../gcc.target/powerpc/sse4_1-blendps-2.c | 81 +
  .../gcc.target/powerpc/sse4_1-blendps.c   | 90 +++
  .../gcc.target/powerpc/sse4_1-blendvpd.c  | 65 ++
  4 files changed, 325 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
new file mode 100644
index ..ca1780471fa2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
@@ -0,0 +1,89 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include 
+#include 
+
+#define NUM 20
+
+#ifndef MASK
+#define MASK 0x03
+#endif
+
+static void
+init_blendpd (double *src1, double *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 2; i++)
+{
+  src1[i] = i * i * sign;
+  src2[i] = (i + 20) * sign;
+  sign = -sign;
+}
+}
+
+static int
+check_blendpd (__m128d *dst, double *src1, double *src2)
+{
+  double tmp[2];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+
+  for(j = 0; j < 2; j++)
+if ((MASK & (1 << j)))
+  tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+TEST (void)
+{
+  __m128d x, y;
+  union
+{
+  __m128d x[NUM];
+  double d[NUM * 2];
+} dst, src1, src2;
+  union
+{
+  __m128d x;
+  double d[2];
+} src3;
+  int i;
+
+  init_blendpd (src1.d, src2.d);
+
+  /* Check blendpd imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+{
+  dst.x[i] = _mm_blend_pd (src1.x[i], src2.x[i], MASK);
+  if (check_blendpd (&dst.x[i], &src1.d[i * 2], &src2.d[i * 2]))
+   abort ();
+}
+
+  /* Check blendpd imm8, xmm, xmm */
+  src3.x = _mm_setzero_pd ();
+
+  x = _mm_blend_pd (dst.x[2], src3.x, MASK);
+  y = _mm_blend_pd (src3.x, dst.x[2], MASK);
+
+  if (check_blendpd (&x, &dst.d[4], &src3.d[0]))
+abort ();
+
+  if (check_blendpd (&y, &src3.d[0], &dst.d[4]))
+abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
new file mode 100644
index ..768b6e64bbae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#include "sse4_1-check.h"
+
+#include 
+#include 
+#include 
+
+#define NUM 20
+
+#undef MASK
+#define MASK 0xe
+
+static void
+init_blendps (float *src1, float *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 4; i++)
+{
+  src1[i] = i * i * sign;
+  src2[i] = (i + 20) * sign;
+  sign = -sign;
+}
+}
+
+static int
+check_blendps (__m128 *dst, float *src1, float *src2)
+{
+  float tmp[4];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 4; j++)
+if ((MASK & (1 << j)))
+  tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+sse4_1_test (void)
+{
+  __m128 x, y;
+  union
+{
+  __m128 x[NUM];
+  float f[NUM * 4];
+} dst, src1, src2;
+  union
+{
+  __m128 x;
+  float f[4];
+} src3;
+  int i;
+
+  init_blendps (src1.f, src2.f);
+
+  for (i = 0; i < 4; i++)
+src3.f[i] = (int) rand ();
+
+  /* Check blendps imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+{
+  dst.x[i] = _mm_blend_ps (src1.x[i], src2.x[i], MASK);
+  if (check_blendps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4]))
+   abort ();
+}
+
+   /* Check blendps imm8, xmm, xmm */
+  x = _mm_blend_ps (dst.x[2], src3.x, MASK);
+  y = _mm_blend_ps (src3.x, dst.x[2], MASK);
+
+  if (check_blendps (&x, &dst.f[8], &src3.f[0]))
+abort ();
+
+  if (check_blendps (&y, &src3.f[0], &dst.f[8]))
+abort ();
+}
diff --git a/gcc/testsuite/gcc

Re: [PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

On 7/11/21 11:17 AM, Bill Schmidt wrote:

Hi Paul,

On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:

_mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
Add these four to complete the set.

2021-06-29  Paul A. Clarke  

gcc/ChangeLog:
* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
_mm_blend_ps, _mm_blendv_ps): New.
---
  gcc/config/rs6000/smmintrin.h | 46 +++
  1 file changed, 46 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h 
b/gcc/config/rs6000/smmintrin.h

index 1b8cad135ed0..fa17a8b2f478 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,6 +116,52 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, 
__m128i __mask)

    return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
  }

+extern __inline __m128d __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))

Usual line length complaint. :)  Here and below...

+_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
+{
+  const signed char __tmp = (__imm8 & 0b10) * 0b0000 |
+    (__imm8 & 0b01) * 0b;
+  __v16qi __charmask = vec_splats ((signed char) __tmp);
+  __charmask = vec_gb (__charmask);
+  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);
+  #ifdef __BIG_ENDIAN__
+  __shortmask = vec_reve (__shortmask);
+  #endif
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) 
__shortmask);


This seems way too complex, and needs commentary to explain what 
you're doing.  Doesn't this instruction just translate into some form 
of xxpermdi?  Different ones for BE and LE, but still just xxpermdi, I 
think.



+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))

+_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
+{
+  const __v2di __zero = {0};
+  const vector __bool long long __boolmask = vec_cmplt ((__v2di) 
__mask, __zero);
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) 
__boolmask);

+}


Okay.


+
+extern __inline __m128 __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))

+_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
+{
+  const signed char __mask = (__imm8 & 0b1000) * 0b00011000 |
+ (__imm8 & 0b0100) * 0b1100 |
+ (__imm8 & 0b0010) * 0b0110 |
+ (__imm8 & 0b0001) * 0b0011;
+  __v16qi __charmask = vec_splats ( __mask);
+  __charmask = vec_gb (__charmask);
+  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);


This is a good trick, but you need comments to explain what you're 
doing, including how you build __mask.  I recommend you include 
alternate code for P10, where you can just use vec_genwm to expand 
from __mask to a mask of word elements.


I don't understand how you're getting away with a v8hu mask for word 
elements.  This seems wrong to me.  Adequate testing?


As an alternate approach, I suppose you could use vec_perm / vec_permr 
with one of sixteen possible masks, which would seem faster than the 
splat/gather/unpack/select approach.  Something to consider.


Bill




+  #ifdef __BIG_ENDIAN__
+  __shortmask = vec_reve (__shortmask);
+  #endif
+  return (__m128) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))

+_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
+{
+  const __v4si __zero = {0};
+  const vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, 
__zero);
+  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) 
__boolmask);

+}
+


Okay.

Please have a look at the above issues and resubmit.  Thanks!
Bill

  extern __inline int __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))

  _mm_testz_si128 (__m128i __A, __m128i __B)
  {


Re: [committed] input.c: move file caching globals to a new file_cache class

2021-07-11 Thread Lewis Hyatt via Gcc-patches
Hi David-

I thought this might be a good opportunity to ask about the patch that
supports -finput-charset in diagnostic.c please?
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564527.html

The patch will require some work to adapt to the new changes below. I
am happy to do that, but thought I should check first whether you have
any interest in this approach? Thanks!

-Lewis

On Thu, Jul 1, 2021 at 5:52 PM David Malcolm via Gcc-patches
 wrote:
>
> This moves some global state from input.c to a new file_cache class,
> of which an instance is owned by global_dc.  Various state is also
> made private.
>
> No functional change intended.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> Pushed to trunk as b544c348e13ad33d55f0d954370ab1fb0f2bf683.
>
> gcc/ChangeLog:
> * diagnostic.h (diagnostic_context::m_file_cache): New field.
> * input.c (class fcache): Rename to...
> (class file_cache_slot): ...this, making most members private and
> prefixing fields with "m_".
> (file_cache_slot::get_file_path): New accessor.
> (file_cache_slot::get_use_count): New accessor.
> (file_cache_slot::missing_trailing_newline_p): New accessor.
> (file_cache_slot::inc_use_count): New.
> (fcache_buffer_size): Move to...
> (file_cache_slot::buffer_size): ...here.
> (fcache_line_record_size): Move to...
> (file_cache_slot::line_record_size): ...here.
> (fcache_tab): Delete, in favor of global_dc->m_file_cache.
> (fcache_tab_size): Move to file_cache::num_file_slots.
> (diagnostic_file_cache_init): Update for move of fcache_tab
> to global_dc->m_file_cache.
> (diagnostic_file_cache_fini): Likewise.
> (lookup_file_in_cache_tab): Convert to...
> (file_cache::lookup_file): ...this.
> (diagnostics_file_cache_forcibly_evict_file): Update for move of
> fcache_tab to global_dc->m_file_cache, moving most of
> implementation to...
> (file_cache::forcibly_evict_file): ...this new function and...
> (file_cache_slot::evict): ...this new function.
> (evicted_cache_tab_entry): Convert to...
> (file_cache::evicted_cache_tab_entry): ...this.
> (add_file_to_cache_tab): Convert to...
> (file_cache::add_file): ...this, moving bulk of implementation
> to...
> (file_cache_slot::create): ..this new function.
> (file_cache::file_cache): New.
> (file_cache::~file_cache): New.
> (lookup_or_add_file_to_cache_tab): Convert to...
> (file_cache::lookup_or_add_file): ..this new function.
> (fcache::fcache): Rename to...
> (file_cache_slot::file_cache_slot): ...this, adding "m_" prefixes
> to fields.
> (fcache::~fcache): Rename to...
> (file_cache_slot::~file_cache_slot): ...this, adding "m_" prefixes
> to fields.
> (needs_read): Convert to...
> (file_cache_slot::needs_read_p): ...this.
> (needs_grow): Convert to...
> (file_cache_slot::needs_grow_p): ...this.
> (maybe_grow): Convert to...
> (file_cache_slot::maybe_grow): ...this.
> (read_data): Convert to...
> (file_cache_slot::read_data): ...this.
> (maybe_read_data): Convert to...
> (file_cache_slot::maybe_read_data): ...this.
> (get_next_line): Convert to...
> (file_cache_slot::get_next_line): ...this.
> (goto_next_line): Convert to...
> (file_cache_slot::goto_next_line): ...this.
> (read_line_num): Convert to...
> (file_cache_slot::read_line_num): ...this.
> (location_get_source_line): Update for moving of globals to
> global_dc->m_file_cache.
> (location_missing_trailing_newline): Likewise.
> * input.h (class file_cache_slot): New forward decl.
> (class file_cache): New.
>
> Signed-off-by: David Malcolm 
> ---
>  gcc/diagnostic.h |   3 +
>  gcc/input.c  | 459 +++
>  gcc/input.h  |  33 
>  3 files changed, 301 insertions(+), 194 deletions(-)
>
> diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
> index 1b9d6b1f64d..086bc4f903f 100644
> --- a/gcc/diagnostic.h
> +++ b/gcc/diagnostic.h
> @@ -136,6 +136,9 @@ struct diagnostic_context
>/* Where most of the diagnostic formatting work is done.  */
>pretty_printer *printer;
>
> +  /* Cache of source code.  */
> +  file_cache *m_file_cache;
> +
>/* The number of times we have issued diagnostics.  */
>int diagnostic_count[DK_LAST_DIAGNOSTIC_KIND];
>
> diff --git a/gcc/input.c b/gcc/input.c
> index 9e39e7df83c..de20d983d2c 100644
> --- a/gcc/input.c
> +++ b/gcc/input.c
> @@ -32,9 +32,29 @@ along with GCC; see the file COPYING3.  If not see
>
>  /* This is a cache used by get_next_line to store the content of a
> file to be searched for file lines.  */
> -class fcache
> +class file_cache_slot
>

Re: [PATCH 1/2] rs6000: Add support for SSE4.1 "ceil" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

On 7/1/21 9:11 PM, Paul A. Clarke via Gcc-patches wrote:

2021-07-01  Paul A. Clarke  

gcc/ChangeLog:
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps,
_mm_ceil_sd, _mm_ceil_ss): New.
---
  gcc/config/rs6000/smmintrin.h | 28 
  1 file changed, 28 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index fa17a8b2f478..0c0b0dd7c1e3 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -212,4 +212,32 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
return any_ones * any_zeros;
  }

+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))


Usual fuss over line length, here and below.  Otherwise LGTM, can't 
approve but recommend approval with those changes.


Thanks,
Bill


+_mm_ceil_pd (__m128d __A)
+{
+  return (__m128d) vec_ceil ((__v2df) __A);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_ceil_ps (__m128 __A)
+{
+  return (__m128) vec_ceil ((__v4sf) __A);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_ceil_sd (__m128d __A, __m128d __B)
+{
+  __v2df r = vec_ceil ((__v2df) __B);
+  r[1] = ((__v2df) __A)[1];
+  return (__m128d) r;
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_ceil_ss (__m128 __A, __m128 __B)
+{
+  __v4sf r = (__v4sf) __A;
+  r[0] = __builtin_ceil (((__v4sf) __B)[0]);
+  return r;
+}
+
  #endif


Re: [PATCH 2/2] rs6000: Add tests for SSE4.1 "ceil" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

On 7/1/21 9:11 PM, Paul A. Clarke via Gcc-patches wrote:

Add the tests for _mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss.

Copy a test for _mm_ceil_pd and _mm_ceil_ps from
gcc/testsuite/gcc.target/i386.

Define __VSX_SSE2__ to pick up some union definitons in

typo ("definitions").

m128-check.h.

2021-07-01  Paul A. Clarke  

gcc/testsuite/ChangeLog:

"gcc/testsuite/" will make the tools happy.

* gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c: New.


All of these should be relative to gcc/testsuite/, so

    * gcc.target/powerpc/sse4_1-ceilpd.c: New.

and similar.


* gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-round.h: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c: Copy
 from gcc/testsuite/gcc.target/i386.
* gcc/testsuite/gcc.target/powerpc/sse4_1-check.h
(__VSX_SSE2__): Define.
---
  .../gcc.target/powerpc/sse4_1-ceilpd.c|  51 
  .../gcc.target/powerpc/sse4_1-ceilps.c|  33 +
  .../gcc.target/powerpc/sse4_1-ceilsd.c| 119 ++
  .../gcc.target/powerpc/sse4_1-ceilss.c|  95 ++
  .../gcc.target/powerpc/sse4_1-check.h |   4 +
  .../gcc.target/powerpc/sse4_1-round-data.h|  20 +++
  .../gcc.target/powerpc/sse4_1-round.h |  27 
  .../gcc.target/powerpc/sse4_1-round2.h|  27 
  .../gcc.target/powerpc/sse4_1-roundpd-3.c |  36 ++
  9 files changed, 412 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
new file mode 100644
index ..f532fdb9c285
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, mode) _mm_ceil_pd (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { .value = { .f = {  0.00,  0.25 } }, .answer = {  0.0,  1.0 } },
+  { .value = { .f = {  0.50,  0.75 } }, .answer = {  1.0,  1.0 } },
+
+  { { .f = {  0x1.cp+50,  0x1.dp+50 } },
+   {  0x1.cp+50,  0x1.0p+51 } },
+  { { .f = {  0x1.ep+50,  0x1.fp+50 } },
+   {  0x1.0p+51,  0x1.0p+51 } },
+  { { .f = {  0x1.0p+51,  0x1.1p+51 } },
+   {  0x1.0p+51,  0x1.2p+51 } },
+  { { .f = {  0x1.2p+51,  0x1.3p+51 } },
+   {  0x1.2p+51,  0x1.4p+51 } },
+
+  { { .f = {  0x1.ep+51,  0x1.fp+51 } },
+   {  0x1.ep+51,  0x1.0p+52 } },
+  { { .f = {  0x1.0p+52,  0x1.1p+52 } },
+   {  0x1.0p+52,  0x1.1p+52 } },
+
+  { { .f = { -0x1.1p+52, -0x1.0p+52 } },
+   { -0x1.1p+52, -0x1.0p+52 } },
+  { { .f = { -0x1.fp+51, -0x1.ep+51 } },
+   { -0x1.ep+51, -0x1.ep+51 } },
+
+  { { .f = { -0x1.3p+51, -0x1.2p+51 } },
+   { -0x1.2p+51, -0x1.2p+51 } },
+  { { .f = { -0x1.1p+51, -0x1.0p+51 } },
+   { -0x1.0p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.fp+50, -0x1.ep+50 } },
+   { -0x1.cp+50, -0x1.cp+50 } },
+  { { .f = { -0x1.dp+50, -0x1.cp+50 } },
+   { -0x1.cp+50, -0x1.cp+50 } },
+
+  { { .f = { -1.00, -0.75 } }, { -1.0,  0.0 } },
+  { { .f = { -0.50, -0.25 } }, {  0.0,  0.0 } }
+};
+
+#include "sse4_1-round.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
new file mode 100644
index ..417ac76d8aa9
--- /dev/null

Re: [PATCH 1/2] rs6000: Add support for SSE4.1 "floor" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

On 7/6/21 5:50 PM, Paul A. Clarke via Gcc-patches wrote:

2021-07-06  Paul A. Clarke  

gcc/ChangeLog:
* config/rs6000/smmintrin.h (_mm_floor_pd, _mm_floor_ps,
_mm_floor_sd, _mm_floor_ss): New.
---
  gcc/config/rs6000/smmintrin.h | 28 
  1 file changed, 28 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 0c0b0dd7c1e3..f484a7fd029f 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -240,4 +240,32 @@ _mm_ceil_ss (__m128 __A, __m128 __B)
return r;
  }

+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))


Usual fuss about line length.  LGTM with that fixed here and below.

I can't approve, but recommend approval with those changes.

Thanks,
Bill


+_mm_floor_pd (__m128d __A)
+{
+  return (__m128d) vec_floor ((__v2df) __A);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_floor_ps (__m128 __A)
+{
+  return (__m128) vec_floor ((__v4sf) __A);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_floor_sd (__m128d __A, __m128d __B)
+{
+  __v2df r = vec_floor ((__v2df) __B);
+  r[1] = ((__v2df) __A)[1];
+  return (__m128d) r;
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_floor_ss (__m128 __A, __m128 __B)
+{
+  __v4sf r = (__v4sf) __A;
+  r[0] = __builtin_floor (((__v4sf) __B)[0]);
+  return r;
+}
+
  #endif


Re: [PATCH 2/2] rs6000: Add tests for SSE4.1 "floor" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

On 7/6/21 5:50 PM, Paul A. Clarke via Gcc-patches wrote:

Add the tests for _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss.
These are modelled after (and depend upon parts of) the tests for
_mm_ceil intrinsics, recently posted.

Copy a test for _mm_floor_sd from gcc/testsuite/gcc.target/i386.

2021-07-06  Paul A. Clarke  

gcc/testsuite/ChangeLog:
[Applies to all your patches] Don't need to include "ChangeLog:" here.  
"gcc/testsuite/" works fine.

* gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c: Copy
from gcc/testsuite/gcc.target/i386.
As before, these all need to be relative to gcc/testsuite, so start with 
gcc.target/...

---
  .../gcc.target/powerpc/sse4_1-floorpd.c   |  51 
  .../gcc.target/powerpc/sse4_1-floorps.c   |  33 +
  .../gcc.target/powerpc/sse4_1-floorsd.c   | 119 ++
  .../gcc.target/powerpc/sse4_1-floorss.c   |  95 ++
  .../gcc.target/powerpc/sse4_1-roundpd-2.c |  36 ++
  5 files changed, 334 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
new file mode 100644
index ..ad21644f50c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, mode) _mm_floor_pd (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { .value = { .f = {  0.00,  0.25 } }, .answer = {  0.0,  0.0 } },
+  { .value = { .f = {  0.50,  0.75 } }, .answer = {  0.0,  0.0 } },
+
+  { { .f = {  0x1.cp+50,  0x1.dp+50 } },
+   {  0x1.cp+50,  0x1.cp+50 } },
+  { { .f = {  0x1.ep+50,  0x1.0p+51 } },
+   {  0x1.cp+50,  0x1.0p+51 } },
+  { { .f = {  0x1.0p+51,  0x1.1p+51 } },
+   {  0x1.0p+51,  0x1.0p+51 } },
+  { { .f = {  0x1.2p+51,  0x1.3p+51 } },
+   {  0x1.2p+51,  0x1.2p+51 } },
+
+  { { .f = {  0x1.ep+51,  0x1.fp+51 } },
+   {  0x1.ep+51,  0x1.ep+51 } },
+  { { .f = {  0x1.0p+52,  0x1.1p+52 } },
+   {  0x1.0p+52,  0x1.1p+52 } },
+
+  { { .f = { -0x1.1p+52, -0x1.0p+52 } },
+   { -0x1.1p+52, -0x1.0p+52 } },
+  { { .f = { -0x1.fp+51, -0x1.ep+52 } },
+   { -0x1.0p+52, -0x1.ep+52 } },
+
+  { { .f = { -0x1.3p+51, -0x1.2p+51 } },
+   { -0x1.4p+51, -0x1.2p+51 } },
+  { { .f = { -0x1.1p+51, -0x1.0p+51 } },
+   { -0x1.2p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.fp+50, -0x1.ep+50 } },
+   { -0x1.0p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.dp+50, -0x1.cp+50 } },
+   { -0x1.0p+51, -0x1.cp+50 } },
+
+  { { .f = { -1.00, -0.75 } }, { -1.0, -1.0 } },
+  { { .f = { -0.50, -0.25 } }, { -1.0, -1.0 } }
+};
+
+#include "sse4_1-round.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
new file mode 100644
index ..17ff35a7360f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+#define VEC_T __m128
+#define FP_T float
+
+#define ROUND_INTRIN(x, mode) _mm_floor_ps (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { { .f = {  0.00,  0.25,  0.50,  0.75 } }, {  0.0,  0.0,  0.0,  0.0 } },
+
+  { { .f = {  0x1.f8p+21,  0x1.fap+21,  0x1.fcp+21,  
0x1.fep+21 } },
+   {  0x1.f8p+21,  0x1.f8p+21,  0x1.f8p+21,  
0x1.f8p+21 } },
+
+  { { .f = {  0x1.fap+22,  

Re: Repost: [PATCH] PR 100168: Fix call test on power10.

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Mike,

LGTM.  I can't approve, but recommend approval.

Thanks,
Bill

On 7/7/21 3:08 PM, Michael Meissner wrote:

[PATCH] PR 100168: Fix call test on power10.

Fix a test that was checking for 64-bit TOC calls, to also allow for
PC-relative calls.

I have verified that this test passes when run on a power10 system configured
with --with-cpu=power10 and it continues to pass on power9 little endian and
power8 big endian systems.

Can I check this into the master branch?

2021-07-07  Michael Meissner  

gcc/testsuite
PR testsuite/100168
* gcc.dg/pr56727-2.c: Add support for PC-relative calls.
---
  gcc/testsuite/gcc.dg/pr56727-2.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr56727-2.c b/gcc/testsuite/gcc.dg/pr56727-2.c
index c54369ed25e..77fdf4bc350 100644
--- a/gcc/testsuite/gcc.dg/pr56727-2.c
+++ b/gcc/testsuite/gcc.dg/pr56727-2.c
@@ -18,4 +18,4 @@ void h ()
  
  /* { dg-final { scan-assembler "@(PLT|plt)" { target i?86-*-* x86_64-*-* } } } */

  /* { dg-final { scan-assembler "@(PLT|plt)" { target { powerpc*-*-linux* && 
ilp32 } } } } */
-/* { dg-final { scan-assembler "bl f\n\\s*nop" { target { powerpc*-*-linux* && 
lp64 } } } } */
+/* { dg-final { scan-assembler "(bl f\n\\s*nop)|(bl f@notoc)" { target { 
powerpc*-*-linux* && lp64 } } } } */


Re: Repost: [PATCH] PR 100170: Fix eq/ne tests on power10.

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Mike,

ENOPATCH

Thanks,
Bill :-)

On 7/7/21 3:06 PM, Michael Meissner wrote:

[PATCH] PR 100170: Fix eq/ne tests on power10.

This patch updates eq/ne tests in the testsuite to adjust the test if
power10 code generation is used.

I have verified that these tests run on a power10 system using the
--with-cpu=power10 configuration option, and they continue to run on power9
little endian and power8 big endian systems.

Can I check this patch into th master branch?

2021-07-07  Michael Meissner  

gcc/testsuite/
PR testsuite/100170
* gcc.target/powerpc/ppc-eq0-1.c: Add support for the setbc
instruction.
* gcc.target/powerpc/ppc-ne0-1.c: Update instruction counts on
power10.



Re: Repost: [PATCH] PR 100167: Fix vector long long multiply/divide tests on power10

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Mike,

On 7/7/21 3:04 PM, Michael Meissner wrote:

[PATCH] PR 100167: Fix vector long long multiply/divide tests on power10.

This patch updates the vector long long multiply and divide tests to
supply the correct code information if power10 code generation is used.

2021-07-07  Michael Meissner  

gcc/testsuite/
PR testsuite/100167
* gcc.target/powerpc/fold-vec-div-longlong.c:

Missing information after colon.

* gcc.target/powerpc/fold-vec-mult-longlong.c: Fix expected code
generation on power10.
---
  gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c  | 7 +--
  gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c | 3 ++-
  2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
index 312e984d3cc..f6a9b290ae5 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
@@ -19,5 +19,8 @@ test6 (vector unsigned long long x, vector unsigned long long 
y)
  {
return vec_div (x, y);
  }
-/* { dg-final { scan-assembler-times {\mdivd\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mdivdu\M} 2 } } */
+
+/* { dg-final { scan-assembler-times {\mdivd\M}   2 { target { ! 
has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mdivdu\M}  2 { target { ! 
has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 { target {   
has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mvdivud\M} 1 { target {   
has_arch_pwr10 } } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
index 38dba9f5023..bd210e34801 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
@@ -20,5 +20,6 @@ test6 (vector unsigned long long x, vector unsigned long long 
y)
return vec_mul (x, y);
  }
  
-/* { dg-final { scan-assembler-times "\[ \t\]mulld " 4 { target lp64 } } } */

+/* { dg-final { scan-assembler-times {\mmulld\M}  4 { target { lp64 && { ! 
has_arch_pwr10 } } } } } */
+/* { dg-final { scan-assembler-times {\mvmulld\M} 2 { target { has_arch_pwr10  
   } } } } */
  


Shouldn't this last be { lp64 && has_arch_pwr10 } ?

Otherwise LGTM.  I can't approve, but recommend approval with those changes.

Thanks,
Bill



Re: Ping ^ 2: [PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-07-11 Thread Xionghu Luo via Gcc-patches



On 2021/7/10 02:40, will schmidt wrote:
> On Wed, 2021-06-30 at 09:44 +0800, Xionghu Luo via Gcc-patches wrote:
>> Gentle ping ^2, thanks.
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568143.html
>>
>>
>> On 2021/5/14 15:13, Xionghu Luo via Gcc-patches wrote:
>>> Test SPEC2017 Ofast P8LE for this patch : 511.povray_r +1.14%,
>>> 526.blender_r +1.72%, no obvious changes to others.
> 
> Ok.
> 
>>>
>>>
>>> On 2021/5/6 10:36, Xionghu Luo via Gcc-patches wrote:
 Gentle ping, thanks.


 On 2021/4/16 15:10, Xiong Hu Luo wrote:
> fmod/fmodf and remainder/remainderf could be expanded instead of library
> call when fast-math build, which is much faster.
>
> fmodf:
>fdivs   f0,f1,f2
>frizf0,f0
>fnmsubs f1,f2,f0,f1
>
> remainderf:
>fdivs   f0,f1,f2
>frinf0,f0
>fnmsubs f1,f2,f0,f1
>
> gcc/ChangeLog:
>
> 2021-04-16  Xionghu Luo  
>
>  PR target/97142
> 
> That PR is " Bug 97142
>- __builtin_fmod not optimized on POWER   "
> 
> OK.
> 
> 
>  * config/rs6000/rs6000.md (fmod3): New define_expand.
>  (remainder3): Likewise.
> 
> 
>
> gcc/testsuite/ChangeLog:
>
> 2021-04-16  Xionghu Luo  
>
>  PR target/97142
>  * gcc.target/powerpc/pr97142.c: New test.
> 
> Ok.
> 
> ---
>gcc/config/rs6000/rs6000.md| 36 ++
>gcc/testsuite/gcc.target/powerpc/pr97142.c | 30 ++
>2 files changed, 66 insertions(+)
>create mode 100644 gcc/testsuite/gcc.target/powerpc/pr97142.c
>
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index a1315523fec..7e0e94e6ba4 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -4902,6 +4902,42 @@ (define_insn "fre"
>  [(set_attr "type" "fp")
>   (set_attr "isa" "*,")])
> +(define_expand "fmod3"
> +  [(use (match_operand:SFDF 0 "gpc_reg_operand"))
> +(use (match_operand:SFDF 1 "gpc_reg_operand"))
> +(use (match_operand:SFDF 2 "gpc_reg_operand"))]
> +  "TARGET_HARD_FLOAT
> +  && TARGET_FPRND
> +  && flag_unsafe_math_optimizations"
> +{
> +  rtx div = gen_reg_rtx (mode);
> +  emit_insn (gen_div3 (div, operands[1], operands[2]));
> +
> +  rtx friz = gen_reg_rtx (mode);
> +  emit_insn (gen_btrunc2 (friz, div));
> +
> +  emit_insn (gen_nfms4 (operands[0], operands[2], friz,
> operands[1]));
> +  DONE;
> + })
> +
> +(define_expand "remainder3"
> +  [(use (match_operand:SFDF 0 "gpc_reg_operand"))
> +(use (match_operand:SFDF 1 "gpc_reg_operand"))
> +(use (match_operand:SFDF 2 "gpc_reg_operand"))]
> +  "TARGET_HARD_FLOAT
> +  && TARGET_FPRND
> +  && flag_unsafe_math_optimizations"
> +{
> +  rtx div = gen_reg_rtx (mode);
> +  emit_insn (gen_div3 (div, operands[1], operands[2]));
> +
> +  rtx frin = gen_reg_rtx (mode);
> +  emit_insn (gen_round2 (frin, div));
> +
> +  emit_insn (gen_nfms4 (operands[0], operands[2], frin,
> operands[1]));
> +  DONE;
> + })
> 
> I notice the pattern of arguments to the final emit
> is op[0],op[2],fri*,op[1]
> while the description comment suggests the generated instruction
> will be fnmsubs  f1,f2,f0,f1  ;
> 
> I don't see any rearranging in the nfms4 expansions, but
> presumably this is correct and just a cosmetic nit that catches my eye.


>From the ISA, 

fnmsub FRT,FRA,FRC,FRB

The operation
FRT ← - ( [(FRA) (FRC)] - (FRB) )
is performed.

 fmodf:
   fdivs   f0,f1,f2
   frizf0,f0
   fnmsubs f1,f2,f0,f1

Then the ASM means:

f1 = - (f2 * f0 - f1) = - ([f2 * f1/f2] - f1)

So f1 is set with the mod result.

> 
> Ok.
> 
> 
> +
>(define_insn "*rsqrt2"
>  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa")
>(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" ",wa")]
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97142.c
> b/gcc/testsuite/gcc.target/powerpc/pr97142.c
> new file mode 100644
> index 000..48f25ca5b5b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97142.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast" } */
> +
> +#include 
> +
> +float test1 (float x, float y)
> +{
> +  return fmodf (x, y);
> +}
> +
> +double test2 (double x, double y)
> +{
> +  return fmod (x, y);
> +}
> +
> +float test3 (float x, float y)
> +{
> +  return remainderf (x, y);
> +}
> +
> +double test4 (double x, double y)
> +{
> +  return remainder (x, y);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mbl fmod\M} } } */
> +/* { dg-final { scan-assembler-not {\mbl fmodf\M} } } */
> +/* { dg-final { scan-assembler-not {\

Re: [PATCH] rs6000: Fix restored rs6000_long_double_type_size.

2021-07-11 Thread Martin Liška

PING^1

On 6/28/21 2:19 PM, Martin Liška wrote:

On 6/24/21 12:46 AM, Segher Boessenkool wrote:

Hi!

On Wed, Jun 23, 2021 at 03:22:34PM +0200, Martin Liška wrote:

As mentioned in the "Fallout: save/restore target options in
handle_optimize_attribute"
thread, we need to support target option restore of
rs6000_long_double_type_size == FLOAT_PRECISION_TFmode.


I have no idea?  Could you explain please?


Sure. Few weeks ago, we started using cl_target_option_{save,restore} calls
even for optimize attributes (and pragma). Motivation was that optimize options
can influence target options (and vice versa).

Doing that, FLOAT_PRECISION_TFmode must be accepted as a valid option value
for rs6000_long_double_type_size.




--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4185,6 +4185,8 @@ rs6000_option_override_internal (bool global_init_p)
    else
  rs6000_long_double_type_size = default_long_double_size;
  }
+  else if (rs6000_long_double_type_size == FLOAT_PRECISION_TFmode)
+    ; /* The option can be restored a TREE_TARGET_OPTION.  */


What does that mean?  It is not grammatical, and not obvious what it
should mean.


Updated.




--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { powerpc*-*-linux* } } } */


Why on Linux only?  That doesn't sound right.  Do you need some other
selector(s)?


Sorry, I copied the test-case.




+/* { dg-options "-O2 -mlong-double-128 -mabi=ibmlongdouble" } */
+
+extern unsigned long int x;
+extern float f (float);
+extern __typeof (f) f_power8;
+extern __typeof (f) f_power9;
+extern __typeof (f) f __attribute__ ((ifunc ("f_ifunc")));
+static __attribute__ ((optimize ("-fno-stack-protector"))) __typeof (f) *


-fno-stack-protector is default.


Yes, but one needs an optimize attribute in order to trigger 
cl_target_option_save/restore
mechanism.

Martin




+f_ifunc (void)
+{
+  __typeof (f) *res = x ? f_power9 : f_power8;
+  return res;
+}


The testcase should say what it is testing for, it is not obvious?


Segher







[PATCH v2] docs: Add 'S' to Machine Constraints for RISC-V

2021-07-11 Thread Kito Cheng
It was undocument before, but it might used in linux kernel for resolve
code model issue, so LLVM community suggest we should document that,
so that make it become supported/documented/non-internal machine constraints.

gcc/ChangeLog:

PR target/101275
* config/riscv/constraints.md ("S"): Update description and remove
@internal.
* doc/md.texi (Machine Constraints): Document the 'S' constraints
for RISC-V.
---
 gcc/config/riscv/constraints.md | 3 +--
 gcc/doc/md.texi | 3 +++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 8c15c6c0486..c87d5b796a5 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -67,8 +67,7 @@ (define_memory_constraint "A"
(match_test "GET_CODE(XEXP(op,0)) == REG")))
 
 (define_constraint "S"
-  "@internal
-   A constant call address."
+  "A constraint that matches an absolute symbolic address."
   (match_operand 0 "absolute_symbolic_operand"))
 
 (define_constraint "U"
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 00caf3844cc..2d120da96cf 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3536,6 +3536,9 @@ A 5-bit unsigned immediate for CSR access instructions.
 @item A
 An address that is held in a general-purpose register.
 
+@item S
+A constraint that matches an absolute symbolic address.
+
 @end table
 
 @item RX---@file{config/rx/constraints.md}
-- 
2.31.1



Re: [PATCH v2] docs: Add 'S' to Machine Constraints for RISC-V

2021-07-11 Thread Fangrui Song

On 2021-07-12, Kito Cheng wrote:

It was undocument before, but it might used in linux kernel for resolve
code model issue, so LLVM community suggest we should document that,
so that make it become supported/documented/non-internal machine constraints.

gcc/ChangeLog:

PR target/101275
* config/riscv/constraints.md ("S"): Update description and remove
@internal.
* doc/md.texi (Machine Constraints): Document the 'S' constraints
for RISC-V.
---
gcc/config/riscv/constraints.md | 3 +--
gcc/doc/md.texi | 3 +++
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 8c15c6c0486..c87d5b796a5 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -67,8 +67,7 @@ (define_memory_constraint "A"
   (match_test "GET_CODE(XEXP(op,0)) == REG")))

(define_constraint "S"
-  "@internal
-   A constant call address."
+  "A constraint that matches an absolute symbolic address."
  (match_operand 0 "absolute_symbolic_operand"))

(define_constraint "U"
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 00caf3844cc..2d120da96cf 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3536,6 +3536,9 @@ A 5-bit unsigned immediate for CSR access instructions.
@item A
An address that is held in a general-purpose register.

+@item S
+A constraint that matches an absolute symbolic address.
+
@end table

@item RX---@file{config/rx/constraints.md}
--
2.31.1


LGTM


Re: [PATCH v3 1/2] Add -f[no-]direct-extern-access

2021-07-11 Thread Richard Biener via Gcc-patches
On Fri, Jul 9, 2021 at 4:50 PM H.J. Lu  wrote:
>
> -fdirect-extern-access is the default.  With -fno-direct-extern-access:
>
> 1. Always use GOT to access undefined data and function symbols,
>including in PIE and non-PIE.  These will avoid copy relocations
>in executables.  This is compatible with existing executables and
>shared libraries.
> 2. In executable and shared library, bind symbols with the STV_PROTECTED
>visibility locally:
>a. The address of data symbol is the address of data body.
>b. For systems without function descriptor, the function pointer is
>   the address of function body.
>c. The resulting shared libraries may not be incompatible with
>   executables which have copy relocations on protected symbols or
>   use executable PLT entries as function addresses for protected
>   functions in shared libraries.
> 3. Update asm_preferred_eh_data_format to select PC relative EH encoding
> format with -fno-direct-extern-access to avoid copy relocation.
> 4. Add ix86_reloc_rw_mask for TARGET_ASM_RELOC_RW_MASK to avoid copy
> relocation with -fno-direct-extern-access.

Did you check how relocations in .debug_info behave?  I don't remember whether
we're doing anything special there or if we just copy how we emit
relocs in .text

Richard.

> gcc/
>
> PR target/35513
> PR target/100593
> * common.opt: Add -fdirect-extern-access.
> * config/i386/i386-protos.h (ix86_force_load_from_GOT_p): Add a
> bool argument.
> * config/i386/i386.c (ix86_force_load_from_GOT_p): Add a bool
> argument to indicate call operand.  Force non-call load
> from GOT for -fno-direct-extern-access.
> (legitimate_pic_address_disp_p): Avoid copy relocation in PIE
> for -fno-direct-extern-access.
> (ix86_print_operand): Pass true to ix86_force_load_from_GOT_p
> for call operand.
> (asm_preferred_eh_data_format): Use PC-relative format for
> -fno-direct-extern-access to avoid copy relocation.  Check
> ptr_mode instead of TARGET_64BIT when selecting DW_EH_PE_sdata4.
> (ix86_binds_local_p): Don't treat protected data as extern and
> avoid copy relocation on common symbol with
> -fno-direct-extern-access.
> (ix86_reloc_rw_mask): New to avoid copy relocation for
> -fno-direct-extern-access.
> (TARGET_ASM_RELOC_RW_MASK): New.
> * doc/invoke.texi: Document -f[no-]direct-extern-access.
>
> gcc/testsuite/
>
> PR target/35513
> PR target/100593
> * g++.dg/pr35513-1.C: New file.
> * g++.dg/pr35513-2.C: Likewise.
> * gcc.target/i386/pr35513-1.c: Likewise.
> * gcc.target/i386/pr35513-2.c: Likewise.
> * gcc.target/i386/pr35513-3.c: Likewise.
> * gcc.target/i386/pr35513-4.c: Likewise.
> * gcc.target/i386/pr35513-5.c: Likewise.
> * gcc.target/i386/pr35513-6.c: Likewise.
> * gcc.target/i386/pr35513-7.c: Likewise.
> * gcc.target/i386/pr35513-8.c: Likewise.
> ---
>  gcc/common.opt|  4 ++
>  gcc/config/i386/i386-protos.h |  2 +-
>  gcc/config/i386/i386.c| 50 +++--
>  gcc/doc/invoke.texi   | 13 ++
>  gcc/testsuite/g++.dg/pr35513-1.C  | 25 +++
>  gcc/testsuite/g++.dg/pr35513-2.C  | 53 +++
>  gcc/testsuite/gcc.target/i386/pr35513-1.c | 16 +++
>  gcc/testsuite/gcc.target/i386/pr35513-2.c | 15 +++
>  gcc/testsuite/gcc.target/i386/pr35513-3.c | 15 +++
>  gcc/testsuite/gcc.target/i386/pr35513-4.c | 15 +++
>  gcc/testsuite/gcc.target/i386/pr35513-5.c | 15 +++
>  gcc/testsuite/gcc.target/i386/pr35513-6.c | 14 ++
>  gcc/testsuite/gcc.target/i386/pr35513-7.c | 15 +++
>  gcc/testsuite/gcc.target/i386/pr35513-8.c | 41 ++
>  14 files changed, 278 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/pr35513-1.C
>  create mode 100644 gcc/testsuite/g++.dg/pr35513-2.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-6.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-8.c
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index d9da1131eda..67ad811d54d 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1432,6 +1432,10 @@ fdiagnostics-minimum-margin-width=
>  Common Joined UInteger Var(diagnostics_minimum_margin_width) Init(6)
>  Set minimum width of left margin of source code when showing source.
>
> +fdirect-extern-access
> +Common Var(flag_direc

Re: [PATCH] Change the type of memory classification functions to bool

2021-07-11 Thread Richard Biener via Gcc-patches
On Fri, Jul 9, 2021 at 5:07 PM Uros Bizjak via Gcc-patches
 wrote:
>
> 2021-07-09  Uroš Bizjak  
>
> gcc/
> * recog.c (memory_address_addr_space_p): Change the type to bool.
> Return true/false instead of 1/0.
> (offsettable_memref_p): Ditto.
> (offsettable_nonstrict_memref_p): Ditto.
> (offsettable_address_addr_space_p): Ditto.
> Change the type of addressp indirect function to bool.
> * recog.h (memory_address_addr_space_p): Change the type to bool.
> (strict_memory_address_addr_space_p): Ditto.
> (offsettable_memref_p): Ditto.
> (offsettable_nonstrict_memref_p): Ditto.
> (offsettable_address_addr_space_p): Ditto.
> * reload.c (maybe_memory_address_addr_space_p): Ditto.
> (strict_memory_address_addr_space_p): Change the type to bool.
> Return true/false instead of 1/0.
> (maybe_memory_address_addr_space_p): Change the type to bool.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> OK for master?

OK.

> Uros.


Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-11 Thread Richard Biener
On Fri, 9 Jul 2021, Segher Boessenkool wrote:

> On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote:
> > I wonder if there's a way to query the target what modes the doloop
> > pattern can handle (not being too familiar with the doloop code).
> 
> You can look what modes are allowed for operand 0 of doloop_end,
> perhaps?  Although that is a define_expand, not a define_insn, so it is
> hard to introspect.
> 
> > Why do you need to do any checks besides the new type being able to
> > represent all IV values?  The original doloop IV will never wrap
> > (OTOH if niter is U*_MAX then we compute niter + 1 which will become
> > zero ... I suppose the doloop might still do the correct thing here
> > but it also still will with a IV with larger type).
> 
> doloop_valid_p guarantees it is simple and doesn't wrap.
> 
> > I'd have expected sth like
> > 
> >ntype = lang_hooks.types.type_for_mode (word_mode, TYPE_UNSIGNED 
> > (ntype));
> > 
> > thus the decision made using a mode - which is also why I wonder
> > if there's a way to query the target for this.  As you say,
> > it _may_ be fast, so better check (somehow).
> 
> Almost all targets just use Pmode, but there is no such guarantee I
> think, and esp. some targets that do not have machine insns for this
> (but want to generate different code for this anyway) can do pretty much
> anything.
> 
> Maybe using just Pmode here is good enough though?

I think Pmode is a particularly bad choice and I'd prefer word_mode
if we go for any hardcoded mode.  s390x for example seems to handle
both SImode and DImode (but names the helper gen_doloop_si64
for SImode?!).  But indeed it looks like somehow querying doloop_end
is going to be difficult since the expander doesn't have any mode,
so we'd have to actually try emit RTL here.

Richard.


Re: [PATCH v3 1/2] Add -f[no-]direct-extern-access

2021-07-11 Thread Fāng-ruì Sòng via Gcc-patches
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index cff26909292..7dee311051d 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -10312,13 +10312,17 @@ darwin_local_data_pic (rtx disp)
>  }
>
>  /* True if the function symbol operand X should be loaded from GOT.
> +   If CALL_P is true, X is a call operand.
> +
> +   NB: -fno-direct-extern-access doesn't force load from GOT for
> +   call.
>
> NB: In 32-bit mode, only non-PIC is allowed in inline assembly
> statements, since a PIC register could not be available at the
> call site.  */
>
>  bool
> -ix86_force_load_from_GOT_p (rtx x)
> +ix86_force_load_from_GOT_p (rtx x, bool call_p)
>  {
>return ((TARGET_64BIT || (!flag_pic && HAVE_AS_IX86_GOT32X))
>   && !TARGET_PECOFF && !TARGET_MACHO
> @@ -10326,11 +10330,12 @@ ix86_force_load_from_GOT_p (rtx x)
>   && ix86_cmodel != CM_LARGE
>   && ix86_cmodel != CM_LARGE_PIC
>   && GET_CODE (x) == SYMBOL_REF
> - && SYMBOL_REF_FUNCTION_P (x)
> - && (!flag_plt
> - || (SYMBOL_REF_DECL (x)
> - && lookup_attribute ("noplt",
> -  DECL_ATTRIBUTES (SYMBOL_REF_DECL 
> (x)
> + && ((!call_p && !flag_direct_extern_access)
> + || (SYMBOL_REF_FUNCTION_P (x)
> + && (!flag_plt
> + || (SYMBOL_REF_DECL (x)
> + && lookup_attribute ("noplt",
> +  DECL_ATTRIBUTES 
> (SYMBOL_REF_DECL (x)))
>   && !SYMBOL_REF_LOCAL_P (x));
>  }
>
> @@ -10596,7 +10601,8 @@ legitimate_pic_address_disp_p (rtx disp)
> }
>   else if (!SYMBOL_REF_FAR_ADDR_P (op0)
>&& (SYMBOL_REF_LOCAL_P (op0)
> -  || (HAVE_LD_PIE_COPYRELOC
> +  || (flag_direct_extern_access
> +  && HAVE_LD_PIE_COPYRELOC
>&& flag_pie
>&& !SYMBOL_REF_WEAK (op0)
>&& !SYMBOL_REF_FUNCTION_P (op0)))
> @@ -13498,7 +13504,7 @@ ix86_print_operand (FILE *file, rtx x, int code)
>
>if (code == 'P')
> {
> - if (ix86_force_load_from_GOT_p (x))
> + if (ix86_force_load_from_GOT_p (x, true))
> {
>   /* For inline assembly statement, load function address
>  from GOT with 'P' operand modifier to avoid PLT.  */
> @@ -21935,10 +21941,10 @@ int
>  asm_preferred_eh_data_format (int code, int global)
>  {
>/* PE-COFF is effectively always -fPIC because of the .reloc section.  */
> -  if (flag_pic || TARGET_PECOFF)
> +  if (flag_pic || TARGET_PECOFF || !flag_direct_extern_access)
>  {
>int type = DW_EH_PE_sdata8;
> -  if (!TARGET_64BIT
> +  if (ptr_mode == SImode
>   || ix86_cmodel == CM_SMALL_PIC
>   || (ix86_cmodel == CM_MEDIUM_PIC && (global || code)))
> type = DW_EH_PE_sdata4;
> @@ -23028,10 +23034,21 @@ ix86_atomic_assign_expand_fenv (tree *hold, tree 
> *clear, tree *update)
>  static bool
>  ix86_binds_local_p (const_tree exp)
>  {
> -  return default_binds_local_p_3 (exp, flag_shlib != 0, true, true,
> - (!flag_pic
> -  || (TARGET_64BIT
> -  && HAVE_LD_PIE_COPYRELOC != 0)));
> +  return default_binds_local_p_3 (exp, flag_shlib != 0, true,
> + flag_direct_extern_access,
> + (flag_direct_extern_access
> +  && (!flag_pic
> +  || (TARGET_64BIT
> +  && HAVE_LD_PIE_COPYRELOC != 0;
> +}

Hope we can get rid of HAVE_LD_PIE_COPYRELOC unconditionally, probably
separately.

Patch is here: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html


Re: [PATCH 10/10] vect: Reuse reduction accumulators between loops

2021-07-11 Thread Richard Biener via Gcc-patches
On Fri, Jul 9, 2021 at 3:12 PM Richard Sandiford
 wrote:
>
> Thanks for the review.
>
> Richard Biener  writes:
> >> @@ -588,6 +600,23 @@ public:
> >>/* Unrolling factor  */
> >>poly_uint64 vectorization_factor;
> >>
> >> +  /* If this loop is an epilogue loop whose main loop can be skipped,
> >> + MAIN_LOOP_EDGE is the edge from the main loop to this loop's
> >> + preheader.  SKIP_MAIN_LOOP_EDGE is then the edge that skips the
> >> + main loop and goes straight to this loop's preheader.
> >> +
> >> + Both fields are null otherwise.  */
> >> +  edge main_loop_edge;
> >> +  edge skip_main_loop_edge;
> >> +
> >> +  /* If this loop is an epilogue loop that might be skipped after 
> >> executing
> >> + the main loop, this edge is the one that skips the epilogue.  */
> >> +  edge skip_this_loop_edge;
> >> +
> >> +  /* After vectorization, maps live-out SSA names to information about
> >> + the reductions that generated them.  */
> >> +  hash_map reusable_accumulators;
> >
> > Is that the LC PHI node defs or the definition inside of the loop?
> > If the latter we could attach the info directly to its stmt-info?
>
> Ah, yeah, I should improve the comment there.  It's the vectoriser's
> replacement for the original LC PHI node, i.e. the final scalar result
> after the reduction has taken place.

OK

> >> @@ -1186,6 +1215,21 @@ public:
> >>/* The vector type for performing the actual reduction.  */
> >>tree reduc_vectype;
> >>
> >> +  /* If IS_REDUC_INFO is true and if the reduction is operating on N
> >> + elements in parallel, this vector gives the initial values of these
> >> + N elements.  */
> >
> > That's N scalar elements or N vector elements?  I suppose it's for
> > SLP reductions (rather than SLP reduction chains) and never non-SLP
> > reductions?
>
> Yeah, poor wording again, sorry.  I meant something closer to:
>
>   /* If IS_REDUC_INFO is true and if the vector code is performing
>  N scalar reductions in parallel, this vector gives the initial
>  scalar values of those N reductions.  */
>
> >> +  vec reduc_initial_values;
> >> +
> >> +  /* If IS_REDUC_INFO is true and if the reduction is operating on N
> >> + elements in parallel, this vector gives the scalar result of each
> >> + reduction.  */
> >> +  vec reduc_scalar_results;
>
> Same change here.
>
> >> […]
> >> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> >> index 2909e8a0fc3..b7b0523e3c8 100644
> >> --- a/gcc/tree-vect-loop-manip.c
> >> +++ b/gcc/tree-vect-loop-manip.c
> >> @@ -2457,6 +2457,31 @@ vect_update_epilogue_niters (loop_vec_info 
> >> epilogue_vinfo,
> >>return vect_determine_partial_vectors_and_peeling (epilogue_vinfo, 
> >> true);
> >>  }
> >>
> >> +/* LOOP_VINFO is an epilogue loop and MAIN_LOOP_VALUE is available on exit
> >> +   from the corresponding main loop.  Return a value that is available in
> >> +   LOOP_VINFO's preheader, using SKIP_VALUE if the main loop is skipped.
> >> +   Passing a null SKIP_VALUE is equivalent to passing zero.  */
> >> +
> >> +tree
> >> +vect_get_main_loop_result (loop_vec_info loop_vinfo, tree main_loop_value,
> >> +  tree skip_value)
> >> +{
> >> +  if (!loop_vinfo->main_loop_edge)
> >> +return main_loop_value;
> >> +
> >> +  if (!skip_value)
> >> +skip_value = build_zero_cst (TREE_TYPE (main_loop_value));
> >
> > shouldn't that be the initial value?
>
> For the current use case, the above two conditions are never true.
> I wrote it like this because I had a follow-on patch (which might
> not go anywhere) that needed this function for 0-based IVs.
>
> Maybe that's a bad risk/reward trade-off though.  Not having to pass
> zero makes things only slightly simpler for the follow-on patch,
> and I guess could be dangerous in other cases.
>
> Perhaps in that case though I should change loop_vinfo->main_loop_edge
> into a gcc_assert as well.

Yeah, I think asserts (and comments in case it's because we don't handle
some specific cases yet) are better than possibly wrong behavior.

> >> +  tree phi_result = make_ssa_name (TREE_TYPE (main_loop_value));
> >> +  basic_block bb = loop_vinfo->main_loop_edge->dest;
> >> +  gphi *new_phi = create_phi_node (phi_result, bb);
> >> +  add_phi_arg (new_phi, main_loop_value, loop_vinfo->main_loop_edge,
> >> +  UNKNOWN_LOCATION);
> >> +  add_phi_arg (new_phi, skip_value,
> >> +  loop_vinfo->skip_main_loop_edge, UNKNOWN_LOCATION);
> >> +  return phi_result;
> >> +}
> >> +
> >>  /* Function vect_do_peeling.
> >>
> >> Input:
> >> […]
> >> @@ -4823,6 +4842,100 @@ info_for_reduction (vec_info *vinfo, stmt_vec_info 
> >> stmt_info)
> >>return stmt_info;
> >>  }
> >>
> >> +/* PHI is a reduction in LOOP_VINFO that we are going to vectorize using 
> >> vector
> >> +   type VECTYPE.  See if LOOP_VINFO is an epilogue loop whose main loop 
> >> had a
> >> +   matching reduction that we can build on.  Adjust REDUC_INFO and return

Re: [PATCH] PR tree-optimization/101403: Incorrect folding of ((T)bswap(x))>>C

2021-07-11 Thread Richard Biener via Gcc-patches
On Sun, Jul 11, 2021 at 11:48 AM Roger Sayle  wrote:
>
>
> My sincere apologies for the breakage.  My recent patch to fold
> bswapN(x)>>C where the constant C was large enough that the result
> only contains bits from the low byte, and can therefore avoid
> the byte swap contains a minor logic error.  The pattern contains
> a convert? allowing an extension to occur between the bswap and
> the shift.  The logic is correct if there's no extension, or the
> extension has the same sign as the shift, but I'd mistakenly
> convinced myself that these couldn't have different signedness.
>
> (T)bswap16(x)>>12 is (T)((unsigned char)x>>4) or (T)((signed char)x>>4).
> The bug is that for zero-extensions to signed type T, we need to use
> the unsigned char variant [the signedness of the byte shift is not
> (always) the same as the signedness of T and the original shift].
>
> Then because I'm now paranoid, I've also added a clause to handle
> the hypothetical (but in practice impossible) sign-extension to an
> unsigned type T, which can implemented as (T)(x<<8)>>12.
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures, and a new
> testcase to confirm it fixes the regression.
>
> Ok for mainline?

OK.

Thanks,
Richard.

> 2021-07-11  Roger Sayle  
>
> gcc/ChangeLog
> PR tree-optimization/101403
> * gcc/match.pd ((T)bswap(X)>>C): Correctly handle cases where
> signedness of the shift is not the same as the signedness of
> the type extension.
>
> gcc/testsuite/ChangeLog
> PR tree-optimization/101403
> * gcc.dg/pr101403.c: New test case.
>
>
> Sorry again,
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>