Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]

2023-09-29 Thread HAO CHEN GUI
Richard,

On 2023/9/28 21:39, Richard Sandiford wrote:
> That looks easily solvable though.  I've posted a potential fix as:
> 
>https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631595.html
> 
> Is that the only blocker to doing this in generic code?

Thanks so much for your patch. It works. I didn't find any other
blockers. I will run a regression test after I am back from holiday.

Thanks
Gui Haochen


[PATCH-1, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-08 Thread HAO CHEN GUI
Hi,
  Vector mode instructions are efficient on some targets (e.g. ppc64).
This patch enables vector mode for compare_by_pieces. The non-member
function widest_fixed_size_mode_for_size takes by_pieces_operation
as its second argument and decides whether vector mode is enabled
based on the type of operation. Currently only the set and compare
operations enable vector mode, with the corresponding optab checks.

  The test case is in the second patch which is rs6000 specific.
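
  For reference, the idiom compare_by_pieces handles here is a
fixed-size memcmp whose result is only tested for equality; a minimal
sketch (hypothetical, the real test is the rs6000-specific one in the
second patch):

int
eq16 (const char *a, const char *b)
{
  /* A 16-byte equality test that compare_by_pieces can expand inline
     instead of calling memcmp.  */
  return __builtin_memcmp (a, b, 16) == 0;
}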

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
Expand: Enable vector mode for pieces compare

Vector mode compare instructions are efficient for equality compares on
rs6000. This patch refactors the by-pieces operation code to enable
vector mode for compare.

gcc/
PR target/111449
* expr.cc (widest_fixed_size_mode_for_size): Enable vector mode
for compare.  Replace the second argument with the type of pieces
operation.  Add optab checks for vector mode used in compare.
(by_pieces_ninsns): Pass the type of pieces operation to
widest_fixed_size_mode_for_size.
(class op_by_pieces_d): Add virtual function
widest_fixed_size_mode_for_size.
(op_by_pieces_d::op_by_pieces_d): Call outer function
widest_fixed_size_mode_for_size.
(op_by_pieces_d::get_usable_mode): Call class function
widest_fixed_size_mode_for_size.
(op_by_pieces_d::run): Likewise.
(class move_by_pieces_d): Declare function
widest_fixed_size_mode_for_size.
(move_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
(class store_by_pieces_d): Declare function
widest_fixed_size_mode_for_size.
(store_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
(can_store_by_pieces): Pass the type of pieces operation to
widest_fixed_size_mode_for_size.
(class compare_by_pieces_d): Declare function
widest_fixed_size_mode_for_size.
(compare_by_pieces_d::compare_by_pieces_d): Set m_qi_vector_mode
to true.
(compare_by_pieces_d::widest_fixed_size_mode_for_size): Implement.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index d87346dc07f..9885404ee9c 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -992,8 +992,9 @@ alignment_for_piecewise_move (unsigned int max_pieces, unsigned int align)
that is narrower than SIZE bytes.  */

 static fixed_size_mode
-widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
+widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
 {
+  bool qi_vector = ((op == COMPARE_BY_PIECES) || op == SET_BY_PIECES);
   fixed_size_mode result = NARROWEST_INT_MODE;

   gcc_checking_assert (size > 1);
@@ -1009,8 +1010,13 @@ widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
  {
if (GET_MODE_SIZE (candidate) >= size)
  break;
-   if (optab_handler (vec_duplicate_optab, candidate)
-   != CODE_FOR_nothing)
+   if ((op == SET_BY_PIECES
+&& optab_handler (vec_duplicate_optab, candidate)
+  != CODE_FOR_nothing)
+|| (op == COMPARE_BY_PIECES
+&& optab_handler (mov_optab, mode)
+   != CODE_FOR_nothing
+&& can_compare_p (EQ, mode, ccp_jump)))
  result = candidate;
  }

@@ -1061,8 +1067,7 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align,
 {
   /* NB: Round up L and ALIGN to the widest integer mode for
 MAX_SIZE.  */
-  mode = widest_fixed_size_mode_for_size (max_size,
- op == SET_BY_PIECES);
+  mode = widest_fixed_size_mode_for_size (max_size, op);
   if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
{
  unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode));
@@ -1076,8 +1081,7 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align,

   while (max_size > 1 && l > 0)
 {
-  mode = widest_fixed_size_mode_for_size (max_size,
- op == SET_BY_PIECES);
+  mode = widest_fixed_size_mode_for_size (max_size, op);
   enum insn_code icode;

   unsigned int modesize = GET_MODE_SIZE (mode);
@@ -1327,6 +1331,8 @@ class op_by_pieces_d
   virtual void finish_mode (machine_mode)
   {
   }
+  virtual fixed_size_mode widest_fixed_size_mode_for_size (unsigned int size)
+= 0;

  public:
   op_by_pieces_d (unsigned int, rtx, bool, rtx, bool, by_pieces_constfn,
@@ -1375,8 +1381,7 @@ op_by_pieces_d::op_by_pieces_d (unsigned int max_pieces, rtx to,
 {
   /* Find the mode of the largest comparison.  */
   fixed_size_mode mode
-   = widest_fixed_size_mode_for_size (m_max_size,
-  m_qi_vector_mode);
+   = ::widest_fixed_size_mode_for_size (m_max_size, COM

[PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-08 Thread HAO CHEN GUI
Hi,
  This patch enables vector mode for memory equality compares by adding
a new expand pattern, cbranchv16qi4, and implementing it. The
corresponding CC reg and compare code are also set in
rs6000_generate_compare. With the patch, a 16-byte equality compare can
be implemented by one vector compare instruction instead of two 8-byte
compares with branches.
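
  Conceptually, the two shapes are (a hypothetical C-level sketch, not
taken from the patch):

/* Without the patch: two 8-byte scalar compares with a branch in
   between, roughly this source-level shape.  */
int
eq16_scalar (const unsigned long long *a, const unsigned long long *b)
{
  if (a[0] != b[0])
    return 0;
  return a[1] == b[1];
}

/* With the patch, all 16 bytes go through a single V16QI compare
   (vcmpequb.) whose result is read from CR6, with no intermediate
   branch.  */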

  The test case is included in this patch, as it is rs6000 specific.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: Enable vector compare for memory equality compare

gcc/
PR target/111449
* config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
* config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn
sequence for V16QImode equality compare.
* config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
(COMPARE_MAX_PIECES): Define.

gcc/testsuite/
PR target/111449
* gcc.target/powerpc/pr111449.c: New.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e8a596fb7e9..c69bf266402 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2605,6 +2605,39 @@ (define_insn "altivec_vupklpx"
 }
   [(set_attr "type" "vecperm")])

+(define_expand "cbranchv16qi4"
+  [(use (match_operator 0 "equality_operator"
+   [(match_operand:V16QI 1 "gpc_reg_operand")
+(match_operand:V16QI 2 "gpc_reg_operand")]))
+   (use (match_operand 3))]
+  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
+{
+  if (!TARGET_P9_VECTOR
+  && MEM_P (operands[1])
+  && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
+  && MEM_P (operands[2])
+  && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
+{
+  /* Use direct move as the byte order doesn't matter for equality
+compare.  */
+  rtx reg_op1 = gen_reg_rtx (V16QImode);
+  rtx reg_op2 = gen_reg_rtx (V16QImode);
+  rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
+  rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
+  operands[1] = reg_op1;
+  operands[2] = reg_op2;
+}
+  else
+{
+  operands[1] = force_reg (V16QImode, operands[1]);
+  operands[2] = force_reg (V16QImode, operands[2]);
+}
+  rtx_code code = GET_CODE (operands[0]);
+  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]);
+  rs6000_emit_cbranch (V16QImode, operands);
+  DONE;
+})
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ_p"
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index efe9adce1f8..0087d786840 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15264,6 +15264,15 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
  else
emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
}
+  else if (mode == V16QImode)
+   {
+ gcc_assert (code == EQ || code == NE);
+
+ rtx result_vector = gen_reg_rtx (V16QImode);
+ compare_result = gen_rtx_REG (CCmode, CR6_REGNO);
+ emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
+ code = (code == NE) ? GE : LT;
+   }
   else
emit_insn (gen_rtx_SET (compare_result,
gen_rtx_COMPARE (comp_mode, op0, op1)));
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..dc33bca0802 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1730,6 +1730,8 @@ typedef struct rs6000_args
in one reasonably fast instruction.  */
 #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
 #define MAX_MOVE_MAX 8
+#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
+#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)

 /* Nonzero if access to memory by bytes is no faster than for words.
Also nonzero if doing byte operations (specifically shifts) in registers
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449.c b/gcc/testsuite/gcc.target/powerpc/pr111449.c
new file mode 100644
index 000..a8c30b92a41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-maltivec -O2" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+/* Ensure vector comparison is used for 16-byte memory equality compare.  */
+
+int compare1 (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 16) == 0;
+}
+
+int compare2 (const char* s1)
+{
+  return __builtin_memcmp (s1, "0123456789012345", 16) == 0;
+}
+
+/* { dg-final { scan-assembler-times {\mvcmpequb\.} 2 } } */
+/* { dg-final { scan-assembler-not {\mcmpd\M} } } */


Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-10 Thread HAO CHEN GUI
Hi David,

  Thanks for your review comments.

On 2023/10/9 23:42, David Edelsohn wrote:
>  #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
>  #define MAX_MOVE_MAX 8
> +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
> +#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
> 
> 
> How are the definitions of MOVE_MAX_PIECES and COMPARE_MAX_PIECES determined? 
>  The email does not provide any explanation for the implementation.  The rest 
> of the patch is related to vector support, but vector support is not 
> dependent on TARGET_POWERPC64.

By default, MOVE_MAX_PIECES and COMPARE_MAX_PIECES are set to the same
value as MOVE_MAX. compare_by_pieces requires both move and compare
instructions, so both macros are set to 16 bytes when vector mode
(V16QImode) is supported. The problem is that rs6000 doesn't support
TImode for "-m32". We discussed it in issue 1307. With MOVE_MAX_PIECES
set to 16, TImode would be used for moves, but TImode isn't supported
with "-m32", which might cause an ICE.

So MOVE_MAX_PIECES and COMPARE_MAX_PIECES are set to 4 for the 32-bit
target in this patch. They could be changed to 16 once rs6000 supports
TImode with "-m32".
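
A hypothetical "-m32" reproducer for that concern (my sketch):

/* With MOVE_MAX_PIECES at 16 on the 32-bit target, the by-pieces code
   would try a single TImode access here, and TImode is not enabled on
   32-bit rs6000, which might cause an ICE.  */
void
copy16 (void *dst, const void *src)
{
  __builtin_memcpy (dst, src, 16);
}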

Thanks
Gui Haochen


[PATCH-1v2, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-11 Thread HAO CHEN GUI
Hi,
  Vector mode instructions are efficient on some targets (e.g. ppc64).
This patch enables vector mode for compare_by_pieces. The non-member
function widest_fixed_size_mode_for_size takes by_pieces_operation
as its second argument and decides whether vector mode is enabled
based on the type of operation. Currently only the set and compare
operations enable vector mode, with the corresponding optab checks.

  The test case is in the second patch which is rs6000 specific.

  Compared with the last version, the main change is to enable vector
mode for compare_by_pieces in smallest_fixed_size_mode_for_size, which
is used for overlapping compares.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
Expand: Enable vector mode for pieces compares

Vector mode compare instructions are efficient for equality compares on
rs6000. This patch refactors the by-pieces operation code to enable
vector mode for compare.

gcc/
PR target/111449
* expr.cc (widest_fixed_size_mode_for_size): Enable vector mode
for compare.  Replace the second argument with the type of pieces
operation.  Add optab checks for vector mode used in compare.
(by_pieces_ninsns): Pass the type of pieces operation to
widest_fixed_size_mode_for_size.
(class op_by_pieces_d): Define virtual function
widest_fixed_size_mode_for_size and optab_checking.
(op_by_pieces_d::op_by_pieces_d): Call outer function
widest_fixed_size_mode_for_size.
(op_by_pieces_d::get_usable_mode): Call class function
widest_fixed_size_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
optab_checking for different types of operations.
(op_by_pieces_d::run): Call class function
widest_fixed_size_mode_for_size.
(class move_by_pieces_d): Declare function
widest_fixed_size_mode_for_size.
(move_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
(class store_by_pieces_d): Declare function
widest_fixed_size_mode_for_size and optab_checking.
(store_by_pieces_d::optab_checking): Implement.
(store_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
(can_store_by_pieces): Pass the type of pieces operation to
widest_fixed_size_mode_for_size.
(class compare_by_pieces_d): Declare function
widest_fixed_size_mode_for_size and optab_checking.
(compare_by_pieces_d::compare_by_pieces_d): Set m_qi_vector_mode
to true to enable vector mode.
(compare_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
(compare_by_pieces_d::optab_checking): Implement.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 9a37bff1fdd..e83c0a378ed 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -992,8 +992,9 @@ alignment_for_piecewise_move (unsigned int max_pieces, unsigned int align)
that is narrower than SIZE bytes.  */

 static fixed_size_mode
-widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
+widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
 {
+  bool qi_vector = ((op == COMPARE_BY_PIECES) || op == SET_BY_PIECES);
   fixed_size_mode result = NARROWEST_INT_MODE;

   gcc_checking_assert (size > 1);
@@ -1009,8 +1010,13 @@ widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
  {
if (GET_MODE_SIZE (candidate) >= size)
  break;
-   if (optab_handler (vec_duplicate_optab, candidate)
-   != CODE_FOR_nothing)
+   if ((op == SET_BY_PIECES
+&& optab_handler (vec_duplicate_optab, candidate)
+  != CODE_FOR_nothing)
+|| (op == COMPARE_BY_PIECES
+&& optab_handler (mov_optab, mode)
+   != CODE_FOR_nothing
+&& can_compare_p (EQ, mode, ccp_jump)))
  result = candidate;
  }

@@ -1061,8 +1067,7 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align,
 {
   /* NB: Round up L and ALIGN to the widest integer mode for
 MAX_SIZE.  */
-  mode = widest_fixed_size_mode_for_size (max_size,
- op == SET_BY_PIECES);
+  mode = widest_fixed_size_mode_for_size (max_size, op);
   if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
{
  unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode));
@@ -1076,8 +1081,7 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align,

   while (max_size > 1 && l > 0)
 {
-  mode = widest_fixed_size_mode_for_size (max_size,
- op == SET_BY_PIECES);
+  mode = widest_fixed_size_mode_for_size (max_size, op);
   enum insn_code icode;

   unsigned int modesize = GET_MODE_SIZE (mode);
@@ -1327,6 +1331,12 @@ class op_by_pieces_d
   virtual void finish_mode (machine_mode

[PATCH-2v2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-11 Thread HAO CHEN GUI
Hi,
  This patch enables vector mode for memory equality compares by adding
a new expand pattern, cbranchv16qi4, and implementing it. The
corresponding CC reg and compare code are also set in
rs6000_generate_compare. With the patch, a 16-byte equality compare can
be implemented by one vector compare instruction instead of two 8-byte
compares with branches.

  The vector mode compare is only enabled on powerpc64, as TImode isn't
supported on the 32-bit platform yet. Setting MOVE_MAX_PIECES to 16
might cause TImode compares to be generated.

  Compared with the last version, the main change is to add the guard
"TARGET_VSX" to the expand pattern, as it is required for unaligned
vector loads.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: Enable vector compare for memory equality compare

gcc/
PR target/111449
* config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
* config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn
sequence for V16QImode equality compare.
* config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
(COMPARE_MAX_PIECES): Define.
(STORE_MAX_PIECES): Define.

gcc/testsuite/
PR target/111449
* gcc.target/powerpc/pr111449.c: New.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e8a596fb7e9..e4492ff9569 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2605,6 +2605,42 @@ (define_insn "altivec_vupklpx"
 }
   [(set_attr "type" "vecperm")])

+(define_expand "cbranchv16qi4"
+  [(use (match_operator 0 "equality_operator"
+   [(match_operand:V16QI 1 "reg_or_mem_operand")
+(match_operand:V16QI 2 "reg_or_mem_operand")]))
+   (use (match_operand 3))]
+  "VECTOR_UNIT_ALTIVEC_P (V16QImode)
+   && TARGET_VSX"
+{
+  if (!TARGET_P9_VECTOR
+  && !BYTES_BIG_ENDIAN
+  && MEM_P (operands[1])
+  && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
+  && MEM_P (operands[2])
+  && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
+{
+  /* Use direct move for P8 little endian to skip bswap, as the byte
+order doesn't matter for equality compare.  */
+  rtx reg_op1 = gen_reg_rtx (V16QImode);
+  rtx reg_op2 = gen_reg_rtx (V16QImode);
+  rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
+  rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
+  operands[1] = reg_op1;
+  operands[2] = reg_op2;
+}
+  else
+{
+  operands[1] = force_reg (V16QImode, operands[1]);
+  operands[2] = force_reg (V16QImode, operands[2]);
+}
+
+  rtx_code code = GET_CODE (operands[0]);
+  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]);
+  rs6000_emit_cbranch (V16QImode, operands);
+  DONE;
+})
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ_p"
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index efe9adce1f8..0087d786840 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15264,6 +15264,15 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
  else
emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
}
+  else if (mode == V16QImode)
+   {
+ gcc_assert (code == EQ || code == NE);
+
+ rtx result_vector = gen_reg_rtx (V16QImode);
+ compare_result = gen_rtx_REG (CCmode, CR6_REGNO);
+ emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
+ code = (code == NE) ? GE : LT;
+   }
   else
emit_insn (gen_rtx_SET (compare_result,
gen_rtx_COMPARE (comp_mode, op0, op1)));
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..dd8565e3971 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1730,6 +1730,9 @@ typedef struct rs6000_args
in one reasonably fast instruction.  */
 #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
 #define MAX_MOVE_MAX 8
+#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
+#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
+#define STORE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 8)

 /* Nonzero if access to memory by bytes is no faster than for words.
Also nonzero if doing byte operations (specifically shifts) in registers
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449.c b/gcc/testsuite/gcc.target/powerpc/pr111449.c
new file mode 100644
index 000..a8c30b92a41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-maltivec -O2" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+/* Ensure vector comparison is used for 16-byte memory equality compare.  */
+
+int compare1 (const char* s1, const char* s2

Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-11 Thread HAO CHEN GUI
Hi David,

On 2023/10/10 20:44, David Edelsohn wrote:
> Are you stating that although PPC32 supports V16QImode in VSX, the 
> move_by_pieces support also requires TImode, which is not available on PPC32?
> 

Yes. With MOVE_MAX_PIECES set to 16, a TImode compare
might be generated, as the by-pieces code checks vector
modes first and then falls back to scalar modes by default.
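
A toy model of that ordering (plain C, not the GCC internals; piece
sizes in bytes, with 16 standing for V16QImode/TImode):

#include <stdio.h>

enum op { MOVE_BY_PIECES, COMPARE_BY_PIECES };

static int
widest_piece (unsigned size, enum op op, int has_v16qi_compare,
              int has_timode)
{
  /* Operations that can use QI vectors try the vector mode first.  */
  if (op == COMPARE_BY_PIECES && size >= 16 && has_v16qi_compare)
    return 16;
  /* Everything else falls back to the widest scalar integer mode,
     which is TImode once MOVE_MAX_PIECES is 16.  */
  if (size >= 16 && has_timode)
    return 16;
  return size >= 8 ? 8 : 4;
}

int
main (void)
{
  /* 32-bit rs6000 has no TImode, so a 16-byte move must use smaller
     pieces even though the 16-byte vector compare is available.  */
  printf ("move:    %d\n", widest_piece (16, MOVE_BY_PIECES, 1, 0));
  printf ("compare: %d\n", widest_piece (16, COMPARE_BY_PIECES, 1, 0));
  return 0;
}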

Thanks
Gui Haochen


[PATCH-1v3, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-13 Thread HAO CHEN GUI
Hi,
  Vector mode instructions are efficient for compares on some targets.
This patch enables vector mode for compare_by_pieces. Currently,
vector mode is enabled for compare, set and clear. The helper function
"qi_vector_p" decides whether vector mode is enabled for a given
by-pieces operation. "optabs_checking" checks whether optabs are
available for the mode and the given by-pieces operation. Both are
called from the fixed_size_mode finding functions. A member is added
to class op_by_pieces_d to record the type of by-pieces operation.

  The test case is in the second patch which is rs6000 specific.

  Compared with the last version, the main change is to create two
helper functions and call them from the mode finding functions.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
Expand: Enable vector mode for pieces compares

Vector mode compare instructions are efficient for equality compares on
rs6000. This patch refactors the by-pieces operation code to enable
vector mode for compare.

gcc/
PR target/111449
* expr.cc (qi_vector_p): New function to indicate if vector mode
is enabled for certain by pieces operations.
(optabs_checking): New function to check if optabs are available
for certain by pieces operations.
(widest_fixed_size_mode_for_size): Replace the second argument
with the type of by pieces operations.  Call qi_vector_p to check
if vector mode is enable.  Call optabs_checking to check if optabs
are available for the candidate vector mode.
(by_pieces_ninsns): Pass the type of by pieces operation to
widest_fixed_size_mode_for_size.
(class op_by_pieces_d): Add a protected member m_op to record the
type of by pieces operations.  Declare member function
fixed_size_mode widest_fixed_size_mode_for_size.
(op_by_pieces_d::op_by_pieces_d): Change last argument to the type
of by pieces operations, initialize m_op with it.  Call non-member
function widest_fixed_size_mode_for_size.
(op_by_pieces_d::get_usable_mode): Call member function
widest_fixed_size_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
qi_vector_p to check if vector mode is enable.  Call
optabs_checking to check if optabs are available for the candidate
vector mode.
(op_by_pieces_d::run): Call member function
widest_fixed_size_mode_for_size.
(op_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
(move_by_pieces_d::move_by_pieces_d): Set m_op to MOVE_BY_PIECES.
(store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
(can_store_by_pieces): Pass the type of by pieces operations to
widest_fixed_size_mode_for_size.
(clear_by_pieces): Initialize class store_by_pieces_d with
CLEAR_BY_PIECES.
(compare_by_pieces_d::compare_by_pieces_d): Set m_op to
COMPARE_BY_PIECES.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index d87346dc07f..8ec3f5465a9 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -988,18 +988,43 @@ alignment_for_piecewise_move (unsigned int max_pieces, unsigned int align)
   return align;
 }

-/* Return the widest QI vector, if QI_MODE is true, or integer mode
-   that is narrower than SIZE bytes.  */
+/* Return true if vector mode is enabled for the op.  */
+static bool
+qi_vector_p (by_pieces_operation op)
+{
+  return (op == COMPARE_BY_PIECES
+ || op == SET_BY_PIECES
+ || op == CLEAR_BY_PIECES);
+}
+
+/* Return true if optabs are available for the mode and by pieces
+   operations.  */
+static bool
+optabs_checking (fixed_size_mode mode, by_pieces_operation op)
+{
+  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
+  && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
+return true;
+  else if (op == COMPARE_BY_PIECES
+  && optab_handler (mov_optab, mode) != CODE_FOR_nothing
+  && can_compare_p (EQ, mode, ccp_jump))
+return true;
+
+  return false;
+}
+
+/* Return the widest QI vector, if vector mode is enabled for the op,
+   or integer mode that is narrower than SIZE bytes.  */

 static fixed_size_mode
-widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
+widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
 {
   fixed_size_mode result = NARROWEST_INT_MODE;

   gcc_checking_assert (size > 1);

   /* Use QI vector only if size is wider than a WORD.  */
-  if (qi_vector && size > UNITS_PER_WORD)
+  if (qi_vector_p (op) && size > UNITS_PER_WORD)
 {
   machine_mode mode;
   fixed_size_mode candidate;
@@ -1009,8 +1034,7 @@ widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
  {
if (GET_MODE_SIZE (candidate) >= size)
  break;
-   if (optab_handler (vec_duplicate_optab, candidate)
-   != CODE_FOR_nothing)
+  

Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-19 Thread HAO CHEN GUI
Kewen & David,
  Thanks for your comments.

On 2023/10/17 10:19, Kewen.Lin wrote:
> I think David raised a good question, it sounds to me that the current
> handling simply considers that if MOVE_MAX_PIECES is set to 16, the
> required operations for this optimization on TImode are always available,
> but unfortunately on rs6000 the assumption doesn't hold, so could we
> teach generic code instead?

Finally I found that the generic code doesn't check whether the scalar
mode used in by-pieces operations is enabled by the target. TImode is
not enabled on ppc, so it should be checked before TImode is used for
by-pieces operations. I made a patch for the generic code and am
testing it. With the patch, the 16-byte comparison can be enabled on
both ppc64 and ppc.

Thanks
Gui Haochen


[PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-20 Thread HAO CHEN GUI
Hi,
  Vector mode instructions are efficient for compares on some targets.
This patch enables vector mode for compare_by_pieces. Two helper
functions are added: one checks whether vector mode is available for a
given by-pieces operation, and the other checks whether optabs exist
for the mode and operation. A member is added to class op_by_pieces_d
to record the type of operation.

  The test case is in the second patch which is rs6000 specific.

  Compared with the last version, the main change is to add a target
hook check - scalar_mode_supported_p - when retrieving the available
scalar modes; a mode that is not supported by the target (e.g. TImode
on ppc) is skipped. Some function names and comments are also refined
according to the reviewer's advice.
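
  The hunk adding that check is cut off in the diff below; judging from
the '-' lines of the follow-up r14-5001 patch later in this thread, the
check added to widest_fixed_size_mode_for_size looks like this
(reconstructed, not copied from this email):

  mode = tmode.require ();
  if (GET_MODE_SIZE (mode) < size
      && targetm.scalar_mode_supported_p (mode))
    result = mode;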

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
Expand: Enable vector mode for by pieces compares

Vector mode compare instructions are efficient for equality compares on
rs6000. This patch refactors the by-pieces operation code to enable
vector mode for compare.

gcc/
PR target/111449
* expr.cc (can_use_qi_vectors): New function to return true if
we know how to implement OP using vectors of bytes.
(qi_vector_mode_supported_p): New function to check if optabs
exists for the mode and certain by pieces operations.
(widest_fixed_size_mode_for_size): Replace the second argument
with the type of by pieces operations.  Call can_use_qi_vectors
and qi_vector_mode_supported_p to do the check.  Call
scalar_mode_supported_p to check if the scalar mode is supported.
(by_pieces_ninsns): Pass the type of by pieces operation to
widest_fixed_size_mode_for_size.
(class op_by_pieces_d): Remove m_qi_vector_mode.  Add m_op to
record the type of by pieces operations.
(op_by_pieces_d::op_by_pieces_d): Change last argument to the
type of by pieces operations, initialize m_op with it.  Pass
m_op to function widest_fixed_size_mode_for_size.
(op_by_pieces_d::get_usable_mode): Pass m_op to function
widest_fixed_size_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
can_use_qi_vectors and qi_vector_mode_supported_p to do the
check.
(op_by_pieces_d::run): Pass m_op to function
widest_fixed_size_mode_for_size.
(move_by_pieces_d::move_by_pieces_d): Set m_op to MOVE_BY_PIECES.
(store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
(can_store_by_pieces): Pass the type of by pieces operations to
widest_fixed_size_mode_for_size.
(clear_by_pieces): Initialize class store_by_pieces_d with
CLEAR_BY_PIECES.
(compare_by_pieces_d::compare_by_pieces_d): Set m_op to
COMPARE_BY_PIECES.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 2c9930ec674..ad5f9dd8ec2 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -988,18 +988,44 @@ alignment_for_piecewise_move (unsigned int max_pieces, unsigned int align)
   return align;
 }

-/* Return the widest QI vector, if QI_MODE is true, or integer mode
-   that is narrower than SIZE bytes.  */
+/* Return true if we know how to implement OP using vectors of bytes.  */
+static bool
+can_use_qi_vectors (by_pieces_operation op)
+{
+  return (op == COMPARE_BY_PIECES
+ || op == SET_BY_PIECES
+ || op == CLEAR_BY_PIECES);
+}
+
+/* Return true if optabs exists for the mode and certain by pieces
+   operations.  */
+static bool
+qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
+{
+  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
+  && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
+return true;
+
+  if (op == COMPARE_BY_PIECES
+  && optab_handler (mov_optab, mode) != CODE_FOR_nothing
+  && can_compare_p (EQ, mode, ccp_jump))
+return true;

+  return false;
+}
+
+/* Return the widest mode that can be used to perform part of an
+   operation OP on SIZE bytes.  Try to use QI vector modes where
+   possible.  */
 static fixed_size_mode
-widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
+widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
 {
   fixed_size_mode result = NARROWEST_INT_MODE;

   gcc_checking_assert (size > 1);

   /* Use QI vector only if size is wider than a WORD.  */
-  if (qi_vector && size > UNITS_PER_WORD)
+  if (can_use_qi_vectors (op) && size > UNITS_PER_WORD)
 {
   machine_mode mode;
   fixed_size_mode candidate;
@@ -1009,8 +1035,7 @@ widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
  {
if (GET_MODE_SIZE (candidate) >= size)
  break;
-   if (optab_handler (vec_duplicate_optab, candidate)
-   != CODE_FOR_nothing)
+   if (qi_vector_mode_supported_p (candidate, op))
  result = candidate;
  }

Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-22 Thread HAO CHEN GUI
Committed as r14-4835.

https://gcc.gnu.org/g:f08ca5903c7a02b450b93143467f70b9fd8e0085

Thanks
Gui Haochen

On 2023/10/20 16:49, Richard Sandiford wrote:
> HAO CHEN GUI  writes:
>> Hi,
>>   Vector mode instructions are efficient for compare on some targets.
>> This patch enables vector mode for compare_by_pieces. Two help
>> functions are added to check if vector mode is available for certain
>> by pieces operations and if optabs exist for the mode and certain
>> by pieces operations. One member is added in class op_by_pieces_d to
>> record the type of operations.
>>
>>   The test case is in the second patch which is rs6000 specific.
>>
>>   Compared to last version, the main change is to add a target hook
>> check - scalar_mode_supported_p when retrieving the available scalar
>> modes. The mode which is not supported for a target should be skipped.
>> (e.g. TImode on ppc). Also some function names and comments are refined
>> according to reviewer's advice.
>>
>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions.
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> Expand: Enable vector mode for by pieces compares
>>
>> Vector mode compare instructions are efficient for equality compare on
>> rs6000. This patch refactors the codes of by pieces operation to enable
>> vector mode for compare.
>>
>> gcc/
>>  PR target/111449
>>  * expr.cc (can_use_qi_vectors): New function to return true if
>>  we know how to implement OP using vectors of bytes.
>>  (qi_vector_mode_supported_p): New function to check if optabs
>>  exists for the mode and certain by pieces operations.
>>  (widest_fixed_size_mode_for_size): Replace the second argument
>>  with the type of by pieces operations.  Call can_use_qi_vectors
>>  and qi_vector_mode_supported_p to do the check.  Call
>>  scalar_mode_supported_p to check if the scalar mode is supported.
>>  (by_pieces_ninsns): Pass the type of by pieces operation to
>>  widest_fixed_size_mode_for_size.
>>  (class op_by_pieces_d): Remove m_qi_vector_mode.  Add m_op to
>>  record the type of by pieces operations.
>>  (op_by_pieces_d::op_by_pieces_d): Change last argument to the
>>  type of by pieces operations, initialize m_op with it.  Pass
>>  m_op to function widest_fixed_size_mode_for_size.
>>  (op_by_pieces_d::get_usable_mode): Pass m_op to function
>>  widest_fixed_size_mode_for_size.
>>  (op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
>>  can_use_qi_vectors and qi_vector_mode_supported_p to do the
>>  check.
>>  (op_by_pieces_d::run): Pass m_op to function
>>  widest_fixed_size_mode_for_size.
>>  (move_by_pieces_d::move_by_pieces_d): Set m_op to MOVE_BY_PIECES.
>>  (store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
>>  (can_store_by_pieces): Pass the type of by pieces operations to
>>  widest_fixed_size_mode_for_size.
>>  (clear_by_pieces): Initialize class store_by_pieces_d with
>>  CLEAR_BY_PIECES.
>>  (compare_by_pieces_d::compare_by_pieces_d): Set m_op to
>>  COMPARE_BY_PIECES.
> 
> OK, thanks.  And thanks for your patience.
> 
> Richard
> 
>> patch.diff
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index 2c9930ec674..ad5f9dd8ec2 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -988,18 +988,44 @@ alignment_for_piecewise_move (unsigned int max_pieces, unsigned int align)
>>return align;
>>  }
>>
>> -/* Return the widest QI vector, if QI_MODE is true, or integer mode
>> -   that is narrower than SIZE bytes.  */
>> +/* Return true if we know how to implement OP using vectors of bytes.  */
>> +static bool
>> +can_use_qi_vectors (by_pieces_operation op)
>> +{
>> +  return (op == COMPARE_BY_PIECES
>> +  || op == SET_BY_PIECES
>> +  || op == CLEAR_BY_PIECES);
>> +}
>> +
>> +/* Return true if optabs exists for the mode and certain by pieces
>> +   operations.  */
>> +static bool
>> +qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
>> +{
>> +  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
>> +  && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
>> +return true;
>> +
>> +  if (op == COMPARE_BY_PIECES
>> +  && optab_handler (mov_optab, mode) != CODE_FOR_nothing
>> +  && can_compare_p (EQ, mode, ccp_jump))
>> +return true;
>&

Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-24 Thread HAO CHEN GUI
OK, I will take care of it.

Thanks
Gui Haochen

On 2023/10/24 16:49, Jiang, Haochen wrote:
> It seems that the mail got caught somewhere and did not make it to the
> gcc-patches mailing list. Resending it.
> 
> Thx,
> Haochen
> 
> -Original Message-
> From: Jiang, Haochen 
> Sent: Tuesday, October 24, 2023 4:43 PM
> To: HAO CHEN GUI ; Richard Sandiford 
> 
> Cc: gcc-patches 
> Subject: RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces 
> [PR111449]
> 
> Hi Haochen Gui,
> 
> It seems that the commit caused lots of test case failures on x86 platforms:
> 
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078379.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078380.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078381.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078382.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078383.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078384.html
> 
> Please help verify whether we need some testcase changes or there is a bug here.
> 
> A simple reproducer under the build folder is:
> 
> make check RUNTESTFLAGS="i386.exp=g++.target/i386/pr80566-2.C --target_board='unix{-m64\ -march=cascadelake,-m32\ -march=cascadelake,-m32,-m64}'"
> 
> Thx,
> Haochen
> 
>> -Original Message-
>> From: HAO CHEN GUI 
>> Sent: Monday, October 23, 2023 9:30 AM
>> To: Richard Sandiford 
>> Cc: gcc-patches 
>> Subject: Re: [PATCH-1v4, expand] Enable vector mode for 
>> compare_by_pieces [PR111449]
>>
>> Committed as r14-4835.
>>
>> https://gcc.gnu.org/g:f08ca5903c7a02b450b93143467f70b9fd8e0085
>>
>> Thanks
>> Gui Haochen
>>
>> On 2023/10/20 16:49, Richard Sandiford wrote:
>>> HAO CHEN GUI  writes:
>>>> Hi,
>>>>   Vector mode instructions are efficient for compare on some targets.
>>>> This patch enables vector mode for compare_by_pieces. Two help 
>>>> functions are added to check if vector mode is available for 
>>>> certain by pieces operations and if optabs exist for the mode 
>>>> and certain by pieces operations. One member is added in class 
>>>> op_by_pieces_d to record the type of operations.
>>>>
>>>>   The test case is in the second patch which is rs6000 specific.
>>>>
>>>>   Compared to last version, the main change is to add a target hook 
>>>> check - scalar_mode_supported_p when retrieving the available 
>>>> scalar modes. The mode which is not supported for a target should be 
>>>> skipped.
>>>> (e.g. TImode on ppc). Also some function names and comments are 
>>>> refined according to reviewer's advice.
>>>>
>>>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with 
>>>> no regressions.
>>>>
>>>> Thanks
>>>> Gui Haochen
>>>>
>>>> ChangeLog
>>>> Expand: Enable vector mode for by pieces compares
>>>>
>>>> Vector mode compare instructions are efficient for equality compare 
>>>> on rs6000. This patch refactors the codes of by pieces operation to 
>>>> enable vector mode for compare.
>>>>
>>>> gcc/
>>>>PR target/111449
>>>>* expr.cc (can_use_qi_vectors): New function to return true if
>>>>we know how to implement OP using vectors of bytes.
>>>>(qi_vector_mode_supported_p): New function to check if optabs
>>>>exists for the mode and certain by pieces operations.
>>>>(widest_fixed_size_mode_for_size): Replace the second argument
>>>>with the type of by pieces operations.  Call can_use_qi_vectors
>>>>and qi_vector_mode_supported_p to do the check.  Call
>>>>scalar_mode_supported_p to check if the scalar mode is supported.
>>>>(by_pieces_ninsns): Pass the type of by pieces operation to
>>>>widest_fixed_size_mode_for_size.
>>>>(class op_by_pieces_d): Remove m_qi_vector_mode.  Add m_op to
>>>>record the type of by pieces operations.
>>>>(op_by_pieces_d::op_by_pieces_d): Change last argument to the
>>>>type of by pieces operations, initialize m_op with it.  Pass
>>>>m_op to function widest_fixed_size_mode_for_size.
>>>>(op_by_pieces_d::get_usable_mode): Pass m_op to function
>>>>widest_fixed_size_mode_for_size.
>>>>(op_by_pieces_d::smallest_fixed_size_mode_for_siz

Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-25 Thread HAO CHEN GUI
Hi Haochen,
  The regression cases are caused by the "targetm.scalar_mode_supported_p"
check I added for scalar modes. XImode, OImode and TImode (with -m32)
are not enabled in ix86_scalar_mode_supported_p, so they're excluded
from by-pieces operations on i386.

  The original code doesn't check the scalar modes at all. I think that
might be incorrect, as not all scalar modes support the move and compare
optabs (e.g. TImode with -m32 on rs6000).

  I drafted a new patch that checks the optabs for scalar modes
explicitly. Now both vector and scalar modes are checked for optabs.

  I did a simple test and all the formerly regressing cases pass again.
Could you help run a full regression test? I am worried about the
coverage of my CI system.

Thanks
Gui Haochen

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 7aac575eff8..2af9fcbed18 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation op)
 /* Return true if optabs exists for the mode and certain by pieces
operations.  */
 static bool
-qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
+mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
+  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+return false;
+
   if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
-  && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
-return true;
+  && VECTOR_MODE_P (mode)
+  && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
+return false;

   if (op == COMPARE_BY_PIECES
-  && optab_handler (mov_optab, mode) != CODE_FOR_nothing
-  && can_compare_p (EQ, mode, ccp_jump))
-return true;
+  && !can_compare_p (EQ, mode, ccp_jump))
+return false;

-  return false;
+  return true;
 }

 /* Return the widest mode that can be used to perform part of an
@@ -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
  {
if (GET_MODE_SIZE (candidate) >= size)
  break;
-   if (qi_vector_mode_supported_p (candidate, op))
+   if (mode_supported_p (candidate, op))
  result = candidate;
  }

@@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
 {
   mode = tmode.require ();
   if (GET_MODE_SIZE (mode) < size
- && targetm.scalar_mode_supported_p (mode))
+ && mode_supported_p (mode, op))
   result = mode;
 }

@@ -1454,7 +1457,7 @@ op_by_pieces_d::smallest_fixed_size_mode_for_size (unsigned int size)
  break;

if (GET_MODE_SIZE (candidate) >= size
-   && qi_vector_mode_supported_p (candidate, m_op))
+   && mode_supported_p (candidate, m_op))
  return candidate;
  }
 }


[PATCH, expand] Checking available optabs for scalar modes in by pieces operations

2023-10-27 Thread HAO CHEN GUI
Hi,
  This patch checks the available optabs for scalar modes used in
by-pieces operations. It fixes the regression cases caused by the
previous patch. Now both scalar and vector modes are examined by the
same approach.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen


ChangeLog
Expand: Checking available optabs for scalar modes in by pieces operations

The former patch (f08ca5903c7) examines scalar modes via the target
hook scalar_mode_supported_p.  It causes some i386 regressions, as
XImode and OImode are not enabled in the i386 implementation of that
hook.  This patch examines a scalar mode by checking whether the
corresponding optabs are available for the mode.

gcc/
PR target/111449
* expr.cc (qi_vector_mode_supported_p): Rename to...
(by_pieces_mode_supported_p): ...this, and extend it to do
the checking for both scalar and vector modes.
(widest_fixed_size_mode_for_size): Call
by_pieces_mode_supported_p to examine the mode.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Likewise.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 7aac575eff8..2af9fcbed18 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation op)
 /* Return true if optabs exists for the mode and certain by pieces
operations.  */
 static bool
-qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
+by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
+  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+return false;
+
   if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
-  && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
-return true;
+  && VECTOR_MODE_P (mode)
+  && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
+return false;

   if (op == COMPARE_BY_PIECES
-  && optab_handler (mov_optab, mode) != CODE_FOR_nothing
-  && can_compare_p (EQ, mode, ccp_jump))
-return true;
+  && !can_compare_p (EQ, mode, ccp_jump))
+return false;

-  return false;
+  return true;
 }

 /* Return the widest mode that can be used to perform part of an
@@ -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
  {
if (GET_MODE_SIZE (candidate) >= size)
  break;
-   if (qi_vector_mode_supported_p (candidate, op))
+   if (by_pieces_mode_supported_p (candidate, op))
  result = candidate;
  }

@@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
 {
   mode = tmode.require ();
   if (GET_MODE_SIZE (mode) < size
- && targetm.scalar_mode_supported_p (mode))
+ && by_pieces_mode_supported_p (mode, op))
   result = mode;
 }

@@ -1454,7 +1457,7 @@ op_by_pieces_d::smallest_fixed_size_mode_for_size (unsigned int size)
  break;

if (GET_MODE_SIZE (candidate) >= size
-   && qi_vector_mode_supported_p (candidate, m_op))
+   && by_pieces_mode_supported_p (candidate, m_op))
  return candidate;
  }
 }


Re: [PATCH, expand] Checking available optabs for scalar modes in by pieces operations

2023-10-29 Thread HAO CHEN GUI
Committed as r14-5001.

Thanks
Gui Haochen

On 2023/10/27 17:29, Richard Sandiford wrote:
> HAO CHEN GUI  writes:
>> Hi,
>>   This patch checks available optabs for scalar modes used in by
>> pieces operations. It fixes the regression cases caused by previous
>> patch. Now both scalar and vector modes are examined by the same
>> approach.
>>
>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions. Is this OK for trunk?
>>
>> Thanks
>> Gui Haochen
>>
>>
>> ChangeLog
>> Expand: Checking available optabs for scalar modes in by pieces operations
>>
>> The former patch (f08ca5903c7) examines the scalar modes by target
>> hook scalar_mode_supported_p.  It causes some i386 regression cases
>> as XImode and OImode are not enabled in i386 target function.  This
>> patch examines the scalar mode by checking if the corresponding optabs
>> are available for the mode.
>>
>> gcc/
>>  PR target/111449
>>  * expr.cc (qi_vector_mode_supported_p): Rename to...
>>  (by_pieces_mode_supported_p): ...this, and extends it to do
>>  the checking for both scalar and vector mode.
>>  (widest_fixed_size_mode_for_size): Call
>>  by_pieces_mode_supported_p to examine the mode.
>>  (op_by_pieces_d::smallest_fixed_size_mode_for_size): Likewise.
> 
> OK, thanks.
> 
> Richard
> 
>> patch.diff
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index 7aac575eff8..2af9fcbed18 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation op)
>>  /* Return true if optabs exists for the mode and certain by pieces
>> operations.  */
>>  static bool
>> -qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
>> +by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
>>  {
>> +  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
>> +return false;
>> +
>>if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
>> -  && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
>> -return true;
>> +  && VECTOR_MODE_P (mode)
>> +  && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
>> +return false;
>>
>>if (op == COMPARE_BY_PIECES
>> -  && optab_handler (mov_optab, mode) != CODE_FOR_nothing
>> -  && can_compare_p (EQ, mode, ccp_jump))
>> -return true;
>> +  && !can_compare_p (EQ, mode, ccp_jump))
>> +return false;
>>
>> -  return false;
>> +  return true;
>>  }
>>
>>  /* Return the widest mode that can be used to perform part of an
>> @@ -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
>>{
>>  if (GET_MODE_SIZE (candidate) >= size)
>>break;
>> -if (qi_vector_mode_supported_p (candidate, op))
>> +if (by_pieces_mode_supported_p (candidate, op))
>>result = candidate;
>>}
>>
>> @@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
>>  {
>>mode = tmode.require ();
>>if (GET_MODE_SIZE (mode) < size
>> -  && targetm.scalar_mode_supported_p (mode))
>> +  && by_pieces_mode_supported_p (mode, op))
>>result = mode;
>>  }
>>
>> @@ -1454,7 +1457,7 @@ op_by_pieces_d::smallest_fixed_size_mode_for_size (unsigned int size)
>>break;
>>
>>  if (GET_MODE_SIZE (candidate) >= size
>> -&& qi_vector_mode_supported_p (candidate, m_op))
>> +&& by_pieces_mode_supported_p (candidate, m_op))
>>return candidate;
>>}
>>  }


[PATCH-2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-05 Thread HAO CHEN GUI
Hi,
  This patch enables vector mode for by-pieces equality compares. It
adds a new expand pattern - cbranchv16qi4 - and sets MOVE_MAX_PIECES
and COMPARE_MAX_PIECES to 16 bytes when P8 vector is enabled. The
compare relies on both move and compare instructions, so both macros
are changed. The vector load/store might be unaligned, so the 16-byte
move and compare are only enabled when P8 vector is enabled
(TARGET_VSX + TARGET_EFFICIENT_UNALIGNED_VSX).

  This patch also enables the 16-byte by-pieces move. As vector mode is
not enabled for by-pieces moves, TImode is used for the move, which
caused some regression cases. I drafted a third patch to fix them.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Enable vector mode for by pieces equality compare

This patch adds a new expand pattern - cbranchv16qi4 - to enable vector
mode by-pieces equality compares on rs6000.  The macro MOVE_MAX_PIECES
(and COMPARE_MAX_PIECES) is set to 16 bytes when P8 vector is enabled,
otherwise it keeps its previous value.  The macro STORE_MAX_PIECES
defaults to the same value as MOVE_MAX_PIECES, so it is now defined
explicitly to keep its previous value.
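
For context, the defaults chain is approximately the following
(paraphrased from gcc/defaults.h, shown only to explain why defining
STORE_MAX_PIECES explicitly preserves its previous value):

#ifndef MOVE_MAX_PIECES
#define MOVE_MAX_PIECES   MOVE_MAX
#endif

#ifndef STORE_MAX_PIECES
#define STORE_MAX_PIECES  MOVE_MAX_PIECES
#endif

#ifndef COMPARE_MAX_PIECES
#define COMPARE_MAX_PIECES  MOVE_MAX_PIECES
#endif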

gcc/
PR target/111449
* config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
* config/rs6000/rs6000.cc (rs6000_generate_compare): Generate
insn sequence for V16QImode equality compare.
* config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
(STORE_MAX_PIECES): Define.

gcc/testsuite/
PR target/111449
* gcc.target/powerpc/pr111449-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e8a596fb7e9..d0937f192d6 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2605,6 +2605,45 @@ (define_insn "altivec_vupklpx"
 }
   [(set_attr "type" "vecperm")])

+/* The cbranch_optabs doesn't allow FAIL, so altivec load/store
+   instructions are disabled as the cost is high for unaligned
+   load/store.  */
+(define_expand "cbranchv16qi4"
+  [(use (match_operator 0 "equality_operator"
+   [(match_operand:V16QI 1 "reg_or_mem_operand")
+(match_operand:V16QI 2 "reg_or_mem_operand")]))
+   (use (match_operand 3))]
+  "VECTOR_MEM_VSX_P (V16QImode)
+   && TARGET_EFFICIENT_UNALIGNED_VSX"
+{
+  if (!TARGET_P9_VECTOR
+  && !BYTES_BIG_ENDIAN
+  && MEM_P (operands[1])
+  && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
+  && MEM_P (operands[2])
+  && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
+{
+  /* Use direct move for P8 little endian to skip bswap, as the byte
+order doesn't matter for equality compare.  */
+  rtx reg_op1 = gen_reg_rtx (V16QImode);
+  rtx reg_op2 = gen_reg_rtx (V16QImode);
+  rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
+  rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
+  operands[1] = reg_op1;
+  operands[2] = reg_op2;
+}
+  else
+{
+  operands[1] = force_reg (V16QImode, operands[1]);
+  operands[2] = force_reg (V16QImode, operands[2]);
+}
+
+  rtx_code code = GET_CODE (operands[0]);
+  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]);
+  rs6000_emit_cbranch (V16QImode, operands);
+  DONE;
+})
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ_p"
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index cc24dd5301e..10279052636 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15472,6 +15472,18 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
  else
emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
}
+  else if (mode == V16QImode)
+   {
+ gcc_assert (code == EQ || code == NE);
+
+ rtx result_vector = gen_reg_rtx (V16QImode);
+ rtx cc_bit = gen_reg_rtx (SImode);
+ emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
+ emit_insn (gen_cr6_test_for_lt (cc_bit));
+ emit_insn (gen_rtx_SET (compare_result,
+ gen_rtx_COMPARE (comp_mode, cc_bit,
+  const1_rtx)));
+   }
   else
emit_insn (gen_rtx_SET (compare_result,
gen_rtx_COMPARE (comp_mode, op0, op1)));
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 22595f6ebd7..51441825e20 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1730,6 +1730,8 @@ typedef struct rs6000_args
in one reasonably fast instruction.  */
 #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
 #define MAX_MOVE_MAX 8
+#define MOVE_MAX_PIECES (TARGET_P8_VECTOR ? 16 : (TARGET_POWERPC64 ? 8 : 4))
+#define STORE_MAX_PIECES (TARGET_POWERPC64 ? 8 : 4)

 /* Nonzero if acces

[PATCH-3, rs6000] Enable 16-byte by pieces move [PR111449]

2023-11-05 Thread HAO CHEN GUI
Hi,
  Patch 2 enables the 16-byte by-pieces move on rs6000. This patch
fixes the regression cases caused by that patch. For sra-17/18, the
long array with 4 elements can be loaded by one 16-byte by-pieces move
on the 32-bit platform, so the array is no longer constructed in LC0
and the SRA optimization cannot be applied. The "-mno-vsx" option is
added for the 32-bit platform, as it sets MOVE_MAX_PIECES back to 4
bytes so the array can't be loaded by a single by-pieces move.
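
  The affected shape is roughly the following (a hypothetical sketch,
not the actual sra-17.c source):

extern void abort (void);

int
main (void)
{
  /* On ILP32, four longs are 16 bytes: a single by-pieces vector move
     can now initialize the whole array, so SRA no longer sees the
     element-wise accesses it would scalarize.  */
  long l[4] = { 1, 2, 3, 4 };
  long sum = 0;
  for (int i = 0; i < 4; i++)
    sum += l[i];
  if (sum != 10)
    abort ();
  return 0;
}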

  Another regression is on P8 LE. The 16-byte memory-to-memory move is
implemented by a TImode load and store. The TImode load/store is
finally split into two DImode loads/stores on P8 LE, as it doesn't have
unaligned vector load/store instructions. Actually, the 16-byte
memory-to-memory move can be implemented by a pair of V2DI
element-reversed load/store instructions on P8 LE. The patch creates an
insn_and_split pattern for this optimization.
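
  A rough C-level illustration of why the element order doesn't matter
for a straight copy (my sketch using the vec_xl/vec_xst intrinsics, not
the patch's code):

#include <altivec.h>

/* Copy 16 bytes through a VSX register.  At the source level vec_xl
   and vec_xst preserve element order, which on P8 LE costs an extra
   swap after the load and before the store; for a plain copy the two
   swaps cancel out.  The insn_and_split below exploits exactly that by
   pairing the raw element-reversed load with the matching
   element-reversed store.  */
void
copy16 (unsigned char *dst, const unsigned char *src)
{
  vector unsigned long long v
    = vec_xl (0, (const unsigned long long *) src);
  vec_xst (v, 0, (unsigned long long *) dst);
}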

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable 16-byte by pieces move

This patch enables the 16-byte by-pieces move.  The 16-byte move is
generated with TImode and finally implemented by vector instructions.
There are several regression cases after the enablement.  The 16-byte
TImode memory-to-memory move was originally implemented by two pairs of
DImode loads/stores on P8 LE, as there is no unaligned vsx load/store
on it.  The patch fixes the problem by creating an insn_and_split
pattern that converts it to one pair of element-reversed loads/stores.
Two SRA cases lost the SRA optimization, as the array can be loaded by
one 16-byte move and is therefore no longer initialized from LC0 on the
32-bit platform.  They are fixed by adding the -mno-vsx option.

gcc/
PR target/111449
* config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New.

gcc/testsuite/
PR target/111449
* gcc.dg/tree-ssa/sra-17.c: Add no-vsx option for powerpc ilp32.
* gcc.dg/tree-ssa/sra-18.c: Likewise.
* gcc.target/powerpc/pr111449-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f3b40229094..9f6bc49998a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -414,6 +414,27 @@ (define_mode_attr VM3_char [(V2DI "d")

 ;; VSX moves

+;; TImode memory to memory move optimization on LE with p8vector
+(define_insn_and_split "*vsx_le_mem_to_mem_mov_ti"
+  [(set (match_operand:TI 0 "indexed_or_indirect_operand" "=Z")
+   (match_operand:TI 1 "indexed_or_indirect_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
+   && !MEM_VOLATILE_P (operands[0])
+   && !MEM_VOLATILE_P (operands[1])
+   && !reload_completed"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp = gen_reg_rtx (V2DImode);
+  rtx src =  adjust_address (operands[1], V2DImode, 0);
+  emit_insn (gen_vsx_ld_elemrev_v2di (tmp, src));
+  rtx dest = adjust_address (operands[0], V2DImode, 0);
+  emit_insn (gen_vsx_st_elemrev_v2di (dest, tmp));
+  DONE;
+}
+  [(set_attr "length" "16")])
+
 ;; The patterns for LE permuted loads and stores come before the general
 ;; VSX moves so they match first.
 (define_insn_and_split "*vsx_le_perm_load_"
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
index 221d96b6cd9..36d72c9256b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
 /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
+/* { dg-additional-options "-mno-vsx" { target powerpc*-*-* && ilp32 } } */

 extern void abort (void);

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
index f5e6a21c2ae..3682a9a8c29 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
 /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
+/* { dg-additional-options "-mno-vsx" { target powerpc*-*-* && ilp32 } } */

 extern void abort (void);
 struct foo { long x; };
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-2.c b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
new file mode 100644
index 000..7003bdc0208
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { has_arch_pwr8 } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mvsx -O2" } */
+
+/* Ensure 16-byte by pieces move is enabled.  */
+
+void move1 (void *s1, void *s2)
+{
+  __builtin_memcpy (s1, s2, 16);
+}
+
+void move2 (void *s1)
+{
+  __builtin_memcpy (s1, "0123456789012345", 16);
+}
+
+/

[PATCH-3v2, rs6000] Enable 16-byte by pieces move [PR111449]

2023-11-06 Thread HAO CHEN GUI
Hi,
  Patch 2 enables the 16-byte by-pieces move on rs6000. This patch fixes
the regression cases caused by that patch. For sra-17/18, the long
array with 4 elements can be loaded by one 16-byte by-pieces move on a
32-bit platform, so the array is not constructed in .LC0 and the SRA
optimization cannot be applied. The "no-vsx" option is added for 32-bit
platforms, as it sets MOVE_MAX_PIECES back to 4 bytes there so that the
array can't be loaded by one by-pieces move.

  Another regression is on P8 LE. The 16-byte memory-to-memory move is
implemented by a TImode load/store pair. The TImode load/store is
finally split into two DImode load/store pairs on P8 LE as it doesn't
have unaligned vector load/store instructions. Actually, a 16-byte
memory-to-memory move can be implemented by a V2DI element-reversed
load/store pair on P8 LE. The patch creates an insn_and_split pattern
for this optimization.

  Compared to the previous version, this one fixes the syntax errors in the test cases.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable 16-byte by pieces move

This patch enables the 16-byte by-pieces move.  The 16-byte move is
generated with TImode and finally implemented by vector instructions.
There are several regression cases after the enablement.  The 16-byte
TImode memory-to-memory move was originally implemented by two pairs of
DImode load/store on P8 LE as there is no unaligned vsx load/store on
it.  The patch fixes the problem by creating an insn_and_split pattern
that converts it to one pair of element-reversed load/store.  Two SRA
cases lost the SRA optimization as the array can be loaded by one
16-byte move and hence is not initialized in .LC0 on 32-bit platforms.
They are fixed by adding the no-vsx option.

gcc/
PR target/111449
* config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New.

gcc/testsuite/
PR target/111449
* gcc.dg/tree-ssa/sra-17.c: Add no-vsx option for powerpc ilp32.
* gcc.dg/tree-ssa/sra-18.c: Likewise.
* gcc.target/powerpc/pr111449-2.c: New.


patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f3b40229094..9f6bc49998a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -414,6 +414,27 @@ (define_mode_attr VM3_char [(V2DI "d")

 ;; VSX moves

+;; TImode memory to memory move optimization on LE with p8vector
+(define_insn_and_split "*vsx_le_mem_to_mem_mov_ti"
+  [(set (match_operand:TI 0 "indexed_or_indirect_operand" "=Z")
+   (match_operand:TI 1 "indexed_or_indirect_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
+   && !MEM_VOLATILE_P (operands[0])
+   && !MEM_VOLATILE_P (operands[1])
+   && !reload_completed"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp = gen_reg_rtx (V2DImode);
+  rtx src = adjust_address (operands[1], V2DImode, 0);
+  emit_insn (gen_vsx_ld_elemrev_v2di (tmp, src));
+  rtx dest = adjust_address (operands[0], V2DImode, 0);
+  emit_insn (gen_vsx_st_elemrev_v2di (dest, tmp));
+  DONE;
+}
+  [(set_attr "length" "16")])
+
 ;; The patterns for LE permuted loads and stores come before the general
 ;; VSX moves so they match first.
 (define_insn_and_split "*vsx_le_perm_load_<mode>"
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
index 221d96b6cd9..b0d4811e77b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
 /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
+/* { dg-additional-options "-mno-vsx" { target { powerpc*-*-* && ilp32 } } } */

 extern void abort (void);

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
index f5e6a21c2ae..2cdeae6e9e7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
 /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
+/* { dg-additional-options "-mno-vsx" { target { powerpc*-*-* && ilp32 } } } */

 extern void abort (void);
 struct foo { long x; };
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-2.c b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
new file mode 100644
index 000..7003bdc0208
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { has_arch_pwr8 } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mvsx -O2" } */
+
+/* Ensure 16-byte by pieces move is enabled.  */
+
+void move1 (void *s1, void *s2)
+{
+  __builtin_memcpy (s1, s2, 16);
+}
+
+

[PATCH, rs6000] Add subreg patterns for SImode rotate and mask insert

2024-02-29 Thread HAO CHEN GUI
Hi,
  This patch fixes the regression cases in gcc.target/powerpc/rlwimi-2.c. In
the combine pass, an SImode (subreg from DImode) lshiftrt is converted to a
DImode lshiftrt with an outer AND. The result matches a DImode rotate and
mask insert on rs6000.

Trying 2 -> 7:
2: r122:DI=r129:DI
  REG_DEAD r129:DI
7: r125:SI=r122:DI#0 0>>0x1f
  REG_DEAD r122:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
(zero_extract:DI (reg:DI 129)
(const_int 32 [0x20])
(const_int 1 [0x1])))
Successfully matched this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
(and:DI (lshiftrt:DI (reg:DI 129)
(const_int 31 [0x1f]))
(const_int 4294967295 [0xffffffff])))

This conversion blocks the further combination which combines to a SImode
rotate and mask insert insn.

Trying 9, 7 -> 10:
9: r127:SI=r130:DI#0&0xfffffffe
  REG_DEAD r130:DI
7: r125:SI#0=r129:DI 0>>0x1f&0xffffffff
  REG_DEAD r129:DI
   10: r124:SI=r127:SI|r125:SI
  REG_DEAD r125:SI
  REG_DEAD r127:SI
Failed to match this instruction:
(set (reg:SI 124)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
(const_int -2 [0xfffffffffffffffe]))
(subreg:SI (zero_extract:DI (reg:DI 129)
(const_int 32 [0x20])
(const_int 1 [0x1])) 0)))
Failed to match this instruction:
(set (reg:SI 124)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
(const_int -2 [0xfffffffffffffffe]))
(subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
(const_int 31 [0x1f]))
(const_int 4294967295 [0xffffffff])) 0)))

  The root cause of the issue is whether it's necessary to widen the mode
for lshiftrt when the target already has the narrow-mode lshiftrt and its
cost is not high. My former patch tried to fix the problem but has not been
accepted yet.
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

  As it's stage 4 now, I drafted this patch to fix the regression by adding
subreg patterns for the SImode rotate and mask insert. It actually does the
reverse and narrows the mode for lshiftrt so that it can match the SImode
rotate and mask insert.
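
  For reference, a reduced C sketch (hypothetical, distilled from the RTL
above) of the shape that should combine into a single SImode rotate and
mask insert:

unsigned int
insert_bit (unsigned long long a, unsigned int b)
{
  unsigned int x = (unsigned int) a >> 31;  /* SImode lshiftrt of a DImode subreg */
  return (b & ~1u) | x;                     /* rlwimi candidate */
}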

  The case "rlwimi-2.c" is fixed and the number of insns is restored to
the original. The case "rlwinm-0.c" is also changed: 9 "rlwinm" insns are
replaced with 9 "rldicl" as the combine sequence changes. It's not a
regression as the total number of insns is unchanged.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Add subreg patterns for SImode rotate and mask insert

In the combine pass, an SImode (subreg from DImode) lshiftrt is converted
to a DImode lshiftrt with an AND.  The new pattern matches a rotate and
mask insert on rs6000, and thus blocks the pattern from being further
combined into an SImode rotate and mask insert pattern.  This patch fixes
the problem by adding two subreg patterns for the SImode rotate and mask
insert patterns.

gcc/
PR target/93738
* config/rs6000/rs6000.md (*rotlsi3_insert_9): New.
(*rotlsi3_insert_8): New.

gcc/testsuite/
PR target/93738
* gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and 32bit
rotate instructions.
* gcc.target/powerpc/rlwinm-0.c: Likewise.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bc8bc6ab060..b0b40f91e3e 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4253,6 +4253,36 @@ (define_insn "*rotl<mode>3_insert"
 ; difference between rlwimi and rldimi.  We also might want dot forms,
 ; but not for rlwimi on POWER4 and similar processors.

+; Subreg pattern of insn "*rotlsi3_insert"
+(define_insn_and_split "*rotlsi3_insert_9"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
+   (ior:SI (and:SI
+(match_operator:SI 8 "lowpart_subreg_operator"
+ [(and:DI (match_operator:DI 4 "rotate_mask_operator"
+   [(match_operand:DI 1 "gpc_reg_operand" "r")
+(match_operand:SI 2 "const_int_operand" "n")])
+  (match_operand:DI 3 "const_int_operand" "n"))])
+(match_operand:SI 5 "const_int_operand" "n"))
+   (and:SI (match_operand:SI 6 "gpc_reg_operand" "0")
+   (match_operand:SI 7 "const_int_operand" "n"))))]
+  "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode)
+   && GET_CODE (operands[4]) == LSHIFTRT
+   && INTVAL (operands[3]) == 0x
+   && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (ior:SI (and:SI (lshiftrt:SI (match_dup 9)
+(match_dup 2))
+   (match_dup 5))
+   (and:SI (match_dup 6)
+   (match_dup 7))))]
+{
+  int offset = BYTES_BIG_ENDIAN ? 4 : 0;
  operands[9] = gen_rtx_SUBREG (SImode, operands[1], offset);
}
  [(set_attr "type" "insert")])

Re: [PATCH] fwprop: Avoid volatile defines to be propagated

2024-03-03 Thread HAO CHEN GUI
Hi Jeff,
  Thanks for your comments.

在 2024/3/4 6:02, Jeff Law 写道:
> Why specifically are you worried here?  Propagation of a volatile shouldn't 
> in and of itself cause a problem.  We're not changing the number of volatile 
> accesses or anything like that -- we're just moving them around a bit.

If the volatile asm operand is in a parallel set, it can't be eliminated
after the propagation. So the def insn and the use insn together execute
the volatile asm block twice. That's the problem.

Here is a real case from sanitizer_linux.cpp. Insn 62 has a volatile
asm operand and it is propagated into insn 60. After the propagation, both
insn 60 and insn 62 have the volatile asm operand, so the asm block is
executed twice. It causes the sanitizer to behave abnormally in my test.
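
A self-contained sketch of why this is harmful (a hypothetical reduction,
assuming a 64-bit target; mftb reads the Power time base, so each
execution can return a different value):

static inline unsigned long long
read_timebase (void)
{
  unsigned long long t;
  __asm__ __volatile__ ("mftb %0" : "=r" (t));
  return t;
}

If the volatile asm is propagated into a use while the defining insn
survives, the time base is read twice and the two results can differ.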

propagating insn 62 into insn 60, replacing:
(set (reg/v:DI 119 [ res ])
(reg:DI 133 [ res ]))
successfully matched this instruction:
(set (reg/v:DI 119 [ res ])
(asm_operands/v:DI ("mr 28, %5
mr 27, %8
mr 3, %7
mr 5, %9
mr 6, %10
mr 7, %11
li 0, %3
sc
cmpdi  cr1, 3, 0
crandc cr1*4+eq, cr1*4+eq, cr0*4+so
bne-   cr1, 1f
li    29, 0
stdu  29, -8(1)
stdu  1, -%12(1)
std   2, %13(1)
mr    12, 28
mtctr 12
mr    3, 27
bctrl
ld    2, %13(1)
li 0, %4
sc
1:
mr %0, 3
") ("=r") 0 [
(reg:SI 134)
(const_int 22 [0x16])
(const_int 120 [0x78])
(const_int 1 [0x1])
(reg/v:DI 3 3 [ __fn ])
(reg/v:DI 4 4 [ __cstack ])
(reg/v:SI 5 5 [ __flags ])
(reg/v:DI 6 6 [ __arg ])
(reg/v:DI 7 7 [ __ptidptr ])
(reg/v:DI 8 8 [ __newtls ])
(reg/v:DI 9 9 [ __ctidptr ])
(const_int 32 [0x20])
(const_int 24 [0x18])
 [
(asm_input:SI ("0") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:SI ("i") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:SI ("i") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:SI ("i") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:SI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
]
 [] /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591))
rescanning insn with uid = 60.
updating insn 60 in-place

(insn 62 61 60 6 (parallel [
(set (reg:DI 133 [ res ])
(asm_operands/v:DI ("mr 28, %5
mr 27, %8
mr 3, %7
mr 5, %9
mr 6, %10
mr 7, %11
li 0, %3
sc
cmpdi  cr1, 3, 0
crandc cr1*4+eq, cr1*4+eq, cr0*4+so
bne-   cr1, 1f
li    29, 0
stdu  29, -8(1)
stdu  1, -%12(1)
std   2, %13(1)
mr    12, 28
mtctr 12
mr    3, 27
bctrl
ld    2, %13(1)
li 0, %4
sc
1:
mr %0, 3
") ("=r") 0 [
(reg:SI 134)
(const_int 22 [0x16])
(const_int 120 [0x78])
(const_int 1 [0x1])
(reg/v:DI 3 3 [ __fn ])
(reg/v:DI 4 4 [ __cstack ])
(reg/v:SI 5 5 [ __flags ])
(reg/v:DI 6 6 [ __arg ])
(reg/v:DI 7 7 [ __ptidptr ])
(reg/v:DI 8 8 [ __newtls ])
(reg/v:DI 9 9 [ __ctidptr ])
(const_int 32 [0x20])
(const_int 24 [0x18])
]
 [
(asm_input:SI ("0") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:SI ("i") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_li

Re: [PATCH] fwprop: Avoid volatile defines to be propagated

2024-03-04 Thread HAO CHEN GUI
Hi Jeff,

在 2024/3/4 11:37, Jeff Law 写道:
> Can the same thing happen with a volatile memory load?  I don't think that 
> will be caught by the volatile_insn_p check.

Yes, I think so. If the def rtx contains volatile memory references, it
may hit the same problem, since volatile_insn_p only catches volatile asm
and unspec_volatile while volatile_refs_p also catches volatile memory
references. So we may use volatile_refs_p instead of volatile_insn_p?

Thanks
Gui Haochen


[PATCHv2] fwprop: Avoid volatile defines to be propagated

2024-03-04 Thread HAO CHEN GUI
Hi,
  This patch tries to fix a potential problem raised by the patch for
PR111267, with which a volatile asm operand can be propagated into a
single-set insn. The volatile asm operand might be executed multiple times
if the def insn isn't eliminated after the propagation. Currently the
set_src_cost comparison might reject such a propagation, but it has a
chance to be taken after replacing set_src_cost with insn cost. Actually I
found the problem while testing my patch that replaces set_src_cost with
insn_cost in the fwprop pass.

  Compared to the last version, the volatile_insn_p check is replaced with
volatile_refs_p in order to also catch volatile memory references.
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646482.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
fwprop: Avoid volatile defines to be propagated

The patch for PR111267 (commit 86de9b66480b710202a2898cf513db105d8c432f)
introduces an exception for propagation into a single-set insn: a
propagation which might not be profitable (checked by profitable_p) is
still allowed into a single-set insn.  This has a potential problem that a
volatile operand might be propagated into a single-set insn.  If the def
insn is not eliminated after the propagation, the volatile operand will be
executed multiple times.  This patch fixes the problem by skipping a
volatile set source rtx in propagation.

gcc/
* fwprop.cc (forward_propagate_into): Return false for volatile set
source rtx.

gcc/testsuite/
* gcc.target/powerpc/fwprop-1.c: New.

patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 7872609b336..cb6fd6700ca 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -854,6 +854,8 @@ forward_propagate_into (use_info *use, bool reg_prop_only = false)

   rtx dest = SET_DEST (def_set);
   rtx src = SET_SRC (def_set);
+  if (volatile_refs_p (src))
+return false;

   /* Allow propagations into a loop only for reg-to-reg copies, since
  replacing one register by another shouldn't increase the cost.
diff --git a/gcc/testsuite/gcc.target/powerpc/fwprop-1.c b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
new file mode 100644
index 000..07b207f980c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-rtl-fwprop1-details" } */
+/* { dg-final { scan-rtl-dump-not "propagating insn" "fwprop1" } } */
+
+/* Verify that volatile asm operands are not propagated.  */
+long long foo ()
+{
+  long long res;
+  __asm__ __volatile__(
+""
+  : "=r" (res)
+  :
+  : "memory");
+  return res;
+}



[PATCHv2, rs6000] Add subreg patterns for SImode rotate and mask insert

2024-03-08 Thread HAO CHEN GUI
Hi,
  This patch fixes the regression cases in gcc.target/powerpc/rlwimi-2.c. In
the combine pass, an SImode (subreg from DImode) lshiftrt is converted to a
DImode lshiftrt with an outer AND. The result matches a DImode rotate and
mask insert on rs6000.

Trying 2 -> 7:
2: r122:DI=r129:DI
  REG_DEAD r129:DI
7: r125:SI=r122:DI#0 0>>0x1f
  REG_DEAD r122:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
(zero_extract:DI (reg:DI 129)
(const_int 32 [0x20])
(const_int 1 [0x1])))
Successfully matched this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
(and:DI (lshiftrt:DI (reg:DI 129)
(const_int 31 [0x1f]))
(const_int 4294967295 [0xffffffff])))

This conversion blocks the further combination which combines to a SImode
rotate and mask insert insn.

Trying 9, 7 -> 10:
9: r127:SI=r130:DI#0&0xfffffffe
  REG_DEAD r130:DI
7: r125:SI#0=r129:DI 0>>0x1f&0xffffffff
  REG_DEAD r129:DI
   10: r124:SI=r127:SI|r125:SI
  REG_DEAD r125:SI
  REG_DEAD r127:SI
Failed to match this instruction:
(set (reg:SI 124)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
(const_int -2 [0xfffffffffffffffe]))
(subreg:SI (zero_extract:DI (reg:DI 129)
(const_int 32 [0x20])
(const_int 1 [0x1])) 0)))
Failed to match this instruction:
(set (reg:SI 124)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
(const_int -2 [0xfffffffffffffffe]))
(subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
(const_int 31 [0x1f]))
(const_int 4294967295 [0xffffffff])) 0)))

  The root cause of the issue is whether it's necessary to widen the mode
for lshiftrt when the target already has a narrow-mode lshiftrt and its
cost is not high. My former patch tried to fix the problem but has not been
accepted yet.
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

  As it's stage 4 now, I drafted this patch to fix the regression by adding
subreg patterns for the SImode rotate and mask insert. It actually does the
reverse and narrows the mode for lshiftrt so that it can match the SImode
rotate and mask insert.

  The case "rlwimi-2.c" is fixed and the number of insns is restored to
the original.

  Compared with the last version, the main change is to remove the changes
for a testcase which was already fixed by another patch.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Add subreg patterns for SImode rotate and mask insert

In the combine pass, an SImode (subreg from DImode) lshiftrt is converted
to a DImode lshiftrt with an AND.  The new pattern matches a rotate and
mask insert on rs6000, and thus blocks the pattern from being further
combined into an SImode rotate and mask insert pattern.  This patch fixes
the problem by adding two subreg patterns for the SImode rotate and mask
insert patterns.

gcc/
PR target/93738
* config/rs6000/rs6000.md (*rotlsi3_insert_subreg): New.
(*rotlsi3_insert_4_subreg): New.

gcc/testsuite/
PR target/93738
* gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and 32bit
rotate instructions.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bc8bc6ab060..996d0740faf 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4253,6 +4253,36 @@ (define_insn "*rotl<mode>3_insert"
 ; difference between rlwimi and rldimi.  We also might want dot forms,
 ; but not for rlwimi on POWER4 and similar processors.

+; Subreg pattern of insn "*rotlsi3_insert"
+(define_insn_and_split "*rotlsi3_insert_subreg"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
+   (ior:SI (and:SI
+(match_operator:SI 8 "lowpart_subreg_operator"
+ [(and:DI (match_operator:DI 4 "rotate_mask_operator"
+   [(match_operand:DI 1 "gpc_reg_operand" "r")
+(match_operand:SI 2 "const_int_operand" "n")])
+  (match_operand:DI 3 "const_int_operand" "n"))])
+(match_operand:SI 5 "const_int_operand" "n"))
+   (and:SI (match_operand:SI 6 "gpc_reg_operand" "0")
+   (match_operand:SI 7 "const_int_operand" "n"))))]
+  "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode)
+   && GET_CODE (operands[4]) == LSHIFTRT
+   && INTVAL (operands[3]) == 0x
+   && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (ior:SI (and:SI (lshiftrt:SI (match_dup 9)
+(match_dup 2))
+   (match_dup 5))
+   (and:SI (match_dup 6)
+   (match_dup 7))))]
+{
+  int offset = BYTES_BIG_ENDIAN ? 4 : 0;
+  operands[9] = gen_rtx_SUBREG (SImode, operands[1], offset);
+}
+  [(set_attr "type" "insert")])
+
 (define_insn "*rotl<mode>3_insert_2"
   [(set (ma

[PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits

2024-03-10 Thread HAO CHEN GUI
Hi,
  This patch tries to fix the problem that a canonical form doesn't benefit
on a specific target. In the combine pass, the const operand of an AND is
ANDed with the nonzero bits of the other operand. It's a canonical form,
but it has no benefit for targets which have rotate and mask insns. As the
mask is truncated, it can't match the insn conditions which it originally
matched. For example, the following insn condition checks the sum of two
AND masks. When one of the masks is truncated, the condition breaks.

(define_insn "*rotlsi3_insert_5"
  [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
(ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
(match_operand:SI 2 "const_int_operand" "n,n"))
(and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
(match_operand:SI 4 "const_int_operand" "n,n"))))]
  "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
   && UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
   && UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
...
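
For instance, a hypothetical source shape whose two AND masks satisfy the
condition above (0xffff0000 + 0x0000ffff + 1 == 0):

unsigned int
merge_halves (unsigned int a, unsigned int b)
{
  return (a & 0xffff0000u) | (b & 0x0000ffffu);  /* rlwimi candidate */
}

If combine truncates one of the constants to the nonzero bits of its
operand, the sum check no longer holds and the insn fails to match.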

  This patch tries to fix the problem by comparing the rtx costs. If the
other operand (varop) is not changed and the rtx cost with the new mask is
not less than the original one, the mask is restored to the original one.

  I'm not sure if the rtx cost comparison here is proper. The outer code is
unknown and I assume it is "SET". Also the rtx cost might not be accurate.
From my understanding, a canonical form should always be beneficial as it
can't be undone in the combine pass. Do we have a better solution for this
kind of issue? Looking forward to your advice.

  Another similar issue for canonical forms: is widening the mode for
lshiftrt always good?
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

Thanks
Gui Haochen

ChangeLog
Combine: Don't truncate const operand of AND if it's no benefits

In the combine pass, the canonical form for AND is to turn off all bits in
the constant that are known to already be zero.

  /* Turn off all bits in the constant that are known to already be zero.
 Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
 which is tested below.  */

  constop &= nonzero;

But it doesn't benefit when the target has rotate and mask insert insns.
The AND mask is truncated and loses its information.  Thus it can't match
the insn conditions.  For example, the following insn condition checks
the sum of two AND masks.

(define_insn "*rotlsi3_insert_5"
  [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
(ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
(match_operand:SI 2 "const_int_operand" "n,n"))
(and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
(match_operand:SI 4 "const_int_operand" "n,n"))))]
  "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
   && UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
   && UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
...

This patch restores the const operand of the AND if the other operand is
not optimized and the truncated const operand doesn't reduce the rtx cost.

gcc/
* combine.cc (simplify_and_const_int_1): Restore the const operand
of AND if varop is not optimized and the rtx cost of the new const
operand is not reduced.

gcc/testsuite/
* gcc.target/powerpc/rlwimi-0.c: Reduce the total number of insns and
adjust the number of rotate and mask insns.
* gcc.target/powerpc/rlwimi-1.c: Likewise.
* gcc.target/powerpc/rlwimi-2.c: Likewise.

patch.diff
diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..16ff09ea854 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -10161,8 +10161,23 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx varop,
   if (constop == nonzero)
 return varop;

-  if (varop == orig_varop && constop == orig_constop)
-return NULL_RTX;
+  if (varop == orig_varop)
+{
+  if (constop == orig_constop)
+   return NULL_RTX;
+  else
+   {
+ rtx tmp = simplify_gen_binary (AND, mode, varop,
+gen_int_mode (constop, mode));
+ rtx orig = simplify_gen_binary (AND, mode, varop,
+ gen_int_mode (orig_constop, mode));
+ if (set_src_cost (tmp, mode, optimize_this_for_speed_p)
+ < set_src_cost (orig, mode, optimize_this_for_speed_p))
+   return tmp;
+ else
+   return NULL_RTX;
+   }
+}

   /* Otherwise, return an AND.  */
   return simplify_gen_binary (AND, mode, varop, gen_int_mode (constop, mode));
diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c b/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c
index 961be199901..d9dd4419f1d 100644
--- a/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c
+++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c
@@ -2,15 +2,15 @@
 /* { dg-options "-O2" } */

 /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]

Re: [PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits

2024-03-18 Thread HAO CHEN GUI
Hi,
  Gently ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html

Thanks
Gui Haochen

在 2024/3/11 13:41, HAO CHEN GUI 写道:
> Hi,
>   This patch tries to fix the problem when a canonical form doesn't benefit
> on a specific target. The const operand of AND is and with the nonzero
> bits of another operand in combine pass. It's a canonical form, but it's no
> benefits for the target which has rotate and mask insns. As the mask is
> truncated, it can't match the insn conditions which it originally matches.
> For example, the following insn condition checks the sum of two AND masks.
> When one of the mask is truncated, the condition breaks.
> 
> (define_insn "*rotlsi3_insert_5"
>   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
>   (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
>   (match_operand:SI 2 "const_int_operand" "n,n"))
>   (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
>   (match_operand:SI 4 "const_int_operand" "n,n"))))]
>   "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
>&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
>&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
> ...
> 
>   This patch tries to fix the problem by comparing the rtx cost. If another
> operand (varop) is not changed and rtx cost with new mask is not less than
> the original one, the mask is restored to original one.
> 
>   I'm not sure if comparison of rtx cost here is proper. The outer code is
> unknown and I suppose it as "SET". Also the rtx cost might not be accurate.
> From my understanding, the canonical forms should always benefit as it can't
> be undo in combine pass. Do we have a perfect solution for this kind of
> issues? Looking forward for your advice.
> 
>   Another similar issues for canonical forms. Whether the widen mode for
> lshiftrt is always good?
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> Combine: Don't truncate const operand of AND if it's no benefits
> 
> In combine pass, the canonical form is to turn off all bits in the constant
> that are know to already be zero for AND.
> 
>   /* Turn off all bits in the constant that are known to already be zero.
>  Thus, if the AND isn't needed at all, we will have CONSTOP == 
> NONZERO_BITS
>  which is tested below.  */
> 
>   constop &= nonzero;
> 
> But it doesn't benefit when the target has rotate and mask insert insns.
> The AND mask is truncated and lost its information.  Thus it can't match
> the insn conditions.  For example, the following insn condition checks
> the sum of two AND masks.
> 
> (define_insn "*rotlsi3_insert_5"
>   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
>   (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
>   (match_operand:SI 2 "const_int_operand" "n,n"))
>   (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
>   (match_operand:SI 4 "const_int_operand" "n,n"))))]
>   "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
>&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
>&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
> ...
> 
> This patch restores the const operand of AND if the another operand is
> not optimized and the truncated const operand doesn't save the rtx cost.
> 
> gcc/
>   * combine.cc (simplify_and_const_int_1): Restore the const operand
>   of AND if varop is not optimized and the rtx cost of the new const
>   operand is not reduced.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/rlwimi-0.c: Reduced total number of insns and
>   adjust the number of rotate and mask insns.
>   * gcc.target/powerpc/rlwimi-1.c: Likewise.
>   * gcc.target/powerpc/rlwimi-2.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index a4479f8d836..16ff09ea854 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -10161,8 +10161,23 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx 
> varop,
>if (constop == nonzero)
>  return varop;
> 
> -  if (varop == orig_varop && constop == orig_constop)
> -return NULL_RTX;
> +  if (varop == orig_varop)
> +{
> +  if (constop == orig_constop)
> + return NULL_RTX;
>

[PATCH] Value Range: Add range op for builtin isinf

2024-03-24 Thread HAO CHEN GUI
Hi,
  The builtin isinf is not folded at the front end if the corresponding
optab exists. This causes range evaluation to fail on targets which have
optab_isinf. For instance, range-sincos.c fails on such targets as it
calls builtin_isinf.

  This patch fixes the problem by adding a range op for builtin isinf.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
Value Range: Add range op for builtin isinf

The builtin isinf is not folded at the front end if the corresponding optab
exists.  So the range op for isinf is needed for value range analysis.  This
patch adds a range op for builtin isinf.

gcc/
* gimple-range-op.cc (class cfn_isinf): New.
(op_cfn_isinf): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_FLT_FN (BUILT_IN_ISINF).

gcc/testsuite/
* gcc.dg/tree-ssa/range-isinf.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index a98f7db62a7..9de130b4022 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1140,6 +1140,57 @@ private:
   bool m_is_pos;
 } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);

+// Implement range operator for CFN_BUILT_IN_ISINF
+class cfn_isinf : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isinf ())
+  {
+   r.set_nonzero (type);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || (!real_isinf (&op1.lower_bound ())
+   && !real_isinf (&op1.upper_bound ())))
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+return false;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.zero_p ())
+  {
+   nan_state nan (true);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+if (!range_includes_zero_p (&lhs))
+  {
+   // The range is [-INF,-INF][+INF,+INF], but it can't be represented.
+   // Set range to [-INF,+INF]
+   r.set_varying (type);
+   r.clear_nan ();
+   return true;
+  }
+
+return false;
+  }
+} op_cfn_isinf;

 // Implement range operator for CFN_BUILT_IN_PARITY.
 class cfn_parity : public range_operator
@@ -1232,6 +1283,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_signbit;
   break;

+CASE_FLT_FN (BUILT_IN_ISINF):
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isinf;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
new file mode 100644
index 000..468f1bcf5c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void
+test1 (double x)
+{
+  if (x > __DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test2 (float x)
+{
+  if (x > __FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test3 (double x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__)
+link_error ();
+}
+
+void
+test4 (float x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__)
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
+



[patch, rs6000] Implement optab_isinf for SFmode, DFmode and TFmode [PR97786]

2024-03-24 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isinf for SFmode, DFmode and TFmode using
the rs6000 VSX test-data-class instructions (the DCMX mask 0x30 selects
+Inf and -Inf).

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for next stage 1?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isinf for SFmode, DFmode and TFmode

gcc/
PR target/97786
* config/rs6000/vsx.md (isinf<mode>2): New expand for SFmode and
DFmode.
(isinf<mode>2): New expand for TFmode.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-1.c: New test.
* gcc.target/powerpc/pr97786-2.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..f0cc02f7e7b 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5313,6 +5313,26 @@ (define_expand "xststdc<sd>p"
   operands[4] = CONST0_RTX (SImode);
 })

+(define_expand "isinf<mode>2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+   && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdc<sd>p (operands[0], operands[1], GEN_INT (0x30)));
+  DONE;
+})
+
+(define_expand "isinf<mode>2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+   && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdcqp_<mode> (operands[0], operands[1], GEN_INT (0x30)));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_<mode>"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
new file mode 100644
index 000..1b1e6d642de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */
+
+int test1 (double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isinf (x);
+}
+
+int test3 (float x)
+{
+  return __builtin_isinff (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdc[sd]p\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
new file mode 100644
index 000..de7f2d67c4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (long double x)
+{
+  return __builtin_isinfl (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */


[Patch] Builtin: Fold builtin_isinf on IBM long double to builtin_isinf on double [PR97786]

2024-03-27 Thread HAO CHEN GUI
Hi,
  This patch folds builtin_isinf on IBM long double into builtin_isinf on
the double type. The former patch
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html
implemented the DFmode isinf_optab.
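
The transformation relies on the IBM long double (double-double) format:
the value is a pair of doubles and only the high-order double carries the
infinity encoding.  A sketch of the folded form:

int
isinf_ibm128 (long double x)
{
  /* (double) x keeps the high-order double, which alone determines
     whether the IBM long double is infinite.  */
  return __builtin_isinf ((double) x);
}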

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for next stage 1?

Thanks
Gui Haochen

ChangeLog
Builtin: Fold builtin_isinf on IBM long double to builtin_isinf on double

For IBM long double, Inf is encoded in the high-order double value only,
so builtin_isinf on IBM long double can be folded to builtin_isinf on the
double type.  As the former patch implemented the DFmode isinf_optab, this
patch converts builtin_isinf on IBM long double to builtin_isinf on the
double type if the DFmode isinf_optab exists.

gcc/
PR target/97786
* builtins.cc (fold_builtin_interclass_mathfn): Fold IBM long double
isinf call to double isinf call when DFmode isinf_optab exists.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-3.c: New test.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index eda8bea9c4b..d2786f207b8 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -9574,6 +9574,12 @@ fold_builtin_interclass_mathfn (location_t loc, tree fndecl, tree arg)
type = double_type_node;
mode = DFmode;
arg = fold_build1_loc (loc, NOP_EXPR, type, arg);
+   tree const isinf_fn = builtin_decl_explicit (BUILT_IN_ISINF);
+   if (interclass_mathfn_icode (arg, isinf_fn) != CODE_FOR_nothing)
+ {
+   result = build_call_expr (isinf_fn, 1, arg);
+   return result;
+ }
  }
get_max_float (REAL_MODE_FORMAT (mode), buf, sizeof (buf), false);
real_from_string (&r, buf);
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-3.c b/gcc/testsuite/gcc.target/powerpc/pr97786-3.c
new file mode 100644
index 000..1c816921e1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ibmlongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (long double x)
+{
+  return __builtin_isinfl (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 2 } } */


[Patch, rs6000] Enable overlap memory store for block memory clear

2024-02-25 Thread HAO CHEN GUI
Hi,
  This patch enables overlapping memory stores for block memory clears,
which reduces the number of store instructions. The expander calls
widest_fixed_size_mode_for_block_clear to get the mode for the looped
block clear and calls smallest_fixed_size_mode_for_block_clear to get
the mode for the last, overlapping clear.
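
For example, with this patch a 30-byte clear can be done with three
stores instead of four (a sketch of the expected decomposition, assuming
VSX with efficient unaligned vector stores):

void
clear30 (char *p)
{
  /* loop part: one V4SI store at offset 0, one DI store at offset 16;
     tail part: one overlapping DI store at offset 22, replacing the
     separate SI store at 24 and HI store at 28.  */
  __builtin_memset (p, 0, 30);
}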

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk or next stage 1?

Thanks
Gui Haochen


ChangeLog
rs6000: Enable overlap memory store for block memory clear

gcc/
* config/rs6000/rs6000-string.cc
(widest_fixed_size_mode_for_block_clear): New.
(smallest_fixed_size_mode_for_block_clear): New.
(expand_block_clear): Call widest_fixed_size_mode_for_block_clear to
get the mode for looped memory stores and call
smallest_fixed_size_mode_for_block_clear to get the mode for the last
overlapped memory store.

gcc/testsuite
* gcc.target/powerpc/block-clear-1.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 133e5382af2..c2a6095a586 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -38,6 +38,49 @@
 #include "profile-count.h"
 #include "predict.h"

+/* Return the widest mode whose size is less than or equal to the
+   size.  */
+static fixed_size_mode
+widest_fixed_size_mode_for_block_clear (unsigned int size, unsigned int align,
+   bool unaligned_vsx_ok)
+{
+  machine_mode mode;
+
+  if (TARGET_ALTIVEC
+  && size >= 16
+  && (align >= 128
+ || unaligned_vsx_ok))
+mode = V4SImode;
+  else if (size >= 8
+  && TARGET_POWERPC64
+  && (align >= 64
+  || !STRICT_ALIGNMENT))
+mode = DImode;
+  else if (size >= 4
+  && (align >= 32
+  || !STRICT_ALIGNMENT))
+mode = SImode;
+  else if (size >= 2
+  && (align >= 16
+  || !STRICT_ALIGNMENT))
+mode = HImode;
+  else
+mode = QImode;
+
+  return as_a <fixed_size_mode> (mode);
+}
+
+/* Return the smallest mode whose size is greater than or equal to
+   the size.  */
+static fixed_size_mode
+smallest_fixed_size_mode_for_block_clear (unsigned int size)
+{
+  if (size > UNITS_PER_WORD)
+    return as_a <fixed_size_mode> (V4SImode);
+
+  return smallest_int_mode_for_size (size * BITS_PER_UNIT);
+}
+
 /* Expand a block clear operation, and return 1 if successful.  Return 0
if we should let the compiler generate normal code.

@@ -55,7 +98,6 @@ expand_block_clear (rtx operands[])
   HOST_WIDE_INT align;
   HOST_WIDE_INT bytes;
   int offset;
-  int clear_bytes;
   int clear_step;

   /* If this is not a fixed size move, just call memcpy */
@@ -89,62 +131,36 @@ expand_block_clear (rtx operands[])

   bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);

-  for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
+  auto mode = widest_fixed_size_mode_for_block_clear (bytes, align,
+ unaligned_vsx_ok);
+  offset = 0;
+  rtx dest;
+
+  do
 {
-  machine_mode mode = BLKmode;
-  rtx dest;
+  unsigned int size = GET_MODE_SIZE (mode);

-  if (TARGET_ALTIVEC
- && (bytes >= 16 && (align >= 128 || unaligned_vsx_ok)))
+  while (bytes >= size)
{
- clear_bytes = 16;
- mode = V4SImode;
-   }
-  else if (bytes >= 8 && TARGET_POWERPC64
-  && (align >= 64 || !STRICT_ALIGNMENT))
-   {
- clear_bytes = 8;
- mode = DImode;
- if (offset == 0 && align < 64)
-   {
- rtx addr;
+ dest = adjust_address (orig_dest, mode, offset);
+ emit_move_insn (dest, CONST0_RTX (mode));

- /* If the address form is reg+offset with offset not a
-multiple of four, reload into reg indirect form here
-rather than waiting for reload.  This way we get one
-reload, not one per store.  */
- addr = XEXP (orig_dest, 0);
- if ((GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
- && CONST_INT_P (XEXP (addr, 1))
- && (INTVAL (XEXP (addr, 1)) & 3) != 0)
-   {
- addr = copy_addr_to_reg (addr);
- orig_dest = replace_equiv_address (orig_dest, addr);
-   }
-   }
-   }
-  else if (bytes >= 4 && (align >= 32 || !STRICT_ALIGNMENT))
-   {   /* move 4 bytes */
- clear_bytes = 4;
- mode = SImode;
-   }
-  else if (bytes >= 2 && (align >= 16 || !STRICT_ALIGNMENT))
-   {   /* move 2 bytes */
- clear_bytes = 2;
- mode = HImode;
-   }
-  else /* move 1 byte at a time */
-   {
- clear_bytes = 1;
- mode = QImode;
+ offset += size;
+ bytes -= size;
}

-  dest = adjust_

[PATCH] fwprop: Avoid volatile defines to be propagated

2024-02-25 Thread HAO CHEN GUI
Hi,
  This patch tries to fix a potential problem raised by the patch for
PR111267, with which a volatile asm operand tries to be propagated into a
single-set insn. That is risky as the behavior is wrong. Currently the
set_src_cost comparison can reject such a propagation, but the propagation
might be taken after replacing set_src_cost with insn cost. Actually I
found the problem while testing my patch that replaces set_src_cost with
insn cost for fwprop.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
fwprop: Avoid volatile defines to be propagated

The patch for PR111267 (commit 86de9b66480b710202a2898cf513db105d8c432f)
introduces an exception for propagation into a single-set insn: a
propagation which might not be profitable (checked by profitable_p) is
still allowed into a single-set insn.  This has a potential problem that a
volatile asm operand will try to be propagated into a single-set insn.
The volatile asm operand is originally rejected in profitable_p.  This
patch fixes the problem by skipping a volatile set source in the def set
finding.

gcc/
* fwprop.cc (forward_propagate_into): Return false for volatile set
source.

gcc/testsuite/
* gcc.target/powerpc/fwprop-1.c: New.

patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 7872609b336..89dce88b43d 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -854,6 +854,8 @@ forward_propagate_into (use_info *use, bool reg_prop_only = false)

   rtx dest = SET_DEST (def_set);
   rtx src = SET_SRC (def_set);
+  if (volatile_insn_p (src))
+return false;

   /* Allow propagations into a loop only for reg-to-reg copies, since
  replacing one register by another shouldn't increase the cost.
diff --git a/gcc/testsuite/gcc.target/powerpc/fwprop-1.c b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
new file mode 100644
index 000..07b207f980c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-rtl-fwprop1-details" } */
+/* { dg-final { scan-rtl-dump-not "propagating insn" "fwprop1" } } */
+
+/* Verify that volatile asm operands are not propagated.  */
+long long foo ()
+{
+  long long res;
+  __asm__ __volatile__(
+""
+  : "=r" (res)
+  :
+  : "memory");
+  return res;
+}



[PATCH, rs6000] Refactor expand_compare_loop and split it to two functions

2024-01-09 Thread HAO CHEN GUI
Hi,
  This patch refactors the function expand_compare_loop and splits it
into two functions: one for fixed lengths and another for variable
lengths. The two functions share some low-level common helper functions.

  Besides the above changes, the patch also does the following:
1. Don't generate the load and compare loop when max_bytes is less than
the loop bytes.
2. Remove do_load_mask_compare as it's not needed.  All sub-targets
entering the function support efficient overlapping load and compare.
3. Implement a variable-length overlapping load and compare for the
case in which the remaining bytes are fewer than the loop bytes in a
variable-length compare.  The 4k boundary test and the one-byte load and
compare loop are removed as they're not needed now.
4. Remove the code for "bytes > max_bytes" with fixed lengths, as that
case is already excluded by pre-checking.
5. Remove the run-time code for "bytes > max_bytes" with variable
lengths, as that case should jump to the library call at the beginning.
6. Enhance do_overlap_load_compare to avoid an overlapping load and
compare when the remaining bytes can be loaded and compared by a smaller
unit (see the example below).
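
As an example for item 6 (a hypothetical case): when comparing 12 bytes
with DImode loads, 4 bytes remain after the first compare; they can be
loaded with a single SImode compare at offset 8, so no overlapping DImode
load at offset 4 is needed:

int
cmp12 (const void *a, const void *b)
{
  return __builtin_memcmp (a, b, 12);  /* one 8-byte + one 4-byte compare */
}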

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Refactor expand_compare_loop and split it to two functions

The original expand_compare_loop has complicated logic as it's designed
for both fixed and variable lengths.  This patch splits it into two
functions and makes the two functions share common helper functions.
Also the 4K boundary test and the corresponding one-byte load and compare
are replaced by a variable-length overlapping load and compare.  The
do_load_mask_compare is removed as all sub-targets entering the function
have efficient overlapping load and compare, so a mask load is not needed.

gcc/
* config/rs6000/rs6000-string.cc (do_isel): Remove.
(do_load_mask_compare): Remove.
(do_reg_compare): New.
(do_load_and_compare): New.
(do_overlap_load_compare): Do load and compare with a small unit
other than overlapping load and compare when the remain bytes can
be done by one instruction.
(expand_compare_loop): Remove.
(get_max_inline_loop_bytes): New.
(do_load_compare_rest_of_loop): New.
(generate_6432_conversion): Set it to a static function and move
ahead of gen_diff_handle.
(gen_diff_handle): New.
(gen_load_compare_loop): New.
(gen_library_call): New.
(expand_compare_with_fixed_length): New.
(expand_compare_with_variable_length): New.
(expand_block_compare): Call expand_compare_with_variable_length
to expand block compare for variable length.  Call
expand_compare_with_fixed_length to expand block compare loop for
fixed length.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-5.c: New.
* gcc.target/powerpc/block-cmp-6.c: New.
* gcc.target/powerpc/block-cmp-7.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index f707bb2727e..018b87f2501 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -404,21 +404,6 @@ do_ifelse (machine_mode cmpmode, rtx_code comparison,
   LABEL_NUSES (true_label) += 1;
 }

-/* Emit an isel of the proper mode for DEST.
-
-   DEST is the isel destination register.
-   SRC1 is the isel source if CR is true.
-   SRC2 is the isel source if CR is false.
-   CR is the condition for the isel.  */
-static void
-do_isel (rtx dest, rtx cmp, rtx src_t, rtx src_f, rtx cr)
-{
-  if (GET_MODE (dest) == DImode)
-emit_insn (gen_isel_cc_di (dest, cmp, src_t, src_f, cr));
-  else
-emit_insn (gen_isel_cc_si (dest, cmp, src_t, src_f, cr));
-}
-
 /* Emit a subtract of the proper mode for DEST.

DEST is the destination register for the subtract.
@@ -499,65 +484,61 @@ do_rotl3 (rtx dest, rtx src1, rtx src2)
 emit_insn (gen_rotlsi3 (dest, src1, src2));
 }

-/* Generate rtl for a load, shift, and compare of less than a full word.
-
-   LOAD_MODE is the machine mode for the loads.
-   DIFF is the reg for the difference.
-   CMP_REM is the reg containing the remaining bytes to compare.
-   DCOND is the CCUNS reg for the compare if we are doing P9 code with setb.
-   SRC1_ADDR is the first source address.
-   SRC2_ADDR is the second source address.
-   ORIG_SRC1 is the original first source block's address rtx.
-   ORIG_SRC2 is the original second source block's address rtx.  */
+/* Do the compare for two registers.  */
 static void
-do_load_mask_compare (const machine_mode load_mode, rtx diff, rtx cmp_rem, rtx dcond,
- rtx src1_addr, rtx src2_addr, rtx orig_src1, rtx orig_src2)
+do_reg_compare (bool use_vec, rtx vec_result, rtx diff, rtx *dcond, rtx d1,
+   rtx d2)
 {
-  HOST_WIDE_INT load_mode_size = GET_MODE_SIZE (load_mode);
-  rtx shift_amount = gen_reg_rtx (word_mode);
-  rtx d1 = gen_reg_rtx (wor

[Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]

2024-01-11 Thread HAO CHEN GUI
Hi,
  This patch eliminates unnecessary byte swaps for block clear on P8
LE. For block clear, all the bytes are set to zero, so the byte order
doesn't matter. The alignment of the destination can therefore be set to
the store mode size instead of 1 byte in order to eliminate unnecessary
byte swap instructions on P8 LE. The test case shows the problem.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Eliminate unnecessary byte swaps for block clear on P8 LE

gcc/
PR target/113325
* config/rs6000/rs6000-string.cc (expand_block_clear): Set the
alignment of destination to the size of mode.

gcc/testsuite/
PR target/113325
* gcc.target/powerpc/pr113325.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 7f777666ba9..4c9b2cbeefc 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
}

   dest = adjust_address (orig_dest, mode, offset);
-
+  /* Set the alignment of dest to the size of mode in order to
+     avoid unnecessary byte swaps on LE.  */
+  set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
   emit_move_insn (dest, CONST0_RTX (mode));
 }

diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
new file mode 100644
index 000..4a3cae019c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
+
+void* foo (void* s1)
+{
+  return __builtin_memset (s1, 0, 32);
+}


Re: [Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]

2024-01-11 Thread HAO CHEN GUI
Hi Richard,
   Thanks so much for your comments.


>> patch.diff
>> diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
>> index 7f777666ba9..4c9b2cbeefc 100644
>> --- a/gcc/config/rs6000/rs6000-string.cc
>> +++ b/gcc/config/rs6000/rs6000-string.cc
>> @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
>> }
>>
>>dest = adjust_address (orig_dest, mode, offset);
>> -
>> +  /* Set the alignment of dest to the size of mode in order to
>> +avoid unnecessary byte swaps on LE.  */
>> +  set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
> 
> but the alignment is now wrong which might cause ripple-down
> wrong-code effects, no?
> 
> It's probably bad to hide the byte-swapping in the move patterns (I'm
> just guessing
> you do that)

Here I just change the alignment of "dest", which is temporarily used for
the move. The orig_dest is untouched and keeps the original alignment.
Subsequent insns which use orig_dest are not affected. I am not sure if it
causes ripple-down effects. Do you mean the dest might be reused later?
But I think the alignment would be different even though the mode and
offset are the same.

Looking forward to your advice.

Thanks
Gui Haochen


[PATCH, rs6000] Enable block compare expand on P9 with m32 and mpowerpc64

2024-01-11 Thread HAO CHEN GUI
Hi,
  On P9 "setb" is used to set the result of a block compare, so the
expansion works with -m32 and -mpowerpc64. On P8 the carry bit is used, so
it can't work with -m32 and -mpowerpc64. This patch enables the block
compare expansion for -m32 and -mpowerpc64 on P9.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Enable block compare expand on P9 with m32 and mpowerpc64

gcc/
* config/rs6000/rs6000-string.cc (expand_block_compare): Enable
P9 with m32 and mpowerpc64.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-1.c: Exclude m32 and mpowerpc64.
* gcc.target/powerpc/block-cmp-4.c: Likewise.
* gcc.target/powerpc/block-cmp-8.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 018b87f2501..346708071b5 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1677,11 +1677,12 @@ expand_block_compare (rtx operands[])
   /* TARGET_POPCNTD is already guarded at expand cmpmemsi.  */
   gcc_assert (TARGET_POPCNTD);

-  /* This case is complicated to handle because the subtract
- with carry instructions do not generate the 64-bit
- carry and so we must emit code to calculate it ourselves.
- We choose not to implement this yet.  */
-  if (TARGET_32BIT && TARGET_POWERPC64)
+  /* For P8, this case is complicated to handle because the subtract
+ with carry instructions do not generate the 64-bit carry and so
+ we must emit code to calculate it ourselves.  We skip it on P8
+ but setb works well on P9.  */
+  if (TARGET_32BIT && TARGET_POWERPC64
+  && !TARGET_P9_MISC)
 return false;

   /* Allow this param to shut off all expansion.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c
index bcf0cb2ab4f..cd076cf1dce 100644
--- a/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-vsx" } */
+/* { dg-skip-if "" { has_arch_ppc64 && ilp32 } } */
 /* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } }  */

 /* Test that it still can do expand for memcmpsi instead of calling library
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
index c86febae68a..9373b53a3a4 100644
--- a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target be } } */
 /* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+/* { dg-skip-if "" { has_arch_ppc64 && ilp32 } } */
 /* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } }  */

 /* Test that it does expand for memcmpsi instead of calling library on
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c
new file mode 100644
index 000..b470f873973
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c
@@ -0,0 +1,8 @@
+/* { dg-do run { target ilp32 } } */
+/* { dg-options "-O2 -m32 -mpowerpc64" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+/* { dg-timeout-factor 2 } */
+
+/* Verify memcmp on m32 mpowerpc64 */
+
+#include "../../gcc.dg/memcmp-1.c"


Re: [PATCH, rs6000] Refactor expand_compare_loop and split it to two functions

2024-01-15 Thread HAO CHEN GUI
Hi Kewen,

在 2024/1/15 14:16, Kewen.Lin 写道:
> Considering it's stage 4 now and the impact of this patch, let's defer
> this to next stage 1, if possible could you organize the above changes
> into patches:
> 
> 1) Refactor expand_compare_loop by splitting into two functions without
>any functional changes.
> 2) Remove some useless codes like 2, 4, 5.
> 3) Some more enhancements like 1, 3, 6.
> 
> ?  It would be helpful for the review.  Thanks!

Thanks for your review comments. I will re-organize it in the next stage 1.


[PATCH, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-01-15 Thread HAO CHEN GUI
Hi,
  This patch adds const0 move checking for CLEAR_BY_PIECES. The original
vec_duplicate check handles duplicates of non-constant inputs, but 0 is
a constant. So even if a platform doesn't support vec_duplicate, it
could still do clear by pieces if it supports a const0 move in that mode.
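
A minimal example of the kind of code this enables (hypothetical; the
real test comes with the target specific patch):

/* With the const0 check, a target that lacks vec_duplicate for V16QI
   but can move CONST0_RTX (V16QImode) directly can still expand this
   with vector stores of zero instead of scalar stores.  */
void
clear32 (char *p)
{
  __builtin_memset (p, 0, 32);
}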

  The test cases will be added in subsequent target specific patch.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
expand: Add const0 move checking for CLEAR_BY_PIECES optabs

vec_duplicate handles duplicates of non-constant inputs, but 0 is a
constant.  So even if a platform doesn't support vec_duplicate, it could
still do clear by pieces if it supports a const0 move.  This patch adds
the checking.

gcc/
* expr.cc (by_pieces_mode_supported_p): Add const0 move checking
for CLEAR_BY_PIECES.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 34f5ff90a9f..cd960349a53 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1006,14 +1006,21 @@ can_use_qi_vectors (by_pieces_operation op)
 static bool
 by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
-  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+  enum insn_code icode = optab_handler (mov_optab, mode);
+  if (icode == CODE_FOR_nothing)
 return false;

-  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
+  if (op == SET_BY_PIECES
   && VECTOR_MODE_P (mode)
   && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
 return false;

+  if (op == CLEAR_BY_PIECES
+  && VECTOR_MODE_P (mode)
+  && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing
+  && !insn_operand_matches (icode, 1, CONST0_RTX (mode)))
+return false;
+
   if (op == COMPARE_BY_PIECES
   && !can_compare_p (EQ, mode, ccp_jump))
 return false;


[PATCH-1] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-01-25 Thread HAO CHEN GUI
Hi,
  This patch replaces rtx_cost with insn_cost in forward propagation.
In the PR, a constant vector should be propagated and replace a pseudo
in a store insn when we know it's a duplicated constant vector. That
reduces the insn cost but not the rtx cost. In this case, the kind of
destination operand (memory or pseudo) decides the cost, and rtx cost
can't reflect it.
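
A hedged sketch of the situation (my example, not the exact test from
the PR):

#include <altivec.h>

/* The splatted zero constant can be forwarded into the store.  The
   substitution leaves the cost of the SET_SRC alone unchanged, so
   set_src_cost sees no benefit; insn_cost on the whole store insn
   does.  */
void
store_zero (vector unsigned char *p)
{
  vector unsigned char v = vec_splats ((unsigned char) 0);
  *p = v;
}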

  The test case is added in the second target specific patch.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for next stage 1?

Thanks
Gui Haochen


ChangeLog
fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern

gcc/
PR target/113325
* fwprop.cc (try_fwprop_subst_pattern): Replace rtx_cost with
insn_cost.


patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 0707a234726..b05b2538edc 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -467,20 +467,17 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, 
insn_change &use_change,
   redo_changes (0);
 }

-  /* ??? In theory, it should be better to use insn costs rather than
- set_src_costs here.  That would involve replacing this code with
- change_is_worthwhile.  */
   bool ok = recog (attempt, use_change);
   if (ok && !prop.changed_mem_p () && !use_insn->is_asm ())
-if (rtx use_set = single_set (use_rtl))
+if (single_set (use_rtl))
   {
bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_rtl));
+   auto new_cost = insn_cost (use_rtl, speed);
temporarily_undo_changes (0);
-   auto old_cost = set_src_cost (SET_SRC (use_set),
- GET_MODE (SET_DEST (use_set)), speed);
+   /* Invalidate recog data.  */
+   INSN_CODE (use_rtl) = -1;
+   auto old_cost = insn_cost (use_rtl, speed);
redo_changes (0);
-   auto new_cost = set_src_cost (SET_SRC (use_set),
- GET_MODE (SET_DEST (use_set)), speed);
if (new_cost > old_cost)
  {
if (dump_file)


[Patch-2, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]

2024-01-25 Thread HAO CHEN GUI
Hi,
  This patch creates an insn_and_split pattern which helps the
duplicated constant vector replace the source pseudo of the store insn
in the fwprop pass. Thus the store can be implemented by a single
stxvd2x, which eliminates the unnecessary byte swap insn on P8 LE. The
test case shows the optimization.
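
As a sketch of the intended effect (my reading of the split, not
verified compiler output):

#include <altivec.h>

/* Storing a duplicated easy constant.  The new pattern accepts the
   constant as the store source, and the split emits a move of the
   constant into a temp plus a doubleword-swapped store, which matches
   the existing *vsx_stxvd2x4_le_<mode> pattern -- i.e. one stxvd2x and
   no standalone xxpermdi.  */
void
store_splat_const (vector unsigned int *p)
{
  *p = vec_splats (5u);
}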

  The patch depends on the first generic patch which uses insn cost in fwprop.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen


ChangeLog
rs6000: Eliminate unnecessary byte swaps for duplicated constant vector store

gcc/
PR target/113325
* config/rs6000/predicates.md (duplicate_easy_altivec_constant): New.
* config/rs6000/vsx.md (vsx_stxvd2x4_le_const_<mode>): New.

gcc/testsuite/
PR target/113325
* gcc.target/powerpc/pr113325.c: New.


patch.diff
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index ef7d3f214c4..8ab6db630b7 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -759,6 +759,14 @@ (define_predicate "easy_vector_constant"
   return false;
 })

+;; Return 1 if it's a duplicated easy_altivec_constant.
+(define_predicate "duplicate_easy_altivec_constant"
+  (and (match_code "const_vector")
+   (match_test "easy_altivec_constant (op, mode)"))
+{
+  return const_vec_duplicate_p (op);
+})
+
 ;; Same as easy_vector_constant but only for EASY_VECTOR_15_ADD_SELF.
 (define_predicate "easy_vector_constant_add_self"
   (and (match_code "const_vector")
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 26fa32829af..98e4be26f64 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3362,6 +3362,29 @@ (define_insn "*vsx_stxvd2x4_le_<mode>"
   "stxvd2x %x1,%y0"
   [(set_attr "type" "vecstore")])

+(define_insn_and_split "vsx_stxvd2x4_le_const_<mode>"
+  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+   (match_operand:VSX_W 1 "duplicate_easy_altivec_constant" "W"))]
+  "!BYTES_BIG_ENDIAN
+   && VECTOR_MEM_VSX_P (<MODE>mode)
+   && !TARGET_P9_VECTOR"
+  "#"
+  "&& 1"
+  [(set (match_dup 2)
+   (match_dup 1))
+   (set (match_dup 0)
+   (vec_select:VSX_W
+ (match_dup 2)
+ (parallel [(const_int 2) (const_int 3)
+(const_int 0) (const_int 1)])))]
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+					: operands[1];
+}
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
 (define_insn "*vsx_stxvd2x8_le_V8HI"
   [(set (match_operand:V8HI 0 "memory_operand" "=Z")
 (vec_select:V8HI
diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c 
b/gcc/testsuite/gcc.target/powerpc/pr113325.c
new file mode 100644
index 000..dff68ac0a51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mvsx" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
+
+void* foo (void* s1)
+{
+  return __builtin_memset (s1, 0, 32);
+}


[Patchv2, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-17 Thread HAO CHEN GUI
Hi,
  The patch corrects the definition of
TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and replaces it with calls to
slow_unaligned_access.

  Compared with last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640076.html
the main change is to replace the macro with slow_unaligned_access.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Correct definition of macro of fixed point efficient unaligned

Macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in rs6000-string.cc to
guard platforms on which fixed point unaligned load/store is efficient.
It's originally defined by TARGET_EFFICIENT_UNALIGNED_VSX, which is enabled
from P8 and can be disabled by the mno-vsx option, so the definition is
wrong.  This patch corrects the problem and calls slow_unaligned_access to
judge whether fixed point unaligned load/store is efficient or not.
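
As a minimal sketch of the overlapping read being guarded here
(illustrative, not from the patch): comparing the tail of a 7-byte
block with word loads pulled back so they end at the block's end.

#include <stdint.h>
#include <string.h>

/* Illustrative only: compare bytes 3..6 of two 7-byte blocks with one
   4-byte load each, moved back so the load ends at the block end, so
   byte 3 is read again even though it was already compared.
   Profitable only when unaligned fixed point loads are fast, which is
   what slow_unaligned_access now decides.  */
static int
tail_eq (const unsigned char *a, const unsigned char *b)
{
  uint32_t wa, wb;
  memcpy (&wa, a + 3, 4);	/* possibly unaligned, overlapping load */
  memcpy (&wb, b + 3, 4);
  return wa == wb;
}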

gcc/
* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
Remove.
* config/rs6000/rs6000-string.cc (select_block_compare_mode):
Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
targetm.slow_unaligned_access.
(expand_block_compare_gpr): Likewise.
(expand_block_compare): Likewise.
(expand_strncmp_gpr_sequence): Likewise.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-1.c: New.
* gcc.target/powerpc/block-cmp-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 44a946cd453..cb9eeef05d8 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -305,7 +305,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
   else if (bytes == GET_MODE_SIZE (QImode))
 return QImode;
   else if (bytes < GET_MODE_SIZE (SImode)
-  && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  && !targetm.slow_unaligned_access (SImode, align)
   && offset >= GET_MODE_SIZE (SImode) - bytes)
 /* This matches the case were we have SImode and 3 bytes
and offset >= 1 and permits us to move back one and overlap
@@ -313,7 +313,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
unwanted bytes off of the input.  */
 return SImode;
   else if (word_mode_ok && bytes < UNITS_PER_WORD
-  && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  && !targetm.slow_unaligned_access (word_mode, align)
   && offset >= UNITS_PER_WORD-bytes)
 /* Similarly, if we can use DImode it will get matched here and
can do an overlapping read that ends at the end of the block.  */
@@ -1749,7 +1749,7 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes >= load_mode_size)
cmp_bytes = load_mode_size;
-  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+  else if (!targetm.slow_unaligned_access (load_mode, align))
{
  /* Move this load back so it doesn't go past the end.
 P8/P9 can do this efficiently.  */
@@ -2026,7 +2026,7 @@ expand_block_compare (rtx operands[])
   /* The code generated for p7 and older is not faster than glibc
  memcmp if alignment is small and length is not short, so bail
  out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  if (targetm.slow_unaligned_access (word_mode, UINTVAL (align_rtx))
   && ((base_align == 1 && bytes > 16)
  || (base_align == 2 && bytes > 32)))
 return false;
@@ -2168,7 +2168,7 @@ expand_strncmp_gpr_sequence (unsigned HOST_WIDE_INT 
bytes_to_compare,
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes_to_compare >= load_mode_size)
cmp_bytes = load_mode_size;
-  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+  else if (!targetm.slow_unaligned_access (load_mode, align))
{
  /* Move this load back so it doesn't go past the end.
 P8/P9 can do this efficiently.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 326c45221e9..3971a56c588 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -483,10 +483,6 @@ extern int rs6000_vector_align[];
 #define TARGET_NO_SF_SUBREGTARGET_DIRECT_MOVE_64BIT
 #define TARGET_ALLOW_SF_SUBREG (!TARGET_DIRECT_MOVE_64BIT)

-/* This wants to be set for p8 and newer.  On p7, overlapping unaligned
-   loads are slow. */
-#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX
-
 /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
in power7, so conditionalize them on p8 features.  TImode syncs need quad
memory support.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c
new file mode 100644
index 000..bcf0cb2ab4f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-

[Patchv2, rs6000] Clean up pre-checkings of expand_block_compare

2023-12-17 Thread HAO CHEN GUI
Hi,
  This patch cleans up the pre-checks of expand_block_compare. It does
1. Assert that only P7 and above can enter this function, as that's
already guarded by the expand.
2. Return false when optimizing for size.
3. Remove the P7 processor test, as only P7 and above can enter this
function and P7 LE is excluded by targetm.slow_unaligned_access. On P7
BE, the performance of the expand is better than that of the library
when the length is long.

  Compared to last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640082.html
the main change is to add some comments and move the variable definitions
closer to their use.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up the pre-checks of expand_block_compare

gcc/
* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
that only P7 and above can enter this function.  Return false (call
the library) when optimizing for size.  Remove the P7 CPU test, as
only P7 and above can enter this function and P7 LE is excluded by
the check of targetm.slow_unaligned_access on word_mode.  Also,
performance tests show that the expanded block compare beats the
library on P7 BE for lengths from 16 to 64 bytes.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-3.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index cb9eeef05d8..49670cef4d7 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1946,36 +1946,32 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
-  rtx target = operands[0];
-  rtx orig_src1 = operands[1];
-  rtx orig_src2 = operands[2];
-  rtx bytes_rtx = operands[3];
-  rtx align_rtx = operands[4];
+  /* TARGET_POPCNTD is already guarded at expand cmpmemsi.  */
+  gcc_assert (TARGET_POPCNTD);

-  /* This case is complicated to handle because the subtract
- with carry instructions do not generate the 64-bit
- carry and so we must emit code to calculate it ourselves.
- We choose not to implement this yet.  */
-  if (TARGET_32BIT && TARGET_POWERPC64)
+  if (optimize_insn_for_size_p ())
 return false;

-  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
-
   /* Allow this param to shut off all expansion.  */
   if (rs6000_block_compare_inline_limit == 0)
 return false;

-  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
- However slow_unaligned_access returns true on P7 even though the
- performance of this code is good there.  */
-  if (!isP7
-  && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
- || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2
+  /* This case is complicated to handle because the subtract
+ with carry instructions do not generate the 64-bit
+ carry and so we must emit code to calculate it ourselves.
+ We choose not to implement this yet.  */
+  if (TARGET_32BIT && TARGET_POWERPC64)
 return false;

-  /* Unaligned l*brx traps on P7 so don't do this.  However this should
- not affect much because LE isn't really supported on P7 anyway.  */
-  if (isP7 && !BYTES_BIG_ENDIAN)
+  rtx target = operands[0];
+  rtx orig_src1 = operands[1];
+  rtx orig_src2 = operands[2];
+  rtx bytes_rtx = operands[3];
+  rtx align_rtx = operands[4];
+
+  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
+  if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
+      || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
 return false;

   /* If this is not a fixed size compare, try generating loop code and
@@ -2023,14 +2019,6 @@ expand_block_compare (rtx operands[])
   if (!IN_RANGE (bytes, 1, max_bytes))
 return expand_compare_loop (operands);

-  /* The code generated for p7 and older is not faster than glibc
- memcmp if alignment is small and length is not short, so bail
- out to avoid those conditions.  */
-  if (targetm.slow_unaligned_access (word_mode, UINTVAL (align_rtx))
-  && ((base_align == 1 && bytes > 16)
- || (base_align == 2 && bytes > 32)))
-return false;
-
   rtx final_label = NULL;

   if (use_vec)
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
new file mode 100644
index 000..c7e853ad593
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } }  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 4);
+}


[Patchv3, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-20 Thread HAO CHEN GUI
Hi,
  The patch corrects the definition of
TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and replaces it with calls to
slow_unaligned_access.

  Compared with last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640832.html
the main change is to pass the alignment measured in bits to
slow_unaligned_access.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Correct definition of macro of fixed point efficient unaligned

Macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in rs6000-string.cc to
guard platforms on which fixed point unaligned load/store is efficient.
It's originally defined by TARGET_EFFICIENT_UNALIGNED_VSX, which is enabled
from P8 and can be disabled by the mno-vsx option, so the definition is
wrong.  This patch corrects the problem and calls slow_unaligned_access to
judge whether fixed point unaligned load/store is efficient or not.

gcc/
* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
Remove.
* config/rs6000/rs6000-string.cc (select_block_compare_mode):
Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
targetm.slow_unaligned_access.
(expand_block_compare_gpr): Likewise.
(expand_block_compare): Likewise.
(expand_strncmp_gpr_sequence): Likewise.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-1.c: New.
* gcc.target/powerpc/block-cmp-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 44a946cd453..05dc41622f4 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -305,7 +305,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
   else if (bytes == GET_MODE_SIZE (QImode))
 return QImode;
   else if (bytes < GET_MODE_SIZE (SImode)
-  && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  && !targetm.slow_unaligned_access (SImode, align * BITS_PER_UNIT)
   && offset >= GET_MODE_SIZE (SImode) - bytes)
 /* This matches the case were we have SImode and 3 bytes
and offset >= 1 and permits us to move back one and overlap
@@ -313,7 +313,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
unwanted bytes off of the input.  */
 return SImode;
   else if (word_mode_ok && bytes < UNITS_PER_WORD
-  && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  && !targetm.slow_unaligned_access (word_mode, align * BITS_PER_UNIT)
   && offset >= UNITS_PER_WORD-bytes)
 /* Similarly, if we can use DImode it will get matched here and
can do an overlapping read that ends at the end of the block.  */
@@ -1749,7 +1749,8 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes >= load_mode_size)
cmp_bytes = load_mode_size;
-  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+  else if (!targetm.slow_unaligned_access (load_mode,
+  align * BITS_PER_UNIT))
{
  /* Move this load back so it doesn't go past the end.
 P8/P9 can do this efficiently.  */
@@ -2026,7 +2027,7 @@ expand_block_compare (rtx operands[])
   /* The code generated for p7 and older is not faster than glibc
  memcmp if alignment is small and length is not short, so bail
  out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  if (targetm.slow_unaligned_access (word_mode, base_align * BITS_PER_UNIT)
   && ((base_align == 1 && bytes > 16)
  || (base_align == 2 && bytes > 32)))
 return false;
@@ -2168,7 +2169,8 @@ expand_strncmp_gpr_sequence (unsigned HOST_WIDE_INT 
bytes_to_compare,
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes_to_compare >= load_mode_size)
cmp_bytes = load_mode_size;
-  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+  else if (!targetm.slow_unaligned_access (load_mode,
+  align * BITS_PER_UNIT))
{
  /* Move this load back so it doesn't go past the end.
 P8/P9 can do this efficiently.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 326c45221e9..3971a56c588 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -483,10 +483,6 @@ extern int rs6000_vector_align[];
 #define TARGET_NO_SF_SUBREGTARGET_DIRECT_MOVE_64BIT
 #define TARGET_ALLOW_SF_SUBREG (!TARGET_DIRECT_MOVE_64BIT)

-/* This wants to be set for p8 and newer.  On p7, overlapping unaligned
-   loads are slow. */
-#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX
-
 /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
in power7, so conditionalize them on p8 features.  TImode syncs need quad
memory support.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/bl

[Patch, rs6000] Call library for block memory compare when optimizing for size

2023-12-20 Thread HAO CHEN GUI
Hi,
  This patch calls the library function for block memory compare when
optimizing for size.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Call library for block memory compare when optimizing for size

gcc/
* config/rs6000/rs6000-string.cc (expand_block_compare): Return
false when optimizing for size.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-3.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 05dc41622f4..5149273b80e 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1947,6 +1947,9 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
+  if (optimize_insn_for_size_p ())
+return false;
+
   rtx target = operands[0];
   rtx orig_src1 = operands[1];
   rtx orig_src2 = operands[2];
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
new file mode 100644
index 000..c7e853ad593
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } }  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 4);
+}



[Patchv3, rs6000] Clean up pre-checkings of expand_block_compare

2023-12-20 Thread HAO CHEN GUI
Hi,
  This patch cleans up the pre-checks of expand_block_compare. It does
1. Assert that only P7 and above can enter this function, as that's
already guarded by the expand.
2. Remove the P7 processor test, as only P7 and above can enter this
function and P7 LE is excluded by targetm.slow_unaligned_access. On P7
BE, the performance of the expand is better than that of the library
when the length is long.

  Compared to last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640833.html
the main change is to split the optimization for size into a separate
patch and add a testcase for P7 BE.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up the pre-checks of expand_block_compare

Remove the P7 CPU test, as only P7 and above can enter this function and
P7 LE is excluded by the check of targetm.slow_unaligned_access on
word_mode.  Also, performance tests show that the expanded block compare
beats the library on P7 BE for lengths from 16 to 64 bytes.

gcc/
* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
that only P7 and above can enter this function.  Remove the P7 CPU
test and let P7 BE do the expand.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-4.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 5149273b80e..09db57255fa 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1947,15 +1947,12 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
+  /* TARGET_POPCNTD is already guarded at expand cmpmemsi.  */
+  gcc_assert (TARGET_POPCNTD);
+
   if (optimize_insn_for_size_p ())
 return false;

-  rtx target = operands[0];
-  rtx orig_src1 = operands[1];
-  rtx orig_src2 = operands[2];
-  rtx bytes_rtx = operands[3];
-  rtx align_rtx = operands[4];
-
   /* This case is complicated to handle because the subtract
  with carry instructions do not generate the 64-bit
  carry and so we must emit code to calculate it ourselves.
@@ -1963,23 +1960,19 @@ expand_block_compare (rtx operands[])
   if (TARGET_32BIT && TARGET_POWERPC64)
 return false;

-  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
-
   /* Allow this param to shut off all expansion.  */
   if (rs6000_block_compare_inline_limit == 0)
 return false;

-  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
- However slow_unaligned_access returns true on P7 even though the
- performance of this code is good there.  */
-  if (!isP7
-  && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
- || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2
-return false;
+  rtx target = operands[0];
+  rtx orig_src1 = operands[1];
+  rtx orig_src2 = operands[2];
+  rtx bytes_rtx = operands[3];
+  rtx align_rtx = operands[4];

-  /* Unaligned l*brx traps on P7 so don't do this.  However this should
- not affect much because LE isn't really supported on P7 anyway.  */
-  if (isP7 && !BYTES_BIG_ENDIAN)
+  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
+  if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
+  || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
 return false;

   /* If this is not a fixed size compare, try generating loop code and
@@ -2027,14 +2020,6 @@ expand_block_compare (rtx operands[])
   if (!IN_RANGE (bytes, 1, max_bytes))
 return expand_compare_loop (operands);

-  /* The code generated for p7 and older is not faster than glibc
- memcmp if alignment is small and length is not short, so bail
- out to avoid those conditions.  */
-  if (targetm.slow_unaligned_access (word_mode, base_align * BITS_PER_UNIT)
-  && ((base_align == 1 && bytes > 16)
- || (base_align == 2 && bytes > 32)))
-return false;
-
   rtx final_label = NULL;

   if (use_vec)
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
new file mode 100644
index 000..c86febae68a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target be } } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+/* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } }  */
+
+/* Test that it does expand for memcmpsi instead of calling library on
+   P7 BE when length is less than 32 bytes.  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 31);
+}


[patch-2, rs6000] guard fctid on PPC64 and powerpc 476 [PR112707]

2023-11-30 Thread HAO CHEN GUI
Hi,
  The "fctid" is supported on 64-bit Power processors and powerpc 476. It
need a guard to check it. The patch fixes the issue.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with
no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: guard fctid on PPC64 and powerpc 476

fctid is supported on 64-bit Power processors and powerpc 476. It should
be guarded by this condition. The patch fixes the issue.

gcc/
PR target/112707
* config/rs6000/rs6000.h (TARGET_FCTID): Define.
* config/rs6000/rs6000.md (lrintdi2): Add guard TARGET_FCTID.

gcc/testsuite/
PR target/112707
* gcc.target/powerpc/pr112707.h: New.
* gcc.target/powerpc/pr112707-2.c: New.
* gcc.target/powerpc/pr112707-3.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 22595f6..497ae3d 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -467,6 +467,8 @@ extern int rs6000_vector_align[];
 #define TARGET_FCFIDUS TARGET_POPCNTD
 #define TARGET_FCTIDUZ TARGET_POPCNTD
 #define TARGET_FCTIWUZ TARGET_POPCNTD
+/* Enable fctid on ppc64 and powerpc476.  */
+#define TARGET_FCTID   (TARGET_POWERPC64 | TARGET_FPRND)
 #define TARGET_CTZ TARGET_MODULO
 #define TARGET_EXTSWSLI(TARGET_MODULO && TARGET_POWERPC64)
 #define TARGET_MADDLD  TARGET_MODULO
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index d4337ce..4a5e63c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6718,7 +6718,7 @@ (define_insn "lrintdi2"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
(unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT && TARGET_FCTID"
   "fctid %0,%1"
   [(set_attr "type" "fp")])

diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr112707-2.c
new file mode 100644
index 000..ae91913
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-2.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { powerpc*-*-* && be } } } */
+/* { dg-options "-O2 -mdejagnu-cpu=7450 -m32 -fno-math-errno" } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-final { scan-assembler-not {\mfctid\M} } }  */
+
+#include "pr112707.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-3.c 
b/gcc/testsuite/gcc.target/powerpc/pr112707-3.c
new file mode 100644
index 000..e47ce20
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-3.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { powerpc*-*-* && be } } } */
+/* { dg-options "-O2 -m32 -fno-math-errno -mdejagnu-cpu=476fp" } */
+/* { dg-require-effective-target ilp32 } */
+
+/* powerpc 476fp has hard float enabled which is required by fctid */
+
+#include "pr112707.h"
+
+/* { dg-final { scan-assembler-times {\mfctid\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707.h 
b/gcc/testsuite/gcc.target/powerpc/pr112707.h
new file mode 100644
index 000..e427dc6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707.h
@@ -0,0 +1,10 @@
+long long test1 (double a)
+{
+  return __builtin_llrint (a);
+}
+
+long long test2 (float a)
+{
+  return __builtin_llrint (a);
+}
+


[patch-1, rs6000] enable fctiw on old archs [PR112707]

2023-11-30 Thread HAO CHEN GUI
Hi,
  SImode in a float register is supported on P7 and above. As a result,
"fctiw" can't be generated on old 32-bit processors, since the output
operand of the fctiw insn is an SImode value in a float/double register.
This patch fixes the problem by adding an expand and an insn pattern for
fctiw. The output of the new pattern is SFmode. When the target doesn't
support SImode in float registers, it calls the new pattern and converts
the SFmode result to SImode via the stack.
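
A hedged sketch of the intended expansion (the instruction sequence in
the comment is my assumption from the description, not output taken
from the patch):

/* What the new expand path is meant to produce for irint on a 32-bit
   CPU without SImode-in-FPR support (illustrative):
     fctiw  f0,f1      convert to integer, result stays in an FPR
     stfiwx f0,0,r9    store the low 32 bits to a stack slot
     lwz    r3,0(r9)   reload the SImode result into a GPR  */
int
to_int (double a)
{
  return __builtin_irint (a);
}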

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with
no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: enable fctiw on old archs

The powerpc 32-bit processors (e.g. 7450) support the "fctiw" instruction,
but the instruction can't be generated on such platforms as the insn is
guarded by TARGET_POPCNTD.  The root cause is that SImode in a float
register is supported from Power7.  Actually the implementation of "fctiw"
only needs stfiwx, which is supported by the old 32-bit processors.  This
patch enables the "fctiw" expand for these processors.

gcc/
PR target/112707
* config/rs6000/rs6000.md (UNSPEC_STFIWX_SF, UNSPEC_FCTIW_SF): New.
(expand lrintsi2): New.
(insn lrintsi2): Rename to...
(lrintsi_internal): ...this, and remove guard TARGET_POPCNTD.
(lrintsi_internal2): New.
(stfiwx_sf): New.

gcc/testsuite/
PR target/112707
* gcc.target/powerpc/pr112707-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index d4337ce42a9..1b207522ad5 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -90,6 +90,7 @@ (define_c_enum "unspec"
UNSPEC_TLSTLS_PCREL
UNSPEC_FIX_TRUNC_TF ; fadd, rounding towards zero
UNSPEC_STFIWX
+   UNSPEC_STFIWX_SF
UNSPEC_POPCNTB
UNSPEC_FRES
UNSPEC_SP_SET
@@ -111,6 +112,7 @@ (define_c_enum "unspec"
UNSPEC_PARITY
UNSPEC_CMPB
UNSPEC_FCTIW
+   UNSPEC_FCTIW_SF
UNSPEC_FCTID
UNSPEC_LFIWAX
UNSPEC_LFIWZX
@@ -6722,11 +6724,39 @@ (define_insn "lrintdi2"
   "fctid %0,%1"
   [(set_attr "type" "fp")])

-(define_insn "lrintsi2"
+(define_expand "lrintsi2"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
(unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
   UNSPEC_FCTIW))]
-  "TARGET_HARD_FLOAT && TARGET_POPCNTD"
+  "TARGET_HARD_FLOAT && TARGET_STFIWX"
+{
+  /* For those old archs in which SImode can't be held in float registers,
+ call lrintsi_internal2 to put the result in SFmode then
+ convert it via stack.  */
+  if (!TARGET_POPCNTD)
+{
+  rtx tmp = gen_reg_rtx (SFmode);
+  emit_insn (gen_lrintsi_internal2 (tmp, operands[1]));
+  rtx stack = rs6000_allocate_stack_temp (SImode, false, true);
+  emit_insn (gen_stfiwx_sf (stack, tmp));
+  emit_move_insn (operands[0], stack);
+  DONE;
+}
+})
+
+(define_insn "lrintsi_internal"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
+   (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
+  UNSPEC_FCTIW))]
+  "TARGET_HARD_FLOAT"
+  "fctiw %0,%1"
+  [(set_attr "type" "fp")])
+
+(define_insn "lrintsi_internal2"
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=d")
+   (unspec:SF [(match_operand:SFDF 1 "gpc_reg_operand" "")]
+  UNSPEC_FCTIW_SF))]
+  "TARGET_HARD_FLOAT"
   "fctiw %0,%1"
   [(set_attr "type" "fp")])

@@ -6801,6 +6831,14 @@ (define_insn "stfiwx"
   [(set_attr "type" "fpstore")
(set_attr "isa" "*,p8v")])

+(define_insn "stfiwx_sf"
+  [(set (match_operand:SI 0 "memory_operand" "=Z")
+   (unspec:SI [(match_operand:SF 1 "gpc_reg_operand" "d")]
+  UNSPEC_STFIWX_SF))]
+  "TARGET_STFIWX"
+  "stfiwx %1,%y0"
+  [(set_attr "type" "fpstore")])
+
 ;; If we don't have a direct conversion to single precision, don't enable this
 ;; conversion for 32-bit without fast math, because we don't have the insn to
 ;; generate the fixup swizzle to avoid double rounding problems.
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c
new file mode 100644
index 000..32f708c5402
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc*-*-* && be } } } */
+/* { dg-options "-O2 -mdejagnu-cpu=7450 -m32 -fno-math-errno" } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-final { scan-assembler-times {\mfctiw\M} 2 } }  */
+/* { dg-final { scan-assembler-times {\mstfiwx\M} 2 } }  */
+
+int test1 (double a)
+{
+  return __builtin_irint (a);
+}
+
+int test2 (float a)
+{
+  return __builtin_irint (a);
+}



[patch-1v2, rs6000] enable fctiw on old archs [PR112707]

2023-12-06 Thread HAO CHEN GUI
Hi,
  SImode in a float register is supported on P7 and above. As a result,
"fctiw" can't be generated on old 32-bit processors, since the output
operand of the fctiw insn is an SImode value in a float/double register.
This patch fixes the problem by adding one expand and one insn pattern
for fctiw. The output of the new pattern is DImode. When the target
doesn't support SImode in float registers, it calls the new insn pattern
and converts the DImode result to SImode via the stack.

  Compared to last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/638860.html
the main change is to switch the mode of the output operand of the new
insn from SFmode to DImode so that it can call the stfiwx pattern
directly.  No additional unspecs are needed.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with
no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: enable fctiw on old archs

The powerpc 32-bit processors (e.g. 7450) support the "fctiw" instruction,
but the instruction can't be generated on such platforms as the insn is
guarded by TARGET_POPCNTD.  The root cause is that SImode in a float
register is supported from Power7.  Actually the implementation of "fctiw"
only needs stfiwx, which is supported by the old 32-bit processors.  This
patch enables the "fctiw" expand for these processors.

gcc/
PR target/112707
* config/rs6000/rs6000.md (expand lrintsi2): New.
(insn lrintsi2): Rename to...
(*lrintsi): ...this.
(lrintsi_di): New.

gcc/testsuite/
PR target/112707
* gcc.target/powerpc/pr112707-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2a1b5ecfaee..dfb7f19c6ad 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6722,7 +6722,27 @@ (define_insn "lrintdi2"
   "fctid %0,%1"
   [(set_attr "type" "fp")])

-(define_insn "lrintsi2"
+(define_expand "lrintsi2"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
+   (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
+  UNSPEC_FCTIW))]
+  "TARGET_HARD_FLOAT && TARGET_STFIWX"
+{
+  /* For those old archs in which SImode can't be held in float registers,
+     call lrintsi_di to put the result in DImode and then
+     convert it via the stack.  */
+  if (!TARGET_POPCNTD)
+{
+  rtx tmp = gen_reg_rtx (DImode);
+  emit_insn (gen_lrintsi_di (tmp, operands[1]));
+  rtx stack = rs6000_allocate_stack_temp (SImode, false, true);
+  emit_insn (gen_stfiwx (stack, tmp));
+  emit_move_insn (operands[0], stack);
+  DONE;
+}
+})
+
+(define_insn "*lrintsi"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
(unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
   UNSPEC_FCTIW))]
@@ -6730,6 +6750,14 @@ (define_insn "lrintsi2"
   "fctiw %0,%1"
   [(set_attr "type" "fp")])

+(define_insn "lrintsi_di"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
+   (unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
+  UNSPEC_FCTIW))]
+  "TARGET_HARD_FLOAT && !TARGET_POPCNTD"
+  "fctiw %0,%1"
+  [(set_attr "type" "fp")])
+
 (define_insn "btrunc2"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c
new file mode 100644
index 000..cce6bd7f690
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=7450 -fno-math-errno" } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-skip-if "" { has_arch_ppc64 } } */
+/* { dg-final { scan-assembler-times {\mfctiw\M} 2 } }  */
+/* { dg-final { scan-assembler-times {\mstfiwx\M} 2 } }  */
+
+int test1 (double a)
+{
+  return __builtin_irint (a);
+}
+
+int test2 (float a)
+{
+  return __builtin_irint (a);
+}


[patch-2v2, rs6000] guard fctid on PPC64 and powerpc 476 [PR112707]

2023-12-06 Thread HAO CHEN GUI
Hi,
  The "fctid" is supported on 64-bit Power processors and powerpc 476. It
need a guard to check it. The patch fixes the issue.

  Compared with last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/638859.html
the main change is to define TARGET_FCTID as TARGET_POWERPC64 or PPC476,
and also guard "lrintdi2" by TARGET_FCTID as it generates fctid.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: guard fctid on PPC64 and powerpc 476.

fctid is supported on 64-bit Power processors and powerpc 476. It should
be guarded by this condition. The patch fixes the issue.

gcc/
PR target/112707
* config/rs6000/rs6000.h (TARGET_FCTID): Define.
* config/rs6000/rs6000.md (lrintdi2): Add guard TARGET_FCTID.
(lrounddi2): Replace TARGET_FPRND with TARGET_FCTID.

gcc/testsuite/
PR target/112707
* gcc.target/powerpc/pr112707.h: New.
* gcc.target/powerpc/pr112707-2.c: New.
* gcc.target/powerpc/pr112707-3.c: New.
* gcc.target/powerpc/pr88558-p7.c: Remove fctid for ilp32 as it's
now guarded by powerpc64.
* gcc.target/powerpc/pr88558-p8.c: Likewise.
* gfortran.dg/nint_p7.f90: Add powerpc64 target requirement as
lrounddi2 is now guarded by powerpc64.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 22595f6ebd7..8c29ca68ccf 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -467,6 +467,8 @@ extern int rs6000_vector_align[];
 #define TARGET_FCFIDUS TARGET_POPCNTD
 #define TARGET_FCTIDUZ TARGET_POPCNTD
 #define TARGET_FCTIWUZ TARGET_POPCNTD
+/* Enable fctid on ppc64 and powerpc476.  */
+#define TARGET_FCTID   (TARGET_POWERPC64 || rs6000_cpu == PROCESSOR_PPC476)
 #define TARGET_CTZ TARGET_MODULO
 #define TARGET_EXTSWSLI(TARGET_MODULO && TARGET_POWERPC64)
 #define TARGET_MADDLD  TARGET_MODULO
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2a1b5ecfaee..3be79d49dc0 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6718,7 +6718,7 @@ (define_insn "lrintdi2"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
(unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT && TARGET_FCTID"
   "fctid %0,%1"
   [(set_attr "type" "fp")])

@@ -6784,7 +6784,7 @@ (define_expand "lrounddi2"
(set (match_operand:DI 0 "gpc_reg_operand")
(unspec:DI [(match_dup 2)]
   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT && TARGET_VSX && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_VSX && TARGET_FCTID"
 {
   operands[2] = gen_reg_rtx (mode);
 })
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr112707-2.c
new file mode 100644
index 000..672e00691ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=7450 -fno-math-errno" } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-skip-if "" { has_arch_ppc64 } } */
+/* { dg-final { scan-assembler-not {\mfctid\M} } }  */
+
+/* powerpc 7450 doesn't support ppc64 (-m32 -mpowerpc64), so skip it.  */
+
+#include "pr112707.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-3.c 
b/gcc/testsuite/gcc.target/powerpc/pr112707-3.c
new file mode 100644
index 000..924338fd390
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-3.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-math-errno -mdejagnu-cpu=476fp" } */
+/* { dg-require-effective-target ilp32 } */
+
+/* powerpc 476fp has hard float enabled which is required by fctid */
+
+#include "pr112707.h"
+
+/* { dg-final { scan-assembler-times {\mfctid\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707.h 
b/gcc/testsuite/gcc.target/powerpc/pr112707.h
new file mode 100644
index 000..e427dc6a72e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707.h
@@ -0,0 +1,10 @@
+long long test1 (double a)
+{
+  return __builtin_llrint (a);
+}
+
+long long test2 (float a)
+{
+  return __builtin_llrint (a);
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c 
b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
index 3932656c5fd..13d433c4bdb 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
@@ -6,7 +6,6 @@
 #include "pr88558.h"

 /* { dg-final { scan-assembler-times {\mfctid\M} 4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mfctid\M} 2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mfctiw\M} 2 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mfctiw\M} 4 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mstfiwx\M} 2 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p8.c 
b/gcc/t

Re: [patch-2v3, rs6000] Guard fctid on PowerPC64 and PowerPC476 [PR112707]

2023-12-07 Thread HAO CHEN GUI
Hi,
  The "fctid" is supported on 64-bit Power processors and PowerPC476. It
need a guard to check it. The patch fixes the issue.

  Compared with last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639536.html
the main change is to change the target requirement in pr88558*.c.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?


ChangeLog
rs6000: Guard fctid on PowerPC64 and PowerPC476.

fctid is supported on 64-bit Power processors and powerpc 476. It should
be guarded by this condition. The patch fixes the issue.

gcc/
PR target/112707
* config/rs6000/rs6000.h (TARGET_FCTID): Define.
* config/rs6000/rs6000.md (lrintdi2): Add guard TARGET_FCTID.
(lrounddi2): Replace TARGET_FPRND with TARGET_FCTID.

gcc/testsuite/
PR target/112707
* gcc.target/powerpc/pr112707.h: New.
* gcc.target/powerpc/pr112707-2.c: New.
* gcc.target/powerpc/pr112707-3.c: New.
* gcc.target/powerpc/pr88558-p7.c: Check fctid on ilp32 and
has_arch_ppc64 as it's now guarded by powerpc64.
* gcc.target/powerpc/pr88558-p8.c: Likewise.
* gfortran.dg/nint_p7.f90: Add powerpc64 target requirement as
lrounddi2 is now guarded by powerpc64.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 22595f6ebd7..8c29ca68ccf 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -467,6 +467,8 @@ extern int rs6000_vector_align[];
 #define TARGET_FCFIDUS TARGET_POPCNTD
 #define TARGET_FCTIDUZ TARGET_POPCNTD
 #define TARGET_FCTIWUZ TARGET_POPCNTD
+/* Enable fctid on ppc64 and powerpc476.  */
+#define TARGET_FCTID   (TARGET_POWERPC64 || rs6000_cpu == PROCESSOR_PPC476)
 #define TARGET_CTZ TARGET_MODULO
 #define TARGET_EXTSWSLI(TARGET_MODULO && TARGET_POWERPC64)
 #define TARGET_MADDLD  TARGET_MODULO
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2a1b5ecfaee..3be79d49dc0 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6718,7 +6718,7 @@ (define_insn "lrintdi2"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
(unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT && TARGET_FCTID"
   "fctid %0,%1"
   [(set_attr "type" "fp")])

@@ -6784,7 +6784,7 @@ (define_expand "lrounddi2"
(set (match_operand:DI 0 "gpc_reg_operand")
(unspec:DI [(match_dup 2)]
   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT && TARGET_VSX && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_VSX && TARGET_FCTID"
 {
   operands[2] = gen_reg_rtx (mode);
 })
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr112707-2.c
new file mode 100644
index 000..672e00691ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=7450 -fno-math-errno" } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-skip-if "" { has_arch_ppc64 } } */
+/* { dg-final { scan-assembler-not {\mfctid\M} } }  */
+
+/* powerpc 7450 doesn't support ppc64 (-m32 -mpowerpc64), so skip it.  */
+
+#include "pr112707.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-3.c 
b/gcc/testsuite/gcc.target/powerpc/pr112707-3.c
new file mode 100644
index 000..924338fd390
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-3.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-math-errno -mdejagnu-cpu=476fp" } */
+/* { dg-require-effective-target ilp32 } */
+
+/* powerpc 476fp has hard float enabled which is required by fctid */
+
+#include "pr112707.h"
+
+/* { dg-final { scan-assembler-times {\mfctid\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707.h 
b/gcc/testsuite/gcc.target/powerpc/pr112707.h
new file mode 100644
index 000..e427dc6a72e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707.h
@@ -0,0 +1,10 @@
+long long test1 (double a)
+{
+  return __builtin_llrint (a);
+}
+
+long long test2 (float a)
+{
+  return __builtin_llrint (a);
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c 
b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
index 3932656c5fd..2fa0b997e52 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
@@ -6,7 +6,7 @@
 #include "pr88558.h"

 /* { dg-final { scan-assembler-times {\mfctid\M} 4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mfctid\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mfctid\M} 2 { target { ilp32 && has_arch_ppc64 } } } } */
 /* { dg-final { scan-assembler-times {\mfctiw\M} 2 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mfctiw\M} 4 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mstfiwx\M} 2 { target lp64 } } } */
diff --git a/gcc/testsuite/g

[Patch, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-10 Thread HAO CHEN GUI
Hi,
  The patch corrects the definition of
TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and changes its name to a more
comprehensible one.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Correct definition of macro of fixed point efficient unaligned

Macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in rs6000-string.cc to
guard whether a platform is efficient on fixed point unaligned load/store.
It's originally defined by TARGET_EFFICIENT_UNALIGNED_VSX, which is enabled
from P8 and can be disabled by the mno-vsx option, so the definition is
wrong.  This patch corrects the problem and defines it as
"!STRICT_ALIGNMENT", which is true on P7 BE and on P8 and above.

gcc/
* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
Rename to...
(TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT): ...this, set it to
!STRICT_ALIGNMENT.
* config/rs6000/rs6000-string.cc (select_block_compare_mode):
Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT.
(select_block_compare_mode): Likewise.
(expand_block_compare_gpr): Likewise.
(expand_block_compare): Likewise.
(expand_strncmp_gpr_sequence): Likewise.

gcc/testsuite/
* gcc.target/powerpc/target_efficient_unaligned_fixedpoint-1.c: New.
* gcc.target/powerpc/target_efficient_unaligned_fixedpoint-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 44a946cd453..d4030854b2a 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -305,7 +305,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
   else if (bytes == GET_MODE_SIZE (QImode))
 return QImode;
   else if (bytes < GET_MODE_SIZE (SImode)
-  && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  && TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
   && offset >= GET_MODE_SIZE (SImode) - bytes)
 /* This matches the case were we have SImode and 3 bytes
and offset >= 1 and permits us to move back one and overlap
@@ -313,7 +313,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
unwanted bytes off of the input.  */
 return SImode;
   else if (word_mode_ok && bytes < UNITS_PER_WORD
-  && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  && TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
   && offset >= UNITS_PER_WORD-bytes)
 /* Similarly, if we can use DImode it will get matched here and
can do an overlapping read that ends at the end of the block.  */
@@ -1749,7 +1749,7 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes >= load_mode_size)
cmp_bytes = load_mode_size;
-  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+  else if (TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT)
{
  /* Move this load back so it doesn't go past the end.
 P8/P9 can do this efficiently.  */
@@ -2026,7 +2026,7 @@ expand_block_compare (rtx operands[])
   /* The code generated for p7 and older is not faster than glibc
  memcmp if alignment is small and length is not short, so bail
  out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  if (!TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
   && ((base_align == 1 && bytes > 16)
  || (base_align == 2 && bytes > 32)))
 return false;
@@ -2168,7 +2168,7 @@ expand_strncmp_gpr_sequence (unsigned HOST_WIDE_INT 
bytes_to_compare,
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes_to_compare >= load_mode_size)
cmp_bytes = load_mode_size;
-  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+  else if (TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT)
{
  /* Move this load back so it doesn't go past the end.
 P8/P9 can do this efficiently.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 326c45221e9..2f3a82942c1 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -483,9 +483,9 @@ extern int rs6000_vector_align[];
 #define TARGET_NO_SF_SUBREGTARGET_DIRECT_MOVE_64BIT
 #define TARGET_ALLOW_SF_SUBREG (!TARGET_DIRECT_MOVE_64BIT)

-/* This wants to be set for p8 and newer.  On p7, overlapping unaligned
-   loads are slow. */
-#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX
+/* Like TARGET_EFFICIENT_UNALIGNED_VSX, indicates if unaligned fixed point
+   loads/stores are efficient.  */
+#define TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT (!STRICT_ALIGNMENT)

 /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
in power7, so conditionalize them on p8 features.  TImode syncs need quad
diff --git 
a/gcc/testsuite/gcc.target/powerpc/target_efficient_unaligned_fixedpoint-1.c 
b/gcc/testsuite/gcc.target/powerpc/targ

[Patch, rs6000] Clean up pre-checking of expand_block_compare

2023-12-10 Thread HAO CHEN GUI
Hi,
  This patch cleans up the pre-checking of expand_block_compare. It does
the following:
1. Asserts that only P7 and above can enter this function, as that is
already guarded by the expand.
2. Returns false when optimizing for size.
3. Removes the P7 CPU test, as only P7 and above can enter this function
and P7 LE is excluded by targetm.slow_unaligned_access.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up pre-checking of expand_block_compare

gcc/
* config/rs6000/rs6000-string.cc (expand_block_compare): Assert
that only P7 and above can enter this function.  Return false when
optimizing for size.  Remove the P7 CPU test, as only P7 and above
can enter this function and P7 LE is excluded by the check of
targetm.slow_unaligned_access on word_mode.

gcc/testsuite/
* gcc.target/powerpc/memcmp_for_size.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index d4030854b2a..dff69e90d0c 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1946,6 +1946,15 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
+  gcc_assert (TARGET_POPCNTD);
+
+  if (optimize_insn_for_size_p ())
+return false;
+
+  /* Allow this param to shut off all expansion.  */
+  if (rs6000_block_compare_inline_limit == 0)
+return false;
+
   rtx target = operands[0];
   rtx orig_src1 = operands[1];
   rtx orig_src2 = operands[2];
@@ -1959,23 +1968,9 @@ expand_block_compare (rtx operands[])
   if (TARGET_32BIT && TARGET_POWERPC64)
 return false;

-  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
-
-  /* Allow this param to shut off all expansion.  */
-  if (rs6000_block_compare_inline_limit == 0)
-return false;
-
-  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
- However slow_unaligned_access returns true on P7 even though the
- performance of this code is good there.  */
-  if (!isP7
-  && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
- || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2
-return false;
-
-  /* Unaligned l*brx traps on P7 so don't do this.  However this should
- not affect much because LE isn't really supported on P7 anyway.  */
-  if (isP7 && !BYTES_BIG_ENDIAN)
+  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
+if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
+   || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
 return false;

   /* If this is not a fixed size compare, try generating loop code and
@@ -2023,14 +2018,6 @@ expand_block_compare (rtx operands[])
   if (!IN_RANGE (bytes, 1, max_bytes))
 return expand_compare_loop (operands);

-  /* The code generated for p7 and older is not faster than glibc
- memcmp if alignment is small and length is not short, so bail
- out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
-  && ((base_align == 1 && bytes > 16)
- || (base_align == 2 && bytes > 32)))
-return false;
-
   rtx final_label = NULL;

   if (use_vec)
diff --git a/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c 
b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c
new file mode 100644
index 000..c7e853ad593
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } }  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 4);
+}


Re: [PATCH-2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-06 Thread HAO CHEN GUI
Hi Kewen,

   Thanks for your review comments. Just one question on the following
comment.

On 2023/11/7 10:40, Kewen.Lin wrote:
> Nit: has_arch_pwr8 would make it un-tested on Power7 default env, I'd prefer 
> to remove this "has_arch_pwr8" and append "-mdejagnu-cpu=power8" to 
> dg-options.

My original proposal was to test the case on p8/p9/p10. Each of them
generates a different instruction sequence. If it's assigned
"-mdejagnu-cpu=power8", only the p8 instruction sequence is generated.
Doesn't that lose the coverage?

Thanks
Gui Haochen


[PATCH-2v2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-08 Thread HAO CHEN GUI
Hi,
  This patch enables vector mode for the by pieces equality compare. It
adds a new expand pattern - cbranchv16qi4 - and sets MOVE_MAX_PIECES
and COMPARE_MAX_PIECES to 16 bytes when P8 vector is enabled. The compare
relies on both move and compare instructions, so both macros are changed.
As the vector load/store might be unaligned, the 16-byte move and
compare are only enabled when VSX and EFFICIENT_UNALIGNED_VSX are both
enabled.

  This patch also enables the 16-byte by pieces move. As vector mode is
not enabled for the by pieces move, TImode is used for the move. This
caused 2 regression cases. The root cause is that a 16-byte array can
now be constructed by one load instruction and is not put into .LC0, so
the SRA optimization is not applied (a reduced illustration follows).
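
For illustration, the regressed SRA tests exercise a pattern roughly
like this reduction (hypothetical code, not the actual testsuite
source):

  /* A 16-byte local aggregate whose constant initializer used to be
     materialized via the constant pool (.LC0) and then scalarized by
     SRA; a single 16-byte load changes that.  */
  int foo (void)
  {
    int a[4] = { 1, 2, 3, 4 };
    return a[0] + a[3];
  }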

  Compared to the previous version, the main changes are to modify the
guard of the expand pattern and the compiling options of the test case.
Also, the fix for the two regression cases caused by enabling the
16-byte move is moved into this patch.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Enable vector mode for by pieces equality compare

This patch adds a new expand pattern - cbranchv16qi4 - to enable the
vector mode by pieces equality compare on rs6000.  The macro
MOVE_MAX_PIECES (COMPARE_MAX_PIECES) is set to 16 bytes when VSX and
EFFICIENT_UNALIGNED_VSX are enabled, otherwise it is kept unchanged.
The macro STORE_MAX_PIECES is set to the same value as MOVE_MAX_PIECES
by default, so now it's explicitly defined and kept unchanged.

gcc/
PR target/111449
* config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
* config/rs6000/rs6000.cc (rs6000_generate_compare): Generate
insn sequence for V16QImode equality compare.
* config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
(STORE_MAX_PIECES): Define.

gcc/testsuite/
PR target/111449
* gcc.target/powerpc/pr111449-1.c: New.
* gcc.dg/tree-ssa/sra-17.c: Add additional options for 32-bit powerpc.
* gcc.dg/tree-ssa/sra-18.c: Likewise.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e8a596fb7e9..a1423c76451 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2605,6 +2605,48 @@ (define_insn "altivec_vupklpx"
 }
   [(set_attr "type" "vecperm")])

+/* The cbranch optab doesn't allow FAIL, so old CPUs which are
+   inefficient on unaligned vsx are disabled, as the cost of
+   unaligned load/store is high.  */
+(define_expand "cbranchv16qi4"
+  [(use (match_operator 0 "equality_operator"
+   [(match_operand:V16QI 1 "reg_or_mem_operand")
+(match_operand:V16QI 2 "reg_or_mem_operand")]))
+   (use (match_operand 3))]
+  "VECTOR_MEM_VSX_P (V16QImode)
+   && TARGET_EFFICIENT_UNALIGNED_VSX"
+{
+  /* Use direct move for P8 LE to skip double-word swap, as the byte
+ order doesn't matter for equality compare.  If any operands are
+ altivec indexed or indirect operands, the load can be implemented
+ directly by the altivec aligned load instruction and no swap is
+ needed.  */
+  if (!TARGET_P9_VECTOR
+  && !BYTES_BIG_ENDIAN
+  && MEM_P (operands[1])
+  && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
+  && MEM_P (operands[2])
+  && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
+{
+  rtx reg_op1 = gen_reg_rtx (V16QImode);
+  rtx reg_op2 = gen_reg_rtx (V16QImode);
+  rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
+  rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
+  operands[1] = reg_op1;
+  operands[2] = reg_op2;
+}
+  else
+{
+  operands[1] = force_reg (V16QImode, operands[1]);
+  operands[2] = force_reg (V16QImode, operands[2]);
+}
+
+  rtx_code code = GET_CODE (operands[0]);
+  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]);
+  rs6000_emit_cbranch (V16QImode, operands);
+  DONE;
+})
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ_p"
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index cc24dd5301e..10279052636 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15472,6 +15472,18 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
  else
emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
}
+  else if (mode == V16QImode)
+   {
+ gcc_assert (code == EQ || code == NE);
+
+ rtx result_vector = gen_reg_rtx (V16QImode);
+ rtx cc_bit = gen_reg_rtx (SImode);
+ emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
+ emit_insn (gen_cr6_test_for_lt (cc_bit));
+ emit_insn (gen_rtx_SET (compare_result,
+ gen_rtx_COMPARE (comp_mode, cc_bit,
+  
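
For context, a minimal sketch of the kind of test the ChangeLog's
pr111449-1.c would contain (hypothetical; the dg directives and the
actual file may differ):

  /* { dg-do compile } */
  /* { dg-options "-mvsx -O2" } */

  /* A 16-byte equality compare; expected to go through cbranchv16qi4
     and use vcmpequb. instead of two 8-byte GPR compares.  */
  int compare (const char *s1, const char *s2)
  {
    return __builtin_memcmp (s1, s2, 16) == 0;
  }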

[PATCH-3v3, rs6000] Fix regression cases caused 16-byte by pieces move [PR111449]

2023-11-08 Thread HAO CHEN GUI
Hi,
  Originally, the 16-byte memory-to-memory move was expanded via a
pattern. expand_block_move does an optimization on P8 LE to leverage the
V2DI reversed load/store for the memory-to-memory move. Now it's done by
the 16-byte by pieces move and the optimization is lost. This patch adds
an insn_and_split pattern to regain the optimization.

  Compared to the previous version, the main changes are to move the fix
for the two regression cases to the former patch and to change the
condition of the pattern.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Fix regression cases caused by 16-byte by pieces move

The previous patch enables the 16-byte by pieces move. Originally the
16-byte move was implemented via a pattern.  expand_block_move does an
optimization on P8 LE to leverage the V2DI reversed load/store for the
memory-to-memory move.  Now the 16-byte move is implemented via the by
pieces move and finally split to two DImode load/store insns.  This
patch creates an insn_and_split pattern to regain the optimization.

gcc/
PR target/111449
* config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New.

gcc/testsuite/
PR target/111449
* gcc.target/powerpc/pr111449-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f3b40229094..3f71e96dc6b 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -414,6 +414,29 @@ (define_mode_attr VM3_char [(V2DI "d")

 ;; VSX moves

+;; TImode memory to memory move optimization on LE with p8vector
+(define_insn_and_split "*vsx_le_mem_to_mem_mov_ti"
+  [(set (match_operand:TI 0 "indexed_or_indirect_operand" "=Z")
+   (match_operand:TI 1 "indexed_or_indirect_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN
+   && TARGET_VSX
+   && !TARGET_P9_VECTOR
+   && !MEM_VOLATILE_P (operands[0])
+   && !MEM_VOLATILE_P (operands[1])
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp = gen_reg_rtx (V2DImode);
+  rtx src = adjust_address (operands[1], V2DImode, 0);
+  emit_insn (gen_vsx_ld_elemrev_v2di (tmp, src));
+  rtx dest = adjust_address (operands[0], V2DImode, 0);
+  emit_insn (gen_vsx_st_elemrev_v2di (dest, tmp));
+  DONE;
+}
+  [(set_attr "length" "16")])
+
 ;; The patterns for LE permuted loads and stores come before the general
 ;; VSX moves so they match first.
 (define_insn_and_split "*vsx_le_perm_load_"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
new file mode 100644
index 000..7003bdc0208
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { has_arch_pwr8 } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mvsx -O2" } */
+
+/* Ensure 16-byte by pieces move is enabled.  */
+
+void move1 (void *s1, void *s2)
+{
+  __builtin_memcpy (s1, s2, 16);
+}
+
+void move2 (void *s1)
+{
+  __builtin_memcpy (s1, "0123456789012345", 16);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mp?lxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxv\M} 2 } } */



[PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-08 Thread HAO CHEN GUI
Hi,
  This patch modifies expand_builtin_return and makes it call
expand_misaligned_mem_ref to load unaligned memory.  The memory
reference pointed to by the void* pointer might be unaligned, so
expanding it with the unaligned move optabs is safe.

  The new test case illustrates the problem. rs6000 doesn't have an
unaligned vector load instruction when VSX is disabled. When expanding
__builtin_return, it shouldn't load the memory into a vector register
directly with an unaligned load instruction. It should store it to an
on-stack variable via extract_bit_field and then load it into the
return register from the stack with an aligned load instruction.
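
For reference, the usual pairing that produces the pointer handed to
__builtin_return looks roughly like this (a sketch following the GCC
manual's contract; "callee", "forwarder" and the 64-byte argument size
are made up):

  double callee (double x) { return x * 2.0; }

  double forwarder (double x)
  {
    /* Capture the incoming argument block, re-invoke the callee, and
       return its return-value block as our own.  */
    void *args = __builtin_apply_args ();
    void *res = __builtin_apply ((void (*) ()) callee, args, 64);
    __builtin_return (res);
  }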

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
expand: Call misaligned memory reference in expand_builtin_return

expand_builtin_return loads memory into the return registers.  The
memory might be unaligned with respect to the mode of the registers.  So
it should be expanded via the unaligned move optabs if the memory
reference is unaligned.

gcc/
PR target/112417
* builtins.cc (expand_builtin_return): Call
expand_misaligned_mem_ref for loading unaligned memory reference.
* builtins.h (expand_misaligned_mem_ref): Declare.
* expr.cc (expand_misaligned_mem_ref): No longer static.

gcc/testsuite/
PR target/112417
* gcc.target/powerpc/pr112417.c: New.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index cb90bd03b3e..b879eb88b7c 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -1816,7 +1816,12 @@ expand_builtin_return (rtx result)
if (size % align != 0)
  size = CEIL (size, align) * align;
reg = gen_rtx_REG (mode, INCOMING_REGNO (regno));
-   emit_move_insn (reg, adjust_address (result, mode, size));
+   rtx tmp = adjust_address (result, mode, size);
+   unsigned int align = MEM_ALIGN (tmp);
+   if (align < GET_MODE_ALIGNMENT (mode))
+ tmp = expand_misaligned_mem_ref (tmp, mode, 1, align,
+  NULL, NULL);
+   emit_move_insn (reg, tmp);

push_to_sequence (call_fusage);
emit_use (reg);
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 88a26d70cd5..a3d7954ee6e 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -157,5 +157,7 @@ extern internal_fn replacement_internal_fn (gcall *);

 extern bool builtin_with_linkage_p (tree);
 extern int type_to_class (tree);
+extern rtx expand_misaligned_mem_ref (rtx, machine_mode, int, unsigned int,
+ rtx, rtx *);

 #endif /* GCC_BUILTINS_H */
diff --git a/gcc/expr.cc b/gcc/expr.cc
index ed4dbb13d89..b0adb35a095 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -9156,7 +9156,7 @@ expand_cond_expr_using_cmove (tree treeop0 
ATTRIBUTE_UNUSED,
If the result can be stored at TARGET, and ALT_RTL is non-NULL,
then *ALT_RTL is set to TARGET (before legitimziation).  */

-static rtx
+rtx
 expand_misaligned_mem_ref (rtx temp, machine_mode mode, int unsignedp,
   unsigned int align, rtx target, rtx *alt_rtl)
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112417.c 
b/gcc/testsuite/gcc.target/powerpc/pr112417.c
new file mode 100644
index 000..ef82fc82033
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112417.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { has_arch_pwr7 } } } */
+/* { dg-options "-mno-vsx -maltivec -O2" } */
+
+void * foo (void * p)
+{
+  if (p)
+__builtin_return (p);
+}
+
+/* Ensure that unaligned load is generated via stack load/store.  */
+/* { dg-final { scan-assembler {\mstw\M} { target { ! has_arch_ppc64 } } } } */
+/* { dg-final { scan-assembler {\mstd\M} { target has_arch_ppc64 } } } */


Re: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-09 Thread HAO CHEN GUI
Hi Richard,
  Thanks so much for your comments.

On 2023/11/9 19:41, Richard Biener wrote:
> I'm not sure if the testcase is valid though?
> 
> @defbuiltin{{void} __builtin_return (void *@var{result})}
> This built-in function returns the value described by @var{result} from
> the containing function.  You should specify, for @var{result}, a value
> returned by @code{__builtin_apply}.
> @enddefbuiltin
> 
> I don't see __builtin_apply being used here?

The prototype of the test case is from "__objc_block_forward" in
libobjc/sendmsg.c.

  void *args, *res;

  args = __builtin_apply_args ();
  res = __objc_forward (rcv, op, args);
  if (res)
__builtin_return (res);
  else
...

__builtin_apply_args puts the return values on the stack with the
required alignment. But the forward function can do anything and return
a void* pointer. IMHO the alignment might be broken. So I just
simplified the test to use a void* pointer as the input argument of
"__builtin_return" and skipped "__builtin_apply_args".

Thanks
Gui Haochen


[PATCH-3v4, rs6000] Fix regression cases caused 16-byte by pieces move [PR111449]

2023-11-10 Thread HAO CHEN GUI
Hi,
  Originally, the 16-byte memory-to-memory move was expanded via a
pattern. expand_block_move does an optimization on P8 LE to leverage the
V2DI reversed load/store for the memory-to-memory move. Now it's done by
the 16-byte by pieces move and the optimization is lost. This patch adds
an insn_and_split pattern to regain the optimization.

  Compared to the previous version, the main change is to remove the
volatile memory operand check from the insn condition, as it's not
needed.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Fix regression cases caused by 16-byte by pieces move

The previous patch enables the 16-byte by pieces move. Originally the
16-byte move was implemented via a pattern.  expand_block_move does an
optimization on P8 LE to leverage the V2DI reversed load/store for the
memory-to-memory move.  Now the 16-byte move is implemented via the by
pieces move and finally split to two DImode load/store insns.  This
patch creates an insn_and_split pattern to regain the optimization.

gcc/
PR target/111449
* config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New.

gcc/testsuite/
PR target/111449
* gcc.target/powerpc/pr111449-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f3b40229094..26fa32829af 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -414,6 +414,27 @@ (define_mode_attr VM3_char [(V2DI "d")

 ;; VSX moves

+;; TImode memory to memory move optimization on LE with p8vector
+(define_insn_and_split "*vsx_le_mem_to_mem_mov_ti"
+  [(set (match_operand:TI 0 "indexed_or_indirect_operand" "=Z")
+   (match_operand:TI 1 "indexed_or_indirect_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN
+   && TARGET_VSX
+   && !TARGET_P9_VECTOR
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp = gen_reg_rtx (V2DImode);
+  rtx src = adjust_address (operands[1], V2DImode, 0);
+  emit_insn (gen_vsx_ld_elemrev_v2di (tmp, src));
+  rtx dest = adjust_address (operands[0], V2DImode, 0);
+  emit_insn (gen_vsx_st_elemrev_v2di (dest, tmp));
+  DONE;
+}
+  [(set_attr "length" "16")])
+
 ;; The patterns for LE permuted loads and stores come before the general
 ;; VSX moves so they match first.
 (define_insn_and_split "*vsx_le_perm_load_"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
new file mode 100644
index 000..7003bdc0208
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { has_arch_pwr8 } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mvsx -O2" } */
+
+/* Ensure 16-byte by pieces move is enabled.  */
+
+void move1 (void *s1, void *s2)
+{
+  __builtin_memcpy (s1, s2, 16);
+}
+
+void move2 (void *s1)
+{
+  __builtin_memcpy (s1, "0123456789012345", 16);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mp?lxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxv\M} 2 } } */


Re: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-10 Thread HAO CHEN GUI
Hi Richard,

On 2023/11/10 17:06, Richard Biener wrote:
> On Fri, Nov 10, 2023 at 8:52 AM HAO CHEN GUI  wrote:
>>
>> Hi Richard,
>>   Thanks so much for your comments.
>>
>> On 2023/11/9 19:41, Richard Biener wrote:
>>> I'm not sure if the testcase is valid though?
>>>
>>> @defbuiltin{{void} __builtin_return (void *@var{result})}
>>> This built-in function returns the value described by @var{result} from
>>> the containing function.  You should specify, for @var{result}, a value
>>> returned by @code{__builtin_apply}.
>>> @enddefbuiltin
>>>
>>> I don't see __builtin_apply being used here?
>>
>> The prototype of the test case is from "__objc_block_forward" in
>> libobjc/sendmsg.c.
>>
>>   void *args, *res;
>>
>>   args = __builtin_apply_args ();
>>   res = __objc_forward (rcv, op, args);
>>   if (res)
>> __builtin_return (res);
>>   else
>> ...
>>
>> The __builtin_apply_args puts the return values on stack by the alignment.
>> But the forward function can do anything and return a void* pointer.
>> IMHO the alignment might be broken. So I just simplified it to use a
>> void* pointer as the input argument of  "__builtin_return" and skip
>> "__builtin_apply_args".
> 
> But doesn't __objc_forward then break the contract between
> __builtin_apply_args and __builtin_return?
> 
> That said, __builtin_return is a very special function, it's not supposed
> to deal with what you are fixing.  At least I think so.
> 
> IMHO the bug is in __objc_block_forward.

If so, can we document that the memory object pointed to by the input
argument of __builtin_return has to be aligned? Then we can enforce the
alignment in __builtin_return. The user's function can do anything if
gcc doesn't state that.

Thanks
Gui Haochen

> 
> Richard.
> 
>>
>> Thanks
>> Gui Haochen


Re: Fwd: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-13 Thread HAO CHEN GUI
Sorry, forgot to cc gcc-patches.

On 2023/11/13 16:05, HAO CHEN GUI wrote:
> Andrew,
>   Could you kindly inform us what's the functionality of __objc_forward?
> Does it change the memory content pointed by args? Thanks a lot.
> 
> Thanks
> Gui Haochen
> 
> 
> libobjc/sendmsg.c.
> 
>void *args, *res;
> 
>args = __builtin_apply_args ();
>res = __objc_forward (rcv, op, args);
>if (res)
>  __builtin_return (res);
>else
>  ...
> 
>  Forwarded message 
> Subject: Re: [PATCH, expand] Call misaligned memory reference in
> expand_builtin_return [PR112417]
> Date: Fri, 10 Nov 2023 14:39:02 +0100
> From: Richard Biener 
> To: HAO CHEN GUI 
> Cc: gcc-patches , Kewen.Lin 
> 
> On Fri, Nov 10, 2023 at 11:10 AM HAO CHEN GUI  wrote:
>>
>> Hi Richard,
>>
>>> On 2023/11/10 17:06, Richard Biener wrote:
>>> On Fri, Nov 10, 2023 at 8:52 AM HAO CHEN GUI  wrote:
>>>>
>>>> Hi Richard,
>>>>   Thanks so much for your comments.
>>>>
>>>> On 2023/11/9 19:41, Richard Biener wrote:
>>>>> I'm not sure if the testcase is valid though?
>>>>>
>>>>> @defbuiltin{{void} __builtin_return (void *@var{result})}
>>>>> This built-in function returns the value described by @var{result} from
>>>>> the containing function.  You should specify, for @var{result}, a value
>>>>> returned by @code{__builtin_apply}.
>>>>> @enddefbuiltin
>>>>>
>>>>> I don't see __builtin_apply being used here?
>>>>
>>>> The prototype of the test case is from "__objc_block_forward" in
>>>> libobjc/sendmsg.c.
>>>>
>>>>   void *args, *res;
>>>>
>>>>   args = __builtin_apply_args ();
>>>>   res = __objc_forward (rcv, op, args);
>>>>   if (res)
>>>> __builtin_return (res);
>>>>   else
>>>> ...
>>>>
>>>> The __builtin_apply_args puts the return values on stack by the alignment.
>>>> But the forward function can do anything and return a void* pointer.
>>>> IMHO the alignment might be broken. So I just simplified it to use a
>>>> void* pointer as the input argument of  "__builtin_return" and skip
>>>> "__builtin_apply_args".
>>>
>>> But doesn't __objc_forward then break the contract between
>>> __builtin_apply_args and __builtin_return?
>>>
>>> That said, __builtin_return is a very special function, it's not supposed
>>> to deal with what you are fixing.  At least I think so.
>>>
>>> IMHO the bug is in __objc_block_forward.
>>
>> If so, can we document that the memory objects pointed by input argument of
>> __builtin_return have to be aligned? Then we can force the alignment in
>> __builtin_return. The customer function can do anything if gcc doesn't state
>> that.
> 
> I don't think they have to be aligned - they have to adhere to the ABI
> which __builtin_apply_args ensures.  But others might know more details
> here.
> 
>> Thanks
>> Gui Haochen
>>
>>>
>>> Richard.
>>>
>>>>
>>>> Thanks
>>>> Gui Haochen


[PATCH] Clean up by_pieces_ninsns

2023-11-14 Thread HAO CHEN GUI
Hi,
  This patch cleans up by_pieces_ninsns and does the following things.
1. Does the length and alignment adjustment for the by pieces compare
when the overlap operation is enabled (see the toy illustration below).
2. Removes unnecessary mov_optab checks.
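
As a toy illustration of the length adjustment (standalone C with
assumed numbers; not GCC code):

  #include <stdio.h>

  int main (void)
  {
    unsigned int l = 10;        /* bytes to compare */
    unsigned int modesize = 16; /* widest mode, e.g. V16QI */
    /* ROUND_UP (l, modesize): one 16-byte overlapping access can
       cover the whole range, so count it as a single piece.  */
    unsigned int up = (l + modesize - 1) / modesize * modesize;
    if (up > l)
      l = up;
    printf ("%u\n", l); /* prints 16 */
    return 0;
  }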

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with
no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
Clean up by_pieces_ninsns

The by pieces compare can be implemented by overlapped operations, so
it should be taken into consideration when doing the adjustment for
overlap operations.  The mode returned from
widest_fixed_size_mode_for_size is already checked against mov_optab in
by_pieces_mode_supported_p, which is called by
widest_fixed_size_mode_for_size.  So there is no need to check mov_optab
again in by_pieces_ninsns.  The patch fixes these issues.

gcc/
* expr.cc (by_pieces_ninsns): Include by pieces compare when
do the adjustment for overlap operations.  Remove unnecessary
mov_optab check.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 3e2a678710d..7cb2c935177 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1090,18 +1090,15 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned 
int align,
   unsigned HOST_WIDE_INT n_insns = 0;
   fixed_size_mode mode;

-  if (targetm.overlap_op_by_pieces_p () && op != COMPARE_BY_PIECES)
+  if (targetm.overlap_op_by_pieces_p ())
 {
   /* NB: Round up L and ALIGN to the widest integer mode for
 MAX_SIZE.  */
   mode = widest_fixed_size_mode_for_size (max_size, op);
-  if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
-   {
- unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode));
- if (up > l)
-   l = up;
- align = GET_MODE_ALIGNMENT (mode);
-   }
+  unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode));
+   if (up > l)
+ l = up;
+  align = GET_MODE_ALIGNMENT (mode);
 }

   align = alignment_for_piecewise_move (MOVE_MAX_PIECES, align);
@@ -1109,12 +1106,10 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned 
int align,
   while (max_size > 1 && l > 0)
 {
   mode = widest_fixed_size_mode_for_size (max_size, op);
-  enum insn_code icode;

   unsigned int modesize = GET_MODE_SIZE (mode);

-  icode = optab_handler (mov_optab, mode);
-  if (icode != CODE_FOR_nothing && align >= GET_MODE_ALIGNMENT (mode))
+  if (align >= GET_MODE_ALIGNMENT (mode))
{
  unsigned HOST_WIDE_INT n_pieces = l / modesize;
  l %= modesize;



[PATCHv2] Clean up by_pieces_ninsns

2023-11-22 Thread HAO CHEN GUI
Hi,
  This patch cleans up by_pieces_ninsns and does the following things.
1. Does the length and alignment adjustment for the by pieces compare
when the overlap operation is enabled.
2. Replaces unnecessary mov_optab checks with gcc assertions.

  Compared to the last version, the main change is to replace the
unnecessary mov_optab checks with gcc assertions and to fix the
indentation.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with
no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
Clean up by_pieces_ninsns

The by pieces compare can be implemented by overlapped operations, so
it should be taken into consideration when doing the adjustment for
overlap operations.  The mode returned from
widest_fixed_size_mode_for_size is already checked against mov_optab in
by_pieces_mode_supported_p, which is called by
widest_fixed_size_mode_for_size.  So there is no need to check mov_optab
again in by_pieces_ninsns.  The patch fixes these issues.

gcc/
* expr.cc (by_pieces_ninsns): Include by pieces compare when
do the adjustment for overlap operations.  Replace mov_optab
checks with gcc assertions.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 556bcf7ef59..ffd18fe43cc 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1090,18 +1090,16 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned 
int align,
   unsigned HOST_WIDE_INT n_insns = 0;
   fixed_size_mode mode;

-  if (targetm.overlap_op_by_pieces_p () && op != COMPARE_BY_PIECES)
+  if (targetm.overlap_op_by_pieces_p ())
 {
   /* NB: Round up L and ALIGN to the widest integer mode for
 MAX_SIZE.  */
   mode = widest_fixed_size_mode_for_size (max_size, op);
-  if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
-   {
- unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode));
- if (up > l)
-   l = up;
- align = GET_MODE_ALIGNMENT (mode);
-   }
+  gcc_assert (optab_handler (mov_optab, mode) != CODE_FOR_nothing);
+  unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode));
+  if (up > l)
+   l = up;
+  align = GET_MODE_ALIGNMENT (mode);
 }

   align = alignment_for_piecewise_move (MOVE_MAX_PIECES, align);
@@ -1109,12 +1107,11 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned 
int align,
   while (max_size > 1 && l > 0)
 {
   mode = widest_fixed_size_mode_for_size (max_size, op);
-  enum insn_code icode;
+  gcc_assert (optab_handler (mov_optab, mode) != CODE_FOR_nothing);

   unsigned int modesize = GET_MODE_SIZE (mode);

-  icode = optab_handler (mov_optab, mode);
-  if (icode != CODE_FOR_nothing && align >= GET_MODE_ALIGNMENT (mode))
+  if (align >= GET_MODE_ALIGNMENT (mode))
{
  unsigned HOST_WIDE_INT n_pieces = l / modesize;
  l %= modesize;


[PATCH] Expand: Pass down equality only flag to cmpmem expand

2023-11-27 Thread HAO CHEN GUI
Hi,
  This patch passes down the equality-only flag from
emit_block_cmp_hints to the cmpmem optab so that the target-specific
expand can generate optimized insns for an equality-only compare.
Targets (e.g. rs6000) can generate a more efficient insn sequence if the
block compare is equality only (see the example below).
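
For context, the distinction the flag captures is visible at the source
level (plain C for illustration; not part of the patch):

  /* Equality-only: the expander may use any branch-reducing equality
     sequence, e.g. a vector compare on rs6000.  */
  int eq (const char *a, const char *b)
  {
    return __builtin_memcmp (a, b, 32) == 0;
  }

  /* Ordered result: the full <0/0/>0 semantics must be preserved, so
     the cheaper equality-only sequence cannot be used.  */
  int cmp (const char *a, const char *b)
  {
    return __builtin_memcmp (a, b, 32);
  }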

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with
no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
Expand: Pass down equality only flag to cmpmem expand

Targets (e.g. rs6000) can generate a more efficient insn sequence if the
block compare is equality only.  This patch passes down the equality-only
flag from emit_block_cmp_hints to the cmpmem optab so that the
target-specific expand can generate optimized insns for an equality-only
compare.

gcc/
* expr.cc (expand_cmpstrn_or_cmpmem): Rename to...
(expand_cmpstrn): ...this.
(expand_cmpmem): New function.  Pass down equality only flag to
cmpmem expand.
(emit_block_cmp_via_cmpmem): Add an argument for equality only
flag and call expand_cmpmem instead of expand_cmpstrn_or_cmpmem.
(emit_block_cmp_hints): Call emit_block_cmp_via_cmpmem with
equality only flag.
* expr.h (expand_cmpstrn, expand_cmpmem): Declare.
* builtins.cc (expand_builtin_strcmp, expand_builtin_strncmp):
Call expand_cmpstrn instead of expand_cmpstrn_or_cmpmem.
* config/i386/i386.md (cmpmemsi): Add the sixth operand for
equality only flag.
* config/rs6000/rs6000.md (cmpmemsi): Likewise.
* config/s390/s390.md (cmpmemsi): Likewise.
* doc/md.texi (cmpmem): Modify the document and add an operand
for equality only flag.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 5ece0d23eb9..c2dbc25433d 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -4819,7 +4819,7 @@ expand_builtin_strcmp (tree exp, ATTRIBUTE_UNUSED rtx 
target)
   if (len && !TREE_SIDE_EFFECTS (len))
{
  arg3_rtx = expand_normal (len);
- result = expand_cmpstrn_or_cmpmem
+ result = expand_cmpstrn
(cmpstrn_icode, target, arg1_rtx, arg2_rtx, TREE_TYPE (len),
 arg3_rtx, MIN (arg1_align, arg2_align));
}
@@ -4929,9 +4929,9 @@ expand_builtin_strncmp (tree exp, ATTRIBUTE_UNUSED rtx 
target,
   rtx arg1_rtx = get_memory_rtx (arg1, len);
   rtx arg2_rtx = get_memory_rtx (arg2, len);
   rtx arg3_rtx = expand_normal (len);
-  result = expand_cmpstrn_or_cmpmem (cmpstrn_icode, target, arg1_rtx,
-arg2_rtx, TREE_TYPE (len), arg3_rtx,
-MIN (arg1_align, arg2_align));
+  result = expand_cmpstrn (cmpstrn_icode, target, arg1_rtx, arg2_rtx,
+  TREE_TYPE (len), arg3_rtx,
+  MIN (arg1_align, arg2_align));

   tree fndecl = get_callee_fndecl (exp);
   if (result)
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 1b5a794b9e5..775cba5d93d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -23195,7 +23195,8 @@ (define_expand "cmpmemsi"
 (compare:SI (match_operand:BLK 1 "memory_operand" "")
 (match_operand:BLK 2 "memory_operand" "") ) )
(use (match_operand 3 "general_operand"))
-   (use (match_operand 4 "immediate_operand"))]
+   (use (match_operand 4 "immediate_operand"))
+   (use (match_operand 5 ""))]
   ""
 {
   if (ix86_expand_cmpstrn_or_cmpmem (operands[0], operands[1],
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2a1b5ecfaee..e66330f876e 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -10097,7 +10097,8 @@ (define_expand "cmpmemsi"
(compare:SI (match_operand:BLK 1)
(match_operand:BLK 2)))
  (use (match_operand:SI 3))
- (use (match_operand:SI 4))])]
+ (use (match_operand:SI 4))
+ (use (match_operand:SI 5))])]
   "TARGET_POPCNTD"
 {
   if (expand_block_compare (operands))
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 4bdb679daf2..506e79fb035 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3790,7 +3790,8 @@ (define_expand "cmpmemsi"
 (compare:SI (match_operand:BLK 1 "memory_operand" "")
 (match_operand:BLK 2 "memory_operand" "") ) )
(use (match_operand:SI 3 "general_operand" ""))
-   (use (match_operand:SI 4 "" ""))]
+   (use (match_operand:SI 4 "" ""))
+   (use (match_operand:SI 5 "" ""))]
   ""
 {
   if (s390_expand_cmpmem (operands[0], operands[1],
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index e01cdcbe22c..06955cd7e78 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6992,14 +6992,19 @@ result of the comparison.

 @cindex @code{cmpmem@var{m}} instruction pattern
 @item @samp{cmpmem@var{m}}
-Block compare instruction, with five operands like the operands
-of @samp{cmpstr@var{m}}.  The two memory blocks

[PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]

2023-09-20 Thread HAO CHEN GUI
Hi,
  This patch enables the vector compare for the 16-byte memory equality
compare. The 16-byte memory equality compare can be efficiently
implemented by the instruction "vcmpequb.". It saves one branch and one
compare compared with the two 8-byte compare sequence.

  The 16-byte vector compare is not enabled on 32-bit sub-targets as
TImode hasn't been supported well on 32-bit sub-targets.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: Enable vector compare for 16-byte memory equality compare

gcc/
PR target/111449
* config/rs6000/altivec.md (cbranchti4): New expand pattern.
* config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn
sequence for TImode vector equality compare.
* config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
(COMPARE_MAX_PIECES): Define.

gcc/testsuite/
PR target/111449
* gcc.target/powerpc/pr111449.c: New.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e8a596fb7e9..99264235cbe 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2605,6 +2605,24 @@ (define_insn "altivec_vupklpx"
 }
   [(set_attr "type" "vecperm")])

+(define_expand "cbranchti4"
+  [(use (match_operator 0 "equality_operator"
+   [(match_operand:TI 1 "memory_operand")
+(match_operand:TI 2 "memory_operand")]))
+   (use (match_operand 3))]
+  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
+{
+  rtx op1 = simplify_subreg (V16QImode, operands[1], TImode, 0);
+  rtx op2 = simplify_subreg (V16QImode, operands[2], TImode, 0);
+  operands[1] = force_reg (V16QImode, op1);
+  operands[2] = force_reg (V16QImode, op2);
+  rtx_code code = GET_CODE (operands[0]);
+  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1],
+   operands[2]);
+  rs6000_emit_cbranch (TImode, operands);
+  DONE;
+})
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ_p"
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index efe9adce1f8..c6b935a64e7 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15264,6 +15264,15 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
  else
emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
}
+  else if (mode == TImode)
+   {
+ gcc_assert (code == EQ || code == NE);
+
+ rtx result_vector = gen_reg_rtx (V16QImode);
+ compare_result = gen_rtx_REG (CCmode, CR6_REGNO);
+ emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
+ code = (code == NE) ? GE : LT;
+   }
   else
emit_insn (gen_rtx_SET (compare_result,
gen_rtx_COMPARE (comp_mode, op0, op1)));
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..dc33bca0802 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1730,6 +1730,8 @@ typedef struct rs6000_args
in one reasonably fast instruction.  */
 #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
 #define MAX_MOVE_MAX 8
+#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
+#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)

 /* Nonzero if access to memory by bytes is no faster than for words.
Also nonzero if doing byte operations (specifically shifts) in registers
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449.c 
b/gcc/testsuite/gcc.target/powerpc/pr111449.c
new file mode 100644
index 000..ab9583f47bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-maltivec -O2" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+/* Ensure vector comparison is used for 16-byte memory equality compare.  */
+
+int compare (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 16) == 0;
+}
+
+/* { dg-final { scan-assembler-times {\mvcmpequb\M} 1 } } */
+/* { dg-final { scan-assembler-not {\mcmpd\M} } } */


Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]

2023-09-24 Thread HAO CHEN GUI
Hi Kewen,

On 2023/9/18 15:34, Kewen.Lin wrote:
> Thanks for checking!  So for P7, this patch looks neutral, but for P8 and
> later, it may cause some few differences in code gen.  I'm curious that how
> many total object files and different object files were checked and found
> on P8?  
With P8 and -O2, the following object files are different.
507.cactuBSSN_r datestamp.o
511.povray_r colutils.o
521.wrf_r module_cu_kfeta.fppized.o
526.blender_r particle_edit.o
526.blender_r glutil.o
526.blender_r displist.o
526.blender_r CCGSubSurf.o

With P8 and -O3, the following object files are different.
502.gcc_r ifcvt.o
502.gcc_r rtlanal.o
548.exchange2_r exchange2.fppized.o
507.cactuBSSN_r datestamp.o
511.povray_r colutils.o
521.wrf_r module_bc.fppized.o
521.wrf_r module_cu_kfeta.fppized.o
526.blender_r particle_edit.o
526.blender_r displist.o
526.blender_r CCGSubSurf.o
526.blender_r sketch.o

> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612821.html
> I also wonder if it's easy to reduce some of them further as small test cases.
> 
> Since xxlor is better than fmr at least on Power10, could you also evaluate
> the affected bmks on P10 (even P8/P9) to ensure no performance degradation?
There is no performance regression on P10/P9/P8. The detailed data is
listed in the internal issue.

Thanks
Gui Haochen


[PATCH-2v3, rs6000] Implement 32bit inline lrint [PR88558]

2023-09-24 Thread HAO CHEN GUI
Hi,
  This patch implements the 32-bit inline lrint via "fctiw".  It depends
on patch 1 to do the SImode move from FP registers on P7.

  Compared to the last version, the main change is to add some test
cases.
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629187.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: Support 32-bit inline lrint

gcc/
PR target/88558
* config/rs6000/rs6000.md (lrintdi2): Remove TARGET_FPRND
from insn condition.
(lrintsi2): New insn pattern for 32bit lrint.

gcc/testsuite/
PR target/106769
* gcc.target/powerpc/pr88558.h: New.
* gcc.target/powerpc/pr88558-p7.c: New.
* gcc.target/powerpc/pr88558-p8.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index ac5d29a2cf8..a41898e0e08 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6655,10 +6655,18 @@ (define_insn "lrintdi2"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
(unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT"
   "fctid %0,%1"
   [(set_attr "type" "fp")])

+(define_insn "lrintsi2"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
+   (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
+  UNSPEC_FCTIW))]
+  "TARGET_HARD_FLOAT && TARGET_POPCNTD"
+  "fctiw %0,%1"
+  [(set_attr "type" "fp")])
+
 (define_insn "btrunc2"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c 
b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
new file mode 100644
index 000..3932656c5fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-math-errno -mdejagnu-cpu=power7" } */
+
+/* -fno-math-errno is required to make {i,l,ll}rint{,f} inlined */
+
+#include "pr88558.h"
+
+/* { dg-final { scan-assembler-times {\mfctid\M} 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mfctid\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mfctiw\M} 2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mfctiw\M} 4 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mstfiwx\M} 2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mstfiwx\M} 4 { target ilp32 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p8.c 
b/gcc/testsuite/gcc.target/powerpc/pr88558-p8.c
new file mode 100644
index 000..1afc8fd4f0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr88558-p8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O2 -fno-math-errno -mdejagnu-cpu=power8" } */
+
+/* -fno-math-errno is required to make {i,l,ll}rint{,f} inlined */
+
+#include "pr88558.h"
+
+/* { dg-final { scan-assembler-times {\mfctid\M} 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mfctid\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mfctiw\M} 2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mfctiw\M} 4 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mmfvsrwz\M} 2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mmfvsrwz\M} 4 { target ilp32 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558.h 
b/gcc/testsuite/gcc.target/powerpc/pr88558.h
new file mode 100644
index 000..9c604faadd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr88558.h
@@ -0,0 +1,29 @@
+long int test1 (double a)
+{
+  return __builtin_lrint (a);
+}
+
+long long test2 (double a)
+{
+  return __builtin_llrint (a);
+}
+
+int test3 (double a)
+{
+  return __builtin_irint (a);
+}
+
+long int test4 (float a)
+{
+  return __builtin_lrintf (a);
+}
+
+long long test5 (float a)
+{
+  return __builtin_llrintf (a);
+}
+
+int test6 (float a)
+{
+  return __builtin_irintf (a);
+}


Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]

2023-09-28 Thread HAO CHEN GUI
Kewen and Richard,
  Thanks for your comments. Please let me clarify it.

On 2023/9/27 19:10, Richard Sandiford wrote:
> Yeah, I agree there doesn't seem to be a good reason to exclude vectors.
> Sorry to dive straight into details, but maybe we should have something
> called bitwise_mode_for_size that tries to use integer modes where possible,
> but falls back to vector modes otherwise.  That mode could then be used
> for copying, storing, bitwise ops, and equality comparisons (if there
> is appropriate optabs support).

  Vector modes are not supported for compare_by_pieces and
move_by_pieces, but they are supported for set_by_pieces and
clear_by_pieces. The helper function widest_fixed_size_mode_for_size
returns a vector mode when qi_vector is set to true.

static fixed_size_mode
widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)

I tried to enable qi_vector for compare_by_pieces. It can pick up a
vector mode (e.g. V16QImode) and works in some cases. But it fails in a
constant string case.

int compare (const char* s1)
{
  return __builtin_memcmp_eq (s1, "__GCC_HAVE_DWARF2_CFI_ASM", 16);
}

As the second op is a constant string, it calls builtin_memcpy_read_str to
build the string. Unfortunately, the inner function doesn't support
vector mode.

  /* The by-pieces infrastructure does not try to pick a vector mode
 for memcpy expansion.  */
  return c_readstr (rep + offset, as_a  (mode),
/*nul_terminated=*/false);

It seems the by-pieces infrastructure itself supports vector modes, but
the low-level functions do not.

I think there are two ways to enable vector mode for compare_by_pieces.
One is to modify the by-pieces infrastructure.  The other is to enable
it via the cmpmem expand.  The expand is target specific and flexible.

What's your opinion?

Thanks
Gui Haochen


[PATCH-1v4, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-06-26 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isinf for SFDF and IEEE128 via the test
data class instructions.

  Compared with the previous version, the main change is to define
and use the constant masks for the test data class insns.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652593.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isinf for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/rs6000.md (ISNAN, ISINF, ISZERO, ISDENORMAL): Define.
* config/rs6000/vsx.md (isinf2 for SFDF): New expand.
(isinf2 for IEEE128): New expand.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-1.c: New test.
* gcc.target/powerpc/pr97786-2.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index ac5651d7420..e84e6b08f03 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -53,6 +53,17 @@ (define_constants
(FRAME_POINTER_REGNUM   110)
   ])

+;;
+;; Test data class mask
+;;
+
+(define_constants
+  [(ISNAN  0x40)
+   (ISINF  0x30)
+   (ISZERO 0xC)
+   (ISDENORMAL 0x3)
+  ])
+
 ;;
 ;; UNSPEC usage
 ;;
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..67615bae8c0 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5313,6 +5313,24 @@ (define_expand "xststdcp"
   operands[4] = CONST0_RTX (SImode);
 })

+(define_expand "isinf2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "vsx_register_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (ISINF)));
+  DONE;
+})
+
+(define_expand "isinf2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "vsx_register_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdcqp_ (operands[0], operands[1], GEN_INT (ISINF)));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
new file mode 100644
index 000..c1c4f64ee8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isinf (x);
+}
+
+int test3 (float x)
+{
+  return __builtin_isinff (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
new file mode 100644
index 000..ed305e8572e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (long double x)
+{
+  return __builtin_isinfl (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */


[PATCH-3v4, rs6000] Implement optab_isnormal for SFDF and IEEE128

2024-06-26 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isnormal for SFDF and IEEE128 via the
test data class instructions.

  Compared with the previous version, the main change is to use the
constant mask for the test data class insns.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652595.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isnormal for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/vsx.md (isnormal2 for SFDF): New expand.
(isnormal2 for IEEE128): New expand.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-7.c: New test.
* gcc.target/powerpc/pr97786-8.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 11d02e60170..b48986ac9eb 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5355,6 +5355,30 @@ (define_expand "isfinite2"
   DONE;
 })

+(define_expand "isnormal2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "vsx_register_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  int mask = ISINF | ISNAN | ISZERO | ISDENORMAL;
+  emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (mask)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+(define_expand "isnormal2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "vsx_register_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  int mask = ISINF | ISNAN | ISZERO | ISDENORMAL;
+  emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (mask)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-7.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
new file mode 100644
index 000..2df472e35d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isnormal (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isnormal (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-8.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
new file mode 100644
index 000..00478dbf3ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isnormal (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */


[PATCH-2v4, rs6000] Implement optab_isfinite for SFDF and IEEE128

2024-06-26 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isfinite for SFDF and IEEE128 via the
test data class instructions.

  Compared with the previous version, the main change is to use the
constant mask for the test data class insns.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652594.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isfinite for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/vsx.md (isfinite2 for SFDF): New expand.
(isfinite2 for IEEE128): New expand.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-4.c: New test.
* gcc.target/powerpc/pr97786-5.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 67615bae8c0..11d02e60170 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5331,6 +5331,30 @@ (define_expand "isinf2"
   DONE;
 })

+(define_expand "isfinite2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "vsx_register_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  int mask = ISINF | ISNAN;
+  emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (mask)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+(define_expand "isfinite2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "vsx_register_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  int mask = ISINF | ISNAN;
+  emit_insn (gen_xststdcqp_<mode> (tmp, operands[1], GEN_INT (mask)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-4.c b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
new file mode 100644
index 00000000000..01faa962bd5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isfinite (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-5.c b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
new file mode 100644
index 00000000000..0e106b9f23a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */


Ping^3 [PATCH-1v3] Value Range: Add range op for builtin isinf

2024-07-01 Thread HAO CHEN GUI
Hi,
  Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html

Thanks
Gui Haochen

On 2024/6/24 9:40, HAO CHEN GUI wrote:
> Hi,
>   Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html
> 
> Thanks
> Gui Haochen
> 
> On 2024/6/20 14:56, HAO CHEN GUI wrote:
>> Hi,
>>   Gently ping it.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html
>>
>> Thanks
>> Gui Haochen
>>
>> On 2024/5/30 10:46, HAO CHEN GUI wrote:
>>> Hi,
>>>   The builtin isinf is not folded at front end if the corresponding optab
>>> exists. It causes the range evaluation failed on the targets which has
>>> optab_isinf. For instance, range-sincos.c will fail on the targets which
>>> has optab_isinf as it calls builtin_isinf.
>>>
>>>   This patch fixed the problem by adding range op for builtin isinf.
>>>
>>>   Compared with previous version, the main change is to set the range to
>>> 1 if it's infinite number otherwise to 0.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652219.html
>>>
>>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>>> regressions. Is it OK for the trunk?
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>>
>>> ChangeLog
>>> Value Range: Add range op for builtin isinf
>>>
>>> The builtin isinf is not folded at front end if the corresponding optab
>>> exists.  So the range op for isinf is needed for value range analysis.
>>> This patch adds range op for builtin isinf.
>>>
>>> gcc/
>>> * gimple-range-op.cc (class cfn_isinf): New.
>>> (op_cfn_isinf): New variables.
>>> (gimple_range_op_handler::maybe_builtin_call): Handle
>>> CASE_FLT_FN (BUILT_IN_ISINF).
>>>
>>> gcc/testsuite/
>>> * gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c: New test.
>>>
>>> patch.diff
>>> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
>>> index 55dfbb23ce2..4e60a42eaac 100644
>>> --- a/gcc/gimple-range-op.cc
>>> +++ b/gcc/gimple-range-op.cc
>>> @@ -1175,6 +1175,63 @@ private:
>>>bool m_is_pos;
>>>  } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);
>>>
>>> +// Implement range operator for CFN_BUILT_IN_ISINF
>>> +class cfn_isinf : public range_operator
>>> +{
>>> +public:
>>> +  using range_operator::fold_range;
>>> +  using range_operator::op1_range;
>>> +  virtual bool fold_range (irange &r, tree type, const frange &op1,
>>> +  const irange &, relation_trio) const override
>>> +  {
>>> +if (op1.undefined_p ())
>>> +  return false;
>>> +
>>> +if (op1.known_isinf ())
>>> +  {
>>> +   wide_int one = wi::one (TYPE_PRECISION (type));
>>> +   r.set (type, one, one);
>>> +   return true;
>>> +  }
>>> +
>>> +if (op1.known_isnan ()
>>> +   || (!real_isinf (&op1.lower_bound ())
>>> +   && !real_isinf (&op1.upper_bound ())))
>>> +  {
>>> +   r.set_zero (type);
>>> +   return true;
>>> +  }
>>> +
>>> +r.set_varying (type);
>>> +return true;
>>> +  }
>>> +  virtual bool op1_range (frange &r, tree type, const irange &lhs,
>>> + const frange &, relation_trio) const override
>>> +  {
>>> +if (lhs.undefined_p ())
>>> +  return false;
>>> +
>>> +if (lhs.zero_p ())
>>> +  {
>>> +   nan_state nan (true);
>>> +   r.set (type, real_min_representable (type),
>>> +  real_max_representable (type), nan);
>>> +   return true;
>>> +  }
>>> +
>>> +if (!range_includes_zero_p (lhs))
>>> +  {
>>> +   // The range is [-INF,-INF][+INF,+INF], but it can't be represented.
>>> +   // Set range to [-INF,+INF]
>>> +   r.set_varying (type);
>>> +   r.clear_nan ();
>>> +   return true;
>>> +  }
>>> +
>>> +r.set_varying (type);
>>> +return true;
>>> +  }
>>> +} op_cfn_isinf;
>>>
>>>  // Implement range operator for CFN_BUILT_IN_
>>>  class cfn_parity : public range_operator
>>> @@ -1268,6 +1325,11 @@ gimple_range_op_handler::maybe_builtin_call ()
>>>m_operator = &op_cfn_signbit;

Re: [PATCH] RISC-V: use fclass insns to implement isfinite and isnormal builtins

2024-07-01 Thread HAO CHEN GUI
The problem should be fixed once my value range patches are accepted.
[PATCH-1v3] Value Range: Add range op for builtin isinf
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html
[PATCH-2v4] Value Range: Add range op for builtin isfinite
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html
[PATCH-3v2] Value Range: Add range op for builtin isnormal
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html



On 2024/6/29 9:35, Vineet Gupta wrote:
> 
> 
> On 6/28/24 17:53, Vineet Gupta wrote:
>> Currently isfinite and isnormal use float compare instructions with fp
>> flags save/restored around them. Our perf team complained this could be
>> costly in uarch. RV Base ISA already has FCLASS.{d,s,h} instruction to
>> do FP compares w/o disturbing FP exception flags.
>>
>> Coincidentally, upstream just a few days back got support for the
>> corresponding optabs. All that is needed is to wire these up in the
>> backend.
>>
>> I was also hoping to get __builtin_inf() done but unfortunately it
>> requires little more rtl foo/bar to implement a tri-modal return.
>>
>> Currently going thru CI testing.
> 
> My local testing spotted one additional failure.
> 
> FAIL: g++.dg/opt/pr107569.C  -std=gnu++20  scan-tree-dump-times vrp1
> "return 1;" 2
> 
> The reason being
> 
> bool
> bar (double x)
> {
>   [[assume (std::isfinite (x))]];
>   return std::isfinite (x);
> }
> 
> generating the new seq
> 
> .LFB4:
>     fclass.d    a0,fa0
>     andi    a0,a0,126
>     snez    a0,a0
>     ret
> 
> vs.
> 
>     li    a0,1
>     ret
> 
> I have a hunch this requires the pending value range patch from Hao Chen
> GUI.
> 
> Thx,
> -Vineet
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html


Ping^3 [PATCH-3v2] Value Range: Add range op for builtin isnormal

2024-07-01 Thread HAO CHEN GUI
Hi,
  Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html

Thanks
Gui Haochen

On 2024/6/24 9:41, HAO CHEN GUI wrote:
> Hi,
>   Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html
> 
> Thanks
> Gui Haochen
> 
> On 2024/6/20 14:58, HAO CHEN GUI wrote:
>> Hi,
>>   Gently ping it.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html
>>
>> Thanks
>> Gui Haochen
>>
>> On 2024/5/30 10:46, HAO CHEN GUI wrote:
>>> Hi,
>>>   This patch adds the range op for builtin isnormal. It also adds two
>>> help function in frange to detect range of normal floating-point and
>>> range of subnormal or zero.
>>>
>>>   Compared to previous version, the main change is to set the range to
>>> 1 if it's normal number otherwise to 0.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652221.html
>>>
>>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>>> regressions. Is it OK for the trunk?
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>> ChangeLog
>>> Value Range: Add range op for builtin isnormal
>>>
>>> The former patch adds optab for builtin isnormal. Thus builtin isnormal
>>> might not be folded at front end.  So the range op for isnormal is needed
>>> for value range analysis.  This patch adds range op for builtin isnormal.
>>>
>>> gcc/
>>> * gimple-range-op.cc (class cfn_isnormal): New.
>>> (op_cfn_isnormal): New variable.
>>> (gimple_range_op_handler::maybe_builtin_call): Handle
>>> CFN_BUILT_IN_ISNORMAL.
>>> * value-range.h (class frange): Declare known_isnormal and
>>> known_isdenormal_or_zero.
>>> (frange::known_isnormal): Define.
>>> (frange::known_isdenormal_or_zero): Define.
>>>
>>> gcc/testsuite/
>>> * gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c: New test.
>>>
>>> patch.diff
>>> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
>>> index 5ec5c828fa4..6787f532f11 100644
>>> --- a/gcc/gimple-range-op.cc
>>> +++ b/gcc/gimple-range-op.cc
>>> @@ -1289,6 +1289,61 @@ public:
>>>}
>>>  } op_cfn_isfinite;
>>>
>>> +//Implement range operator for CFN_BUILT_IN_ISNORMAL
>>> +class cfn_isnormal :  public range_operator
>>> +{
>>> +public:
>>> +  using range_operator::fold_range;
>>> +  using range_operator::op1_range;
>>> +  virtual bool fold_range (irange &r, tree type, const frange &op1,
>>> +  const irange &, relation_trio) const override
>>> +  {
>>> +if (op1.undefined_p ())
>>> +  return false;
>>> +
>>> +if (op1.known_isnormal ())
>>> +  {
>>> +   wide_int one = wi::one (TYPE_PRECISION (type));
>>> +   r.set (type, one, one);
>>> +   return true;
>>> +  }
>>> +
>>> +if (op1.known_isnan ()
>>> +   || op1.known_isinf ()
>>> +   || op1.known_isdenormal_or_zero ())
>>> +  {
>>> +   r.set_zero (type);
>>> +   return true;
>>> +  }
>>> +
>>> +r.set_varying (type);
>>> +return true;
>>> +  }
>>> +  virtual bool op1_range (frange &r, tree type, const irange &lhs,
>>> + const frange &, relation_trio) const override
>>> +  {
>>> +if (lhs.undefined_p ())
>>> +  return false;
>>> +
>>> +if (lhs.zero_p ())
>>> +  {
>>> +   r.set_varying (type);
>>> +   return true;
>>> +  }
>>> +
>>> +if (!range_includes_zero_p (lhs))
>>> +  {
>>> +   nan_state nan (false);
>>> +   r.set (type, real_min_representable (type),
>>> +  real_max_representable (type), nan);
>>> +   return true;
>>> +  }
>>> +
>>> +r.set_varying (type);
>>> +return true;
>>> +  }
>>> +} op_cfn_isnormal;
>>> +
>>>  // Implement range operator for CFN_BUILT_IN_
>>>  class cfn_parity : public range_operator
>>>  {
>>> @@ -1391,6 +1446,11 @@ gimple_range_op_handler::maybe_builtin_call ()
>>>m_operator = &op_cfn_isfinite;
>>>break;
>>>
>>> +case CFN_BUILT_IN_ISNORMAL:
>>> +  m_op1 = gimple_call_arg (call, 0);
>>> +  m_operator = &op_cfn_isnormal;

Ping^3 [PATCH-2v4] Value Range: Add range op for builtin isfinite

2024-07-01 Thread HAO CHEN GUI
Hi,
  Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html

Thanks
Gui Haochen

On 2024/6/24 9:41, HAO CHEN GUI wrote:
> Hi,
>   Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html
> 
> Thanks
> Gui Haochen
> 
> On 2024/6/20 14:57, HAO CHEN GUI wrote:
>> Hi,
>>   Gently ping it.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html
>>
>> Thanks
>> Gui Haochen
>>
>> On 2024/5/30 10:46, HAO CHEN GUI wrote:
>>> Hi,
>>>   This patch adds the range op for builtin isfinite.
>>>
>>>   Compared to previous version, the main change is to set the range to
>>> 1 if it's finite number otherwise to 0.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652220.html
>>>
>>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>>> regressions. Is it OK for the trunk?
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>> ChangeLog
>>> Value Range: Add range op for builtin isfinite
>>>
>>> The former patch adds optab for builtin isfinite. Thus builtin isfinite
>>> might not be folded at front end.  So the range op for isfinite is needed
>>> for value range analysis.  This patch adds range op for builtin isfinite.
>>>
>>> gcc/
>>> * gimple-range-op.cc (class cfn_isfinite): New.
>>> (op_cfn_isfinite): New variable.
>>> (gimple_range_op_handler::maybe_builtin_call): Handle
>>> CFN_BUILT_IN_ISFINITE.
>>>
>>> gcc/testsuite/
>>> * gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c: New test.
>>>
>>> patch.diff
>>> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
>>> index 4e60a42eaac..5ec5c828fa4 100644
>>> --- a/gcc/gimple-range-op.cc
>>> +++ b/gcc/gimple-range-op.cc
>>> @@ -1233,6 +1233,62 @@ public:
>>>}
>>>  } op_cfn_isinf;
>>>
>>> +//Implement range operator for CFN_BUILT_IN_ISFINITE
>>> +class cfn_isfinite : public range_operator
>>> +{
>>> +public:
>>> +  using range_operator::fold_range;
>>> +  using range_operator::op1_range;
>>> +  virtual bool fold_range (irange &r, tree type, const frange &op1,
>>> +  const irange &, relation_trio) const override
>>> +  {
>>> +if (op1.undefined_p ())
>>> +  return false;
>>> +
>>> +if (op1.known_isfinite ())
>>> +  {
>>> +   wide_int one = wi::one (TYPE_PRECISION (type));
>>> +   r.set (type, one, one);
>>> +   return true;
>>> +  }
>>> +
>>> +if (op1.known_isnan ()
>>> +   || op1.known_isinf ())
>>> +  {
>>> +   r.set_zero (type);
>>> +   return true;
>>> +  }
>>> +
>>> +r.set_varying (type);
>>> +return true;
>>> +  }
>>> +  virtual bool op1_range (frange &r, tree type, const irange &lhs,
>>> + const frange &, relation_trio) const override
>>> +  {
>>> +if (lhs.undefined_p ())
>>> +  return false;
>>> +
>>> +if (lhs.zero_p ())
>>> +  {
>>> +   // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented.
>>> +   // Set range to varying
>>> +   r.set_varying (type);
>>> +   return true;
>>> +  }
>>> +
>>> +if (!range_includes_zero_p (lhs))
>>> +  {
>>> +   nan_state nan (false);
>>> +   r.set (type, real_min_representable (type),
>>> +  real_max_representable (type), nan);
>>> +   return true;
>>> +  }
>>> +
>>> +r.set_varying (type);
>>> +return true;
>>> +  }
>>> +} op_cfn_isfinite;
>>> +
>>>  // Implement range operator for CFN_BUILT_IN_
>>>  class cfn_parity : public range_operator
>>>  {
>>> @@ -1330,6 +1386,11 @@ gimple_range_op_handler::maybe_builtin_call ()
>>>m_operator = &op_cfn_isinf;
>>>break;
>>>
>>> +case CFN_BUILT_IN_ISFINITE:
>>> +  m_op1 = gimple_call_arg (call, 0);
>>> +  m_operator = &op_cfn_isfinite;
>>> +  break;
>>> +
>>>  CASE_CFN_COPYSIGN_ALL:
>>>m_op1 = gimple_call_arg (call, 0);
>>>m_op2 = gimple_call_arg (call, 1);
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
>>> new file mode 100644
>>> index 00000000000..f5dce0a0486
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
>>> @@ -0,0 +1,31 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-tree-evrp" } */
>>> +
>>> +#include <math.h>
>>> +void link_error();
>>> +
>>> +void test1 (double x)
>>> +{
>>> +  if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x))
>>> +link_error ();
>>> +}
>>> +
>>> +void test2 (float x)
>>> +{
>>> +  if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x))
>>> +link_error ();
>>> +}
>>> +
>>> +void test3 (double x)
>>> +{
>>> +  if (__builtin_isfinite (x) && __builtin_isinf (x))
>>> +link_error ();
>>> +}
>>> +
>>> +void test4 (float x)
>>> +{
>>> +  if (__builtin_isfinite (x) && __builtin_isinf (x))
>>> +link_error ();
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */


Ping^2 [PATCHv2, rs6000] Optimize vector construction with two vector doubleword loads [PR103568]

2024-07-01 Thread HAO CHEN GUI
Hi,
 Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653180.html

Thanks
Gui Haochen


On 2024/6/20 15:01, HAO CHEN GUI wrote:
> Hi,
>  Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653180.html
> 
> Thanks
> Gui Haochen
> 
> On 2024/5/31 11:25, HAO CHEN GUI wrote:
>> Hi,
>>   This patch optimizes vector construction with two vector doubleword loads.
>> It generates an optimal insn sequence as "xxlor" has lower latency than
>> "mtvsrdd" on Power10.
>>
>>   Compared with previous version, the main change is to use "isa" attribute
>> to guard "lxsd" and "lxsdx".
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653103.html
>>
>>   Bootstrapped and tested on powerpc64-linux BE and LE with no
>> regressions. OK for the trunk?
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> rs6000: Optimize vector construction with two vector doubleword loads
>>
>> When constructing a vector from two doublewords in memory, it originally
>> does
>>  ld 10,0(3)
>>  ld 9,0(4)
>>  mtvsrdd 34,9,10
>>
>> An optimal sequence on Power10 should be
>>  lxsd 0,0(4)
>>  lxvrdx 1,0,3
>>  xxlor 34,1,32
>>
>> This patch does this optimization by insn combine and split.
>>
>> gcc/
>>  PR target/103568
>>  * config/rs6000/vsx.md (vsx_ld_lowpart_zero_): New insn
>>  pattern.
>>  (vsx_ld_highpart_zero_): New insn pattern.
>>  (vsx_concat_mem_): New insn_and_split pattern.
>>
>> gcc/testsuite/
>>  PR target/103568
>>  * gcc.target/powerpc/pr103568.c: New test.
>>
>> patch.diff
>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>> index f135fa079bd..f9a2a260e89 100644
>> --- a/gcc/config/rs6000/vsx.md
>> +++ b/gcc/config/rs6000/vsx.md
>> @@ -1395,6 +1395,27 @@ (define_insn "vsx_ld_elemrev_v2di"
>>"lxvd2x %x0,%y1"
>>[(set_attr "type" "vecload")])
>>
>> +(define_insn "vsx_ld_lowpart_zero_"
>> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
>> +(vec_concat:VSX_D
>> +  (match_operand: 1 "memory_operand" "wY,Z")
>> +  (match_operand: 2 "zero_constant" "j,j")))]
>> +  ""
>> +  "@
>> +   lxsd %0,%1
>> +   lxsdx %x0,%y1"
>> +  [(set_attr "type" "vecload,vecload")
>> +   (set_attr "isa" "p9v,p7v")])
>> +
>> +(define_insn "vsx_ld_highpart_zero_"
>> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
>> +(vec_concat:VSX_D
>> +  (match_operand: 1 "zero_constant" "j")
>> +  (match_operand: 2 "memory_operand" "Z")))]
>> +  "TARGET_POWER10"
>> +  "lxvrdx %x0,%y2"
>> +  [(set_attr "type" "vecload")])
>> +
>>  (define_insn "vsx_ld_elemrev_v1ti"
>>[(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")
>>  (vec_select:V1TI
>> @@ -3063,6 +3084,26 @@ (define_insn "vsx_concat_"
>>  }
>>[(set_attr "type" "vecperm,vecmove")])
>>
>> +(define_insn_and_split "vsx_concat_mem_"
>> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
>> +(vec_concat:VSX_D
>> +  (match_operand: 1 "memory_operand" "wY,Z")
>> +  (match_operand: 2 "memory_operand" "Z,Z")))]
>> +  "TARGET_POWER10 && can_create_pseudo_p ()"
>> +  "#"
>> +  "&& 1"
>> +  [(const_int 0)]
>> +{
>> +  rtx tmp1 = gen_reg_rtx (mode);
>> +  rtx tmp2 = gen_reg_rtx (mode);
>> +  emit_insn (gen_vsx_ld_highpart_zero_ (tmp1, CONST0_RTX 
>> (mode),
>> +  operands[1]));
>> +  emit_insn (gen_vsx_ld_lowpart_zero_ (tmp2, operands[2],
>> + CONST0_RTX (mode)));
>> +  emit_insn (gen_ior3 (operands[0], tmp1, tmp2));
>> +  DONE;
>> +})
>> +
>>  ;; Combiner patterns to allow creating XXPERMDI's to access either double
>>  ;; word element in a vector register.
>>  (define_insn "*vsx_concat__1"
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103568.c b/gcc/testsuite/gcc.target/powerpc/pr103568.c
>> new file mode 100644
>> index 00000000000..b2a06fb2162
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103568.c
>> @@ -0,0 +1,17 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
>> +
>> +vector double test (double *a, double *b)
>> +{
>> +  return (vector double) {*a, *b};
>> +}
>> +
>> +vector long long test1 (long long *a, long long *b)
>> +{
>> +  return (vector long long) {*a, *b};
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {\mlxsd} 2 } } */
>> +/* { dg-final { scan-assembler-times {\mlxvrdx\M} 2 } } */
>> +/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
>> +


[PATCH-1v5, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-07-09 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isinf for SFDF and IEEE128 with the test
data class instructions.

  Compared with the previous version, the main changes are:
1 Define 3 mode attributes used for predicate, constraint and asm
print selection. They help merge the sp/dp/qp patterns into one.
2 Remove the original sp/dp and qp patterns and combine them into one.
3 Rename the corresponding icode names in rs6000-builtin.cc and
rs6000-builtins.def.
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655779.html

  The expand "isinf2" and following insn pattern for TF and
KF mode should be guarded on "TARGET_FLOAT128_HW". It will be
changed in sequential patch as some other "qp" insn patterns are
also need to be changed.
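
For intuition, a C model of what the isinf<mode>2 expander computes (a
sketch, not the patch itself; the hardware does it with one
test-data-class instruction using the POS_INF | NEG_INF mask):

  static inline int
  model_isinf (double x)
  {
    /* 1 iff x matches the +Inf or -Inf data class; a NaN matches
       neither comparison.  */
    return x == __builtin_inf () || x == -__builtin_inf ();
  }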

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isinf for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/rs6000.md (constant VSX_TEST_DATA_CLASS_NAN,
VSX_TEST_DATA_CLASS_POS_INF, VSX_TEST_DATA_CLASS_NEG_INF,
VSX_TEST_DATA_CLASS_POS_ZERO, VSX_TEST_DATA_CLASS_NEG_ZERO,
VSX_TEST_DATA_CLASS_POS_DENORMAL, VSX_TEST_DATA_CLASS_NEG_DENORMAL):
Define.
(mode_attr sdq, vsx_altivec, wa_v, x): Define.
(mode_iterator IEEE_FP): Define.
* config/rs6000/vsx.md (isinf<mode>2): New expand.
(expand xststdcqp_<mode>, xststdc<sd>p): Combine into...
(expand xststdc_<mode>): ...this.
(insn *xststdcqp_<mode>, *xststdc<sd>p): Combine into...
(insn *xststdc_<mode>): ...this.
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Rename
CODE_FOR_xststdcqp_kf as CODE_FOR_xststdc_kf,
CODE_FOR_xststdcqp_tf as CODE_FOR_xststdc_tf.
* config/rs6000/rs6000-builtins.def: Rename xststdcdp as xststdc_df,
xststdcsp as xststdc_sf, xststdcqp_kf as xststdc_kf.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-1.c: New test.
* gcc.target/powerpc/pr97786-2.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index bb9da68edc7..a62a5d4afa7 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -3357,8 +3357,8 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* 
subtarget */,
   case CODE_FOR_xsiexpqpf_kf:
icode = CODE_FOR_xsiexpqpf_tf;
break;
-  case CODE_FOR_xststdcqp_kf:
-   icode = CODE_FOR_xststdcqp_tf;
+  case CODE_FOR_xststdc_kf:
+   icode = CODE_FOR_xststdc_tf;
break;
   case CODE_FOR_xscmpexpqp_eq_kf:
icode = CODE_FOR_xscmpexpqp_eq_tf;
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 3bc7fed6956..8ac4cc200c9 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2752,11 +2752,11 @@

   const signed int \
   __builtin_vsx_scalar_test_data_class_dp (double, const int<7>);
-VSTDCDP xststdcdp {}
+VSTDCDP xststdc_df {}

   const signed int \
   __builtin_vsx_scalar_test_data_class_sp (float, const int<7>);
-VSTDCSP xststdcsp {}
+VSTDCSP xststdc_sf {}

   const signed int __builtin_vsx_scalar_test_neg_dp (double);
 VSTDCNDP xststdcnegdp {}
@@ -2925,7 +2925,7 @@

   const signed int __builtin_vsx_scalar_test_data_class_qp (_Float128, \
 const int<7>);
-VSTDCQP xststdcqp_kf {}
+VSTDCQP xststdc_kf {}

   const signed int __builtin_vsx_scalar_test_neg_qp (_Float128);
 VSTDCNQP xststdcnegqp_kf {}
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index a5d20594789..2d7f227e362 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -53,6 +53,20 @@ (define_constants
(FRAME_POINTER_REGNUM   110)
   ])

+;;
+;; Test data class mask
+;;
+
+(define_constants
+  [(VSX_TEST_DATA_CLASS_NAN0x40)
+   (VSX_TEST_DATA_CLASS_POS_INF0x20)
+   (VSX_TEST_DATA_CLASS_NEG_INF0x10)
+   (VSX_TEST_DATA_CLASS_POS_ZERO   0x8)
+   (VSX_TEST_DATA_CLASS_NEG_ZERO   0x4)
+   (VSX_TEST_DATA_CLASS_POS_DENORMAL   0x2)
+   (VSX_TEST_DATA_CLASS_NEG_DENORMAL   0x1)
+  ])
+
 ;;
 ;; UNSPEC usage
 ;;
@@ -605,6 +619,24 @@ (define_mode_iterator SFDF2 [SF DF])
 (define_mode_attr sd [(SF   "s") (DF   "d")
  (V4SF "s") (V2DF "d")])

+; A generic s/d/q attribute, for sp/dp/qp for example.
+(define_mode_attr sdq [(SF "s") (DF "d")
+  (TF "q") (KF "q")])
+
+; A predicate attribute, for IEEE floating point
+(define_mode_attr vsx_altivec [(SF "vsx_register_operand")
+  (DF "vsx_register_operand")
+  (TF "altivec_register_operand")
+  (KF "altivec_register_operand")])
+
+; A constraint attribute, for IEEE floating point
+(define_mode_attr wa_v [(SF "wa") (DF "wa")
+  (TF "v") (KF "v")])

Re: [PATCH v2] RISC-V: use fclass insns to implement isfinite and isnormal builtins

2024-07-09 Thread HAO CHEN GUI
Hi,

On 2024/7/10 8:04, Vineet Gupta wrote:
> So it seems initial versions of the patch didn't specify anything about
> output mode. Richi asked for it in review and in v4 Hao added it.
> But I don't see anyone asking specifically for SImode.
> I guess that can be relaxed. Hao do you have any inputs here ?

The reviewer asked me to add mode info for the operands. So I picked
SImode for the output in order to make it consistent with the function
definition in math.h. You can submit a patch to update the documentation
if you want to use other modes.

Thanks
Gui Haochen


Re: [PATCH] Expand: Pass down equality only flag to cmpmem expand

2024-07-09 Thread HAO CHEN GUI
Hi Jeff,

On 2024/7/10 7:35, Jeff Law wrote:
> Is this patch still relevant?  It was submitted after stage1 closed for 
> gcc-14.  With the trunk open for development, you should probably rebase and 
> repost if the patch is still relevant/useful.
> 
> Conceptually knowing that we just want to do an equality comparison seems 
> useful.  I think there are other places where we track this information and 
> utilize it to improve initial code generation.

The patch and its follow-up patches are suspended, as I am working
on other issues. I will come back after completing the task at hand.

Thanks
Gui Haochen


[PATCH, rs6000] Add TARGET_FLOAT128_HW guard for quad-precision insns

2024-07-10 Thread HAO CHEN GUI
Hi,
  This patch adds TARGET_FLOAT128_HW to the pattern conditions of quad-
precision insns. It also removes the FLOAT128_IEEE_P check from pattern
conditions when the pattern's mode is IEEE128, as the IEEE128 mode
iterator already checks FLOAT128_IEEE_P.

  The test case float128-cmp2-runnable.c should be guarded with
ppc_float128_hw, as it calls qp insns. p9vector_hw is implied by
ppc_float128_hw, so it is removed.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Add TARGET_FLOAT128_HW guard for quad-precision insns

gcc/
* config/rs6000/rs6000.md (*fpmask, floatdidf2, floatti2,
floatunsti2, fix_truncti2): Add guard
TARGET_FLOAT128_HW.
(add3, sub3, mul3, div3, sqrt2,
copysign3_hard, copysign3_soft, @neg2_hw,
@abs2_hw, *nabs2_hw, fma4_hw, *fms4_hw,
*nfma4_hw, *nfms4_hw,
extend2_hw, truncdf2_hw,
truncsf2_hw, fix_trunc2,
*fix_trunc2_mem,
float_si2_hw, floatuns_di2_hw, floor2,
ceil2, btrunc2, round2, add3_odd,
sub3_odd, mul3_odd, div3_odd, sqrt2_odd,
fma4_odd, *fms4_odd, *nfma4_odd,
*nfms4_odd, truncdf2_odd, *cmp_hw for IEEE128):
Remove guard FLOAT128_IEEE_P.
* config/rs6000/vsx.md (xsxexpqp__,
xsxsigqp__, xsiexpqpf_,
xsiexpqp__, xscmpexpqp__,
*xscmpexpqp, xststdcnegqp_): Add guard TARGET_FLOAT128_HW.
(xststdc_, *xststdc_, xststdc_): Add guard
TARGET_FLOAT128_HW for the IEEE128 mode.

gcc/testsuite/
* gcc.target/powerpc/float128-cmp2-runnable.c: Replace
ppc_float128_sw with ppc_float128_hw and remove p9vector_hw.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 3ec5ffa3578..32e5f1c4c56 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5820,7 +5820,7 @@ (define_insn "*fpmask"
 (match_operand:IEEE128 3 "altivec_register_operand" "v")])
 (match_operand:V2DI 4 "all_ones_constant" "")
 (match_operand:V2DI 5 "zero_constant" "")))]
-  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
   "xscmp%V1qp %0,%2,%3"
   [(set_attr "type" "fpcompare")])

@@ -6928,7 +6928,7 @@ (define_insn "floatdidf2"
 (define_insn "floatti2"
   [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
(float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
-  "TARGET_POWER10"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
 {
   return  "xscvsqqp %0,%1";
 }
@@ -6937,7 +6937,7 @@ (define_insn "floatti2"
 (define_insn "floatunsti2"
   [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
 (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
-  "TARGET_POWER10"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
 {
   return  "xscvuqqp %0,%1";
 }
@@ -6946,7 +6946,7 @@ (define_insn "floatunsti2"
 (define_insn "fix_truncti2"
   [(set (match_operand:TI 0 "vsx_register_operand" "=v")
(fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
-  "TARGET_POWER10"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
 {
   return  "xscvqpsqz %0,%1";
 }
@@ -6955,7 +6955,7 @@ (define_insn "fix_truncti2"
 (define_insn "fixuns_truncti2"
   [(set (match_operand:TI 0 "vsx_register_operand" "=v")
(unsigned_fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
-  "TARGET_POWER10"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
 {
   return  "xscvqpuqz %0,%1";
 }
@@ -15020,7 +15020,7 @@ (define_insn "add3"
(plus:IEEE128
 (match_operand:IEEE128 1 "altivec_register_operand" "v")
 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
-  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_FLOAT128_HW"
   "xsaddqp %0,%1,%2"
   [(set_attr "type" "vecfloat")
(set_attr "size" "128")])
@@ -15030,7 +15030,7 @@ (define_insn "sub3"
(minus:IEEE128
 (match_operand:IEEE128 1 "altivec_register_operand" "v")
 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
-  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_FLOAT128_HW"
   "xssubqp %0,%1,%2"
   [(set_attr "type" "vecfloat")
(set_attr "size" "128")])
@@ -15040,7 +15040,7 @@ (define_insn "mul3"
(mult:IEEE128
 (match_operand:IEEE128 1 "altivec_register_operand" "v")
 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
-  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_FLOAT128_HW"
   "xsmulqp %0,%1,%2"
   [(set_attr "type" "qmul")
(set_attr "size" "128")])
@@ -15050,7 +15050,7 @@ (define_insn "div3"
(div:IEEE128
 (match_operand:IEEE128 1 "altivec_register_operand" "v")
 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
-  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_FLOAT128_HW"
   "xsdivqp %0,%1,%2"
   [(set_attr "type" "vecdiv")
(set_attr "size" "

[PATCH-1v4] Value Range: Add range op for builtin isinf

2024-07-11 Thread HAO CHEN GUI
Hi,
  The builtin isinf is not folded at the front end if the corresponding
optab exists. This causes range evaluation to fail on targets that have
optab_isinf. For instance, range-sincos.c will fail on such targets, as
it calls builtin_isinf.

  This patch fixes the problem by adding a range op for builtin isinf.
It also fixes the issue in PR114678.
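
For instance, with the range op in place, evrp can prove checks like
the following dead (distilled from the new test added below):

  extern void link_error (void);

  void
  f (double x)
  {
    if (x > __DBL_MAX__ && !__builtin_isinf (x))
      link_error ();  /* unreachable: x in (DBL_MAX, +Inf] implies
                         __builtin_isinf (x) is nonzero  */
  }

Without the range op, the unfolded call is opaque to the ranger.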

  Compared with previous version, the main change is to remove xfail for
s390 in range-sincos.c and vrp-float-abs-1.c.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
Value Range: Add range op for builtin isinf

The builtin isinf is not folded at front end if the corresponding optab
exists.  So the range op for isinf is needed for value range analysis.
This patch adds range op for builtin isinf.

gcc/
PR target/114678
* gimple-range-op.cc (class cfn_isinf): New.
(op_cfn_isinf): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_FLT_FN (BUILT_IN_ISINF).

gcc/testsuite/
PR target/114678
* gcc.dg/tree-ssa/range-isinf.c: New test.
* gcc.dg/tree-ssa/range-sincos.c: Remove xfail for s390.
* gcc.dg/tree-ssa/vrp-float-abs-1.c: Likewise.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index a80b93cf063..24559951dd6 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1153,6 +1153,63 @@ private:
   bool m_is_pos;
 } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);

+// Implement range operator for CFN_BUILT_IN_ISINF
+class cfn_isinf : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isinf ())
+  {
+   wide_int one = wi::one (TYPE_PRECISION (type));
+   r.set (type, one, one);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || (!real_isinf (&op1.lower_bound ())
+   && !real_isinf (&op1.upper_bound ())))
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   nan_state nan (true);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   // The range is [-INF,-INF][+INF,+INF], but it can't be represented.
+   // Set range to [-INF,+INF]
+   r.set_varying (type);
+   r.clear_nan ();
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isinf;

 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
@@ -1246,6 +1303,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_signbit;
   break;

+CASE_FLT_FN (BUILT_IN_ISINF):
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isinf;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
new file mode 100644
index 00000000000..468f1bcf5c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include <math.h>
+void link_error();
+
+void
+test1 (double x)
+{
+  if (x > __DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test2 (float x)
+{
+  if (x > __FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test3 (double x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__)
+link_error ();
+}
+
+void
+test4 (float x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__)
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c b/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
index 35b38c3c914..337f9cda02f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c

Re: Ping^3 [PATCH-1v3] Value Range: Add range op for builtin isinf

2024-07-11 Thread HAO CHEN GUI
Hi Ruoyao,
  Thanks for your info. I updated my patch and sent it for review.

Thanks
Gui Haochen

On 2024/7/10 22:01, Xi Ruoyao wrote:
> On Wed, 2024-07-10 at 21:54 +0800, Xi Ruoyao wrote:
>> On Mon, 2024-07-01 at 09:11 +0800, HAO CHEN GUI wrote:
>>> Hi,
>>>   Gently ping it.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html
>>
>> I guess you can add PR114678 into the subject and the ChangeLog, and
>> also mention the patch in the bugzilla.
> 
> And, remove xfail in vrp-float-abs-1.c and range-sincos.c (if this patch
> works as intended they should no longer fail).
> 


Re: [PATCH, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-07-11 Thread HAO CHEN GUI
Hi Jeff,

On 2024/7/11 6:25, Jeff Law wrote:
> OK.  But given this patch is several months old, can you re-bootstrap & test 
> before committing to the trunk.

Thanks. I will rebase the patch and test it again.

Thanks
Gui Haochen


Re: [PATCH-1v4] Value Range: Add range op for builtin isinf

2024-07-11 Thread HAO CHEN GUI
Hi Jeff,
  Thanks for your comments.

On 2024/7/12 6:13, Jeff Law wrote:
> 
> 
> On 7/11/24 1:32 AM, HAO CHEN GUI wrote:
>> Hi,
>>    The builtin isinf is not folded at the front end if the corresponding
>> optab exists. This causes range evaluation to fail on targets that have
>> optab_isinf. For instance, range-sincos.c will fail on such targets, as
>> it calls builtin_isinf.
>>
>>    This patch fixes the problem by adding a range op for builtin isinf.
>> It also fixes the issue in PR114678.
>>
>>    Compared with previous version, the main change is to remove xfail for
>> s390 in range-sincos.c and vrp-float-abs-1.c.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html
>>
>>    Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions. Is it OK for the trunk?
>>
>> Thanks
>> Gui Haochen
>>
>>
>> ChangeLog
>> Value Range: Add range op for builtin isinf
>>
>> The builtin isinf is not folded at front end if the corresponding optab
>> exists.  So the range op for isinf is needed for value range analysis.
>> This patch adds range op for builtin isinf.
>>
>> gcc/
>> PR target/114678
>> * gimple-range-op.cc (class cfn_isinf): New.
>> (op_cfn_isinf): New variables.
>> (gimple_range_op_handler::maybe_builtin_call): Handle
>> CASE_FLT_FN (BUILT_IN_ISINF).
>>
>> gcc/testsuite/
>> PR target/114678
>> * gcc.dg/tree-ssa/range-isinf.c: New test.
>> * gcc.dg/tree-ssa/range-sincos.c: Remove xfail for s390.
>> * gcc.dg/tree-ssa/vrp-float-abs-1.c: Likewise.
>>
>> patch.diff
>> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
>> index a80b93cf063..24559951dd6 100644
>> --- a/gcc/gimple-range-op.cc
>> +++ b/gcc/gimple-range-op.cc
>> @@ -1153,6 +1153,63 @@ private:
>>     bool m_is_pos;
>>   } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);
>>
>> +// Implement range operator for CFN_BUILT_IN_ISINF
>> +class cfn_isinf : public range_operator
>> +{
>> +public:
>> +  using range_operator::fold_range;
>> +  using range_operator::op1_range;
>> +  virtual bool fold_range (irange &r, tree type, const frange &op1,
>> +   const irange &, relation_trio) const override
>> +  {
>> +    if (op1.undefined_p ())
>> +  return false;
>> +
>> +    if (op1.known_isinf ())
>> +  {
>> +    wide_int one = wi::one (TYPE_PRECISION (type));
>> +    r.set (type, one, one);
>> +    return true;
>> +  }
>> +
>> +    if (op1.known_isnan ()
>> +    || (!real_isinf (&op1.lower_bound ())
>> +    && !real_isinf (&op1.upper_bound ())))
>> +  {
>> +    r.set_zero (type);
>> +    return true;
>> +  }
> So why the test for real_isinf on the upper/lower bound?  If op1 is known to 
> be a NaN, then why test the bounds at all?  If a bounds test is needed, why 
> only test the upper bound?
> 
IMHO, the logic is: if op1 is a NaN, it's not an infinite number. Likewise,
if the upper and lower bounds are both finite numbers, op1 is not an
infinite number. In both situations, the result should be set to 0, which
means op1 isn't an infinite number.
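
To summarize the intended fold_range mapping (an informal sketch):

  /* op1 (frange)                      -> isinf (op1) result (irange)
     known +-Inf                       -> [1, 1]
     known NaN, or both bounds finite  -> [0, 0]
     anything else                     -> varying  */
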
> 
>> +  virtual bool op1_range (frange &r, tree type, const irange &lhs,
>> +  const frange &, relation_trio) const override
>> +  {
>> +    if (lhs.undefined_p ())
>> +  return false;
>> +
>> +    if (lhs.zero_p ())
>> +  {
>> +    nan_state nan (true);
>> +    r.set (type, real_min_representable (type),
>> +   real_max_representable (type), nan);
>> +    return true;
>> +  }
> If the result of a builtin_isinf is zero, that doesn't mean the input has a 
> nan state.  It means we know it's not infinity.  The input argument could be 
> anything but an Inf.
> 
If the result of builtin_isinf is zero, it means the input might be a NaN
or a finite number. So the range should be [min_rep, max_rep] U NaN.
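
Sketched the same way (informal):

  /* lhs (result of isinf)   -> op1 (frange)
     [0, 0]                  -> [-DBL_MAX, DBL_MAX] plus a possible NaN
     nonzero, zero excluded  -> varying with the NaN bit cleared; the
                                exact set [-Inf,-Inf] U [+Inf,+Inf] has
                                no frange representation  */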

Looking forward to your further comments.

Thanks
Gui Haochen
> 
> Jeff


[PATCHv2, rs6000] Add TARGET_FLOAT128_HW guard for quad-precision insns

2024-07-14 Thread HAO CHEN GUI
Hi,
  This patch adds TARGET_FLOAT128_HW to the pattern conditions of quad-
precision insns. Some qp patterns were originally guarded by
TARGET_P9_VECTOR, which is replaced with "TARGET_FLOAT128_HW".

  The test case float128-cmp2-runnable.c should be guarded with
ppc_float128_hw, as it calls qp insns. p9vector_hw is implied by
ppc_float128_hw, so it is removed.

  Compared to the previous version, the main change is to split the
redundant FLOAT128_IEEE_P removal into a separate patch.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Add TARGET_FLOAT128_HW guard for quad-precision insns

gcc/
* config/rs6000/rs6000.md (floatti2, floatunsti2,
fix_truncti2): Add guard TARGET_FLOAT128_HW.
* config/rs6000/vsx.md (xsxexpqp__,
xsxsigqp__, xsiexpqpf_,
xsiexpqp__, xscmpexpqp__,
*xscmpexpqp, xststdcnegqp_): Replace guard TARGET_P9_VECTOR
with TARGET_FLOAT128_HW.
(xststdc_<mode>, *xststdc_<mode>, isinf<mode>2): Add guard
TARGET_FLOAT128_HW for the IEEE128 modes.

gcc/testsuite/
* gcc.target/powerpc/float128-cmp2-runnable.c: Replace
ppc_float128_sw with ppc_float128_hw and remove p9vector_hw.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index deffc4b601c..c0f6599c08b 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6928,7 +6928,7 @@ (define_insn "floatdidf2"
 (define_insn "floatti2"
   [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
(float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
-  "TARGET_POWER10"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
 {
   return  "xscvsqqp %0,%1";
 }
@@ -6937,7 +6937,7 @@ (define_insn "floatti2"
 (define_insn "floatunsti2"
   [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
 (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
-  "TARGET_POWER10"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
 {
   return  "xscvuqqp %0,%1";
 }
@@ -6946,7 +6946,7 @@ (define_insn "floatunsti2"
 (define_insn "fix_truncti2"
   [(set (match_operand:TI 0 "vsx_register_operand" "=v")
(fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
-  "TARGET_POWER10"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
 {
   return  "xscvqpsqz %0,%1";
 }
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 1272f8b2080..7dd08895bec 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5157,7 +5157,7 @@ (define_insn "xsxexpqp__"
(unspec:V2DI_DI
  [(match_operand:IEEE128 1 "altivec_register_operand" "v")]
 UNSPEC_VSX_SXEXPDP))]
-  "TARGET_P9_VECTOR"
+  "TARGET_FLOAT128_HW"
   "xsxexpqp %0,%1"
   [(set_attr "type" "vecmove")])

@@ -5176,7 +5176,7 @@ (define_insn "xsxsigqp__"
 (unspec:VEC_TI [(match_operand:IEEE128 1 "altivec_register_operand" "v")]
 UNSPEC_VSX_SXSIG))]
-  "TARGET_P9_VECTOR"
+  "TARGET_FLOAT128_HW"
   "xsxsigqp %0,%1"
   [(set_attr "type" "vecmove")])

@@ -5196,7 +5196,7 @@ (define_insn "xsiexpqpf_"
 [(match_operand:IEEE128 1 "altivec_register_operand" "v")
  (match_operand:DI 2 "altivec_register_operand" "v")]
 UNSPEC_VSX_SIEXPQP))]
-  "TARGET_P9_VECTOR"
+  "TARGET_FLOAT128_HW"
   "xsiexpqp %0,%1,%2"
   [(set_attr "type" "vecmove")])

@@ -5208,7 +5208,7 @@ (define_insn "xsiexpqp__"
 (match_operand:V2DI_DI 2
  "altivec_register_operand" "v")]
 UNSPEC_VSX_SIEXPQP))]
-  "TARGET_P9_VECTOR"
+  "TARGET_FLOAT128_HW"
   "xsiexpqp %0,%1,%2"
   [(set_attr "type" "vecmove")])

@@ -5278,7 +5278,7 @@ (define_expand "xscmpexpqp__"
(set (match_operand:SI 0 "register_operand" "=r")
(CMP_TEST:SI (match_dup 3)
 (const_int 0)))]
-  "TARGET_P9_VECTOR"
+  "TARGET_FLOAT128_HW"
 {
   if ( == UNORDERED && !HONOR_NANS (mode))
 {
@@ -5296,7 +5296,7 @@ (define_insn "*xscmpexpqp"
  (match_operand:IEEE128 2 "altivec_register_operand" 
"v")]
  UNSPEC_VSX_SCMPEXPQP)
 (match_operand:SI 3 "zero_constant" "j")))]
-  "TARGET_P9_VECTOR"
+  "TARGET_FLOAT128_HW"
   "xscmpexpqp %0,%1,%2"
   [(set_attr "type" "fpcompare")])

@@ -5315,7 +5315,8 @@ (define_expand "xststdc_"
(set (match_operand:SI 0 "register_operand" "=r")
(eq:SI (match_dup 3)
   (const_int 0)))]
-  "TARGET_P9_VECTOR"
+  "TARGET_P9_VECTOR
+   && (!FLOAT128_IEEE_P (<MODE>mode) || TARGET_FLOAT128_HW)"
 {
   operands[3] = gen_reg_rtx (CCFPmode);
   operands[4] = CONST0_RTX (SImode);
@@ -5324,7 +5325,8 @@ (define_expand "xststdc_"
 (define_expand "isinf2"
   [(use (match_operand:SI 0 "gpc_reg_operand"))
(use (match_operand:IEEE_FP 1 ""))]
-  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+  "TARGET_P9_VECTOR
+   && (!FLOAT128_IEEE_P (<MODE>mode) || TARGET_FLOAT128_HW)"
 {
   int mask = VSX_TEST_DATA_CLASS_POS_INF | VSX_TEST_DATA_CLASS_NEG_INF;

[PATCH, rs6000] Remove redundant guard for float128 mode patterns

2024-07-14 Thread HAO CHEN GUI
Hi,
  This patch removes the FLOAT128_IEEE_P guard when the pattern's mode
is IEEE128, and the FLOAT128_IBM_P guard when the pattern's mode is
IBM128. The mode iterators already do this checking, so the guards are
redundant.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Remove redundant guard for float128 mode patterns

gcc/
* config/rs6000/rs6000.md (movcc, *movcc_p10,
*movcc_invert_p10, *fpmask, *xxsel,
@ieee_128bit_vsx_abs2, *ieee_128bit_vsx_nabs2,
add3, sub3, mul3, div3, sqrt2,
copysign3, copysign3_hard, copysign3_soft,
@neg2_hw, @abs2_hw, *nabs2_hw, fma4_hw,
*fms4_hw, *nfma4_hw, *nfms4_hw,
extend2_hw, truncdf2_hw,
truncsf2_hw, fix_2_hw,
fix_trunc2,
*fix_trunc2_mem,
float_di2_hw, float_si2_hw,
float2, floatuns_di2_hw,
floatuns_si2_hw, floatuns2,
floor2, ceil2, btrunc2, round2,
add3_odd, sub3_odd, mul3_odd, div3_odd,
sqrt2_odd, fma4_odd, *fms4_odd, *nfma4_odd,
*nfms4_odd, truncdf2_odd, *cmp_hw for IEEE128):
Remove guard FLOAT128_IEEE_P.
(@extenddf2_fprs, @extenddf2_vsx,
truncdf2_internal1, truncdf2_internal2,
fix_trunc_helper, neg2, *cmp_internal1,
*cmp_internal2 for IBM128): Remove guard FLOAT128_IBM_P.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index c0f6599c08b..f22b7ed6256 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5736,7 +5736,7 @@ (define_expand "movcc"
 (if_then_else:IEEE128 (match_operand 1 "comparison_operator")
   (match_operand:IEEE128 2 "gpc_reg_operand")
   (match_operand:IEEE128 3 "gpc_reg_operand")))]
-  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
 {
   if (rs6000_emit_cmove (operands[0], operands[1], operands[2], operands[3]))
 DONE;
@@ -5753,7 +5753,7 @@ (define_insn_and_split "*movcc_p10"
 (match_operand:IEEE128 4 "altivec_register_operand" "v,v")
 (match_operand:IEEE128 5 "altivec_register_operand" "v,v")))
(clobber (match_scratch:V2DI 6 "=0,&v"))]
-  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
   "#"
   "&& 1"
   [(set (match_dup 6)
@@ -5785,7 +5785,7 @@ (define_insn_and_split "*movcc_invert_p10"
 (match_operand:IEEE128 4 "altivec_register_operand" "v,v")
 (match_operand:IEEE128 5 "altivec_register_operand" "v,v")))
(clobber (match_scratch:V2DI 6 "=0,&v"))]
-  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
   "#"
   "&& 1"
   [(set (match_dup 6)
@@ -5820,7 +5820,7 @@ (define_insn "*fpmask"
 (match_operand:IEEE128 3 "altivec_register_operand" "v")])
 (match_operand:V2DI 4 "all_ones_constant" "")
 (match_operand:V2DI 5 "zero_constant" "")))]
-  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
   "xscmp%V1qp %0,%2,%3"
   [(set_attr "type" "fpcompare")])

@@ -5831,7 +5831,7 @@ (define_insn "*xxsel"
 (match_operand:V2DI 2 "zero_constant" ""))
 (match_operand:IEEE128 3 "altivec_register_operand" "v")
 (match_operand:IEEE128 4 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
   "xxsel %x0,%x4,%x3,%x1"
   [(set_attr "type" "vecmove")])

@@ -8904,7 +8904,7 @@ (define_insn_and_split "@extenddf2_fprs"
 (match_operand:DF 1 "nonimmediate_operand" "d,m,d")))
(use (match_operand:DF 2 "nonimmediate_operand" "m,m,d"))]
   "!TARGET_VSX && TARGET_HARD_FLOAT
-   && TARGET_LONG_DOUBLE_128 && FLOAT128_IBM_P (mode)"
+   && TARGET_LONG_DOUBLE_128"
   "#"
   "&& reload_completed"
   [(set (match_dup 3) (match_dup 1))
@@ -8921,7 +8921,7 @@ (define_insn_and_split "@extenddf2_vsx"
   [(set (match_operand:IBM128 0 "gpc_reg_operand" "=d,d")
(float_extend:IBM128
 (match_operand:DF 1 "nonimmediate_operand" "wa,m")))]
-  "TARGET_LONG_DOUBLE_128 && TARGET_VSX && FLOAT128_IBM_P (mode)"
+  "TARGET_LONG_DOUBLE_128 && TARGET_VSX"
   "#"
   "&& reload_completed"
   [(set (match_dup 2) (match_dup 1))
@@ -8967,7 +8967,7 @@ (define_insn_and_split "truncdf2_internal1"
   [(set (match_operand:DF 0 "gpc_reg_operand" "=d,?d")
(float_truncate:DF
 (match_operand:IBM128 1 "gpc_reg_operand" "0,d")))]
-  "FLOAT128_IBM_P (mode) && !TARGET_XL_COMPAT
+  "!TARGET_XL_COMPAT
&& TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
   "@
#
@@ -8983,7 +8983,7 @@ (define_insn_and_split "truncdf2_internal1"
 (define_insn "truncdf2_internal2"
   [(set (match_operand:DF 0 "gpc_reg_operand" "=d")
(float_truncate:DF (match_operand:IBM128 1 "gpc_reg_operand" "d")))]
-  "FLOAT12

[PATCH-2v5, rs6000] Implement optab_isfinite for SFDF and IEEE128

2024-07-17 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isfinite for SFDF and IEEE128 with the
test data class instructions.

  Compared with the previous version, the main change is to merge
the SFDF and IEEE128 patterns into one.
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655780.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isfinite for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/vsx.md (isfinite<mode>2): New expand.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-4.c: New test.
* gcc.target/powerpc/pr97786-5.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index d30416a53e7..763cd916c8d 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5304,6 +5304,20 @@ (define_expand "isinf2"
   DONE;
 })

+(define_expand "isfinite2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE_FP 1 ""))]
+  "TARGET_P9_VECTOR
+   && (!FLOAT128_IEEE_P (<MODE>mode) || TARGET_FLOAT128_HW)"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  int mask = VSX_TEST_DATA_CLASS_POS_INF | VSX_TEST_DATA_CLASS_NEG_INF
+| VSX_TEST_DATA_CLASS_NAN;
+  emit_insn (gen_xststdc_<mode> (tmp, operands[1], GEN_INT (mask)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-4.c b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
new file mode 100644
index 00000000000..9cdde78257d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+/* { dg-require-effective-target powerpc_vsx } */
+
+int test1 (double x)
+{
+  return __builtin_isfinite (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-5.c b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
new file mode 100644
index 00000000000..0ef8b86f6cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+/* { dg-require-effective-target powerpc_vsx } */
+
+int test1 (long double x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */


[PATCH-3v5, rs6000] Implement optab_isnormal for SFDF and IEEE128

2024-07-17 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isnormal for SFDF and IEEE128 with the
test data class instructions.

  Compared with the previous version, the main change is to merge
the SFDF and IEEE128 patterns into one.
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655781.html
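
As a C model of what the merged expander computes (a sketch, not the
patch itself; one test-data-class instruction over all seven "abnormal"
classes, then an xor):

  static inline int
  model_isnormal (double x)
  {
    /* The mask covers NaN, +/-Inf, +/-0 and +/-denormal; a normal
       number matches none of them.  */
    int matched = __builtin_isnan (x) || __builtin_isinf (x)
                  || x == 0.0 || __builtin_fabs (x) < __DBL_MIN__;
    return matched ^ 1;
  }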

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isnormal for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/vsx.md (isnormal<mode>2): New expand.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-7.c: New test.
* gcc.target/powerpc/pr97786-8.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 763cd916c8d..f818aba9e3e 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5318,6 +5318,23 @@ (define_expand "isfinite2"
   DONE;
 })

+(define_expand "isnormal2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE_FP 1 ""))]
+  "TARGET_P9_VECTOR
+   && (!FLOAT128_IEEE_P (<MODE>mode) || TARGET_FLOAT128_HW)"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  int mask = VSX_TEST_DATA_CLASS_NAN
+| VSX_TEST_DATA_CLASS_POS_INF | VSX_TEST_DATA_CLASS_NEG_INF
+| VSX_TEST_DATA_CLASS_POS_ZERO | VSX_TEST_DATA_CLASS_NEG_ZERO
+| VSX_TEST_DATA_CLASS_POS_DENORMAL
+| VSX_TEST_DATA_CLASS_NEG_DENORMAL;
+  emit_insn (gen_xststdc_<mode> (tmp, operands[1], GEN_INT (mask)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-7.c b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
new file mode 100644
index 00000000000..eb01eed39d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+/* { dg-require-effective-target powerpc_vsx } */
+
+int test1 (double x)
+{
+  return __builtin_isnormal (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isnormal (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-8.c b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
new file mode 100644
index 00000000000..eba90d3b1b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+/* { dg-require-effective-target powerpc_vsx } */
+
+int test1 (long double x)
+{
+  return __builtin_isnormal (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */


Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-14 Thread HAO CHEN GUI
Hi Mikael,

  Thanks for your comments.

On 2024/5/9 16:03, Mikael Morin wrote:
> I think the canonical API behaviour sets R to varying and returns true 
> instead of just returning false if nothing is known about the range.
> 
> I'm not sure whether it makes any difference; Aldy can probably tell. But if 
> the type is bool, varying is [0,1] which is better than unknown range.

Should the varying be set by the caller when fold_range returns false?
Just like the following code in value-query.cc:

  if (!op.fold_range (r, type, r0, r1))
r.set_varying (type);

Thanks
Gui Haochen


Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-14 Thread HAO CHEN GUI
Hi Jakub,
  Thanks for your review comments.

On 2024/5/14 23:57, Jakub Jelinek wrote:
> BUILT_IN_ISFINITE is just one of many BUILT_IN_IS... builtins,
> would be nice to handle the others as well.
> 
> E.g. isnormal/isnan/isinf, fpclassify etc.
> 
Yes, I already sent the patches which add range op for isnormal/isnan/isinf
for review. I will modify them according to review comments and submit them
again.

> Note, the man page says for e.g. isnormal that it returns nonzero or zero,
> but in reality I think we implement it always inline and can check if
> it always returns [0,1].
> Some others like isinf return [-1,1] though I think and fpclassify
> returns union of all the passed int values.

The GCC inline code always returns 0 or 1 for isnormal/isnan/isinf, but I
wonder whether every target's expander can promise that. rs6000 has an
instruction for isnormal/isnan/isinf, so we are making the patches skip the
inline code and expand the builtins ourselves. The rs6000 instruction
returns 0 or 1 for them, but I am not sure other targets are the same.
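
A small example of why the exact return value matters (an illustrative
sketch, not code from the patches):

/* This fold is only safe if every target's expansion of isnormal
   returns exactly 0 or 1, not merely zero/nonzero.  */
int f (double x)
{
  return __builtin_isnormal (x) == 1;
}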

Thanks
Gui Haochen



Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-14 Thread HAO CHEN GUI
Hi Andrew,
  Thanks so much for your explanation. I got it. I will address the issue.

Thanks
Gui Haochen

On 2024/5/15 2:45, Andrew MacLeod wrote:
> 
> On 5/9/24 04:47, HAO CHEN GUI wrote:
>> Hi Mikael,
>>
>>    Thanks for your comments.
>>
>> 在 2024/5/9 16:03, Mikael Morin 写道:
>>> I think the canonical API behaviour sets R to varying and returns true 
>>> instead of just returning false if nothing is known about the range.
>>>
>>> I'm not sure whether it makes any difference; Aldy can probably tell. But 
>>> if the type is bool, varying is [0,1] which is better than unknown range.
>> Should the varying be set by caller when fold_range returns false?
>> Just like following codes in value-query.cc.
>>
>>    if (!op.fold_range (r, type, r0, r1))
>>  r.set_varying (type);
>>
> This would be dangerous in the general case.  fold_range may have returned 
> false because 'type' is an unsupported range type. Generally this is why we 
> prefer range-ops to return TRUE and VARYING rather than FALSE for unknown 
> values.   When FALSE is returned, we should stop working with ranges because 
> something is amok.
> 
> Andrew
> 


Re: [PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]

2024-05-16 Thread HAO CHEN GUI
Hi Segher,
  Thanks for your review comments. I will modify it and resend. Just
one question on the insn condition.

On 2024/5/17 1:25, Segher Boessenkool wrote:
>> +(define_expand "isnormal2"
>> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
>> +(use (match_operand:SFDF 1 "gpc_reg_operand"))]
>> +  "TARGET_HARD_FLOAT
>> +   && TARGET_P9_VECTOR"
> Please put the condition on just one line if it is as simple and short
> as this.
> 
> Why is TARGET_P9_VECTOR the correct condition?

This expander calls gen_xststdc<sd>p, which emits a Power9 vector
instruction and therefore relies on "TARGET_P9_VECTOR". That is why I set
the condition.


[PATCH-1v2, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-05-19 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isinf for SFDF and IEEE128 using the
test data class instructions.

  Compared with the previous version, the main change is to modify
the dg-options and dg-finals of the test cases according to the
reviewer's advice.
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isinf for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/vsx.md (isinf<mode>2 for SFDF): New expand.
(isinf<mode>2 for IEEE128): New expand.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-1.c: New test.
* gcc.target/powerpc/pr97786-2.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..fa20fb4df91 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5313,6 +5313,24 @@ (define_expand "xststdc<sd>p"
   operands[4] = CONST0_RTX (SImode);
 })

+(define_expand "isinf2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (0x30)));
+  DONE;
+})
+
+(define_expand "isinf2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdcqp_ (operands[0], operands[1], GEN_INT (0x30)));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
new file mode 100644
index 000..c1c4f64ee8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isinf (x);
+}
+
+int test3 (float x)
+{
+  return __builtin_isinff (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
new file mode 100644
index 000..21d90868268
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (long double x)
+{
+  return __builtin_isinfl (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */
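
For readers following the DCMX masks: a C model of the test data class
operation, with bit values assumed from ISA 3.0 (a sketch, not part of the
patch):

#include <math.h>

/* Assumed DCMX bits: 0x40 NaN, 0x20 +Inf, 0x10 -Inf, 0x08 +Zero,
   0x04 -Zero, 0x02 +Denormal, 0x01 -Denormal.  Mask 0x30 therefore
   tests isinf directly, with no XOR inversion needed.  */
static int xststdc_model (double x, int dcmx)
{
  int classes = 0;
  if (isnan (x))
    classes |= 0x40;
  else if (isinf (x))
    classes |= signbit (x) ? 0x10 : 0x20;
  else if (x == 0.0)
    classes |= signbit (x) ? 0x04 : 0x08;
  else if (fpclassify (x) == FP_SUBNORMAL)
    classes |= signbit (x) ? 0x01 : 0x02;
  return (classes & dcmx) != 0;
}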


[PATCH-2v2, rs6000] Implement optab_isfinite for SFDF and IEEE128

2024-05-19 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isfinite for SFDF and IEEE128 using
the test data class instructions.

  Compared with the previous version, the main changes are to stop
testing whether a pseudo can be created in the expander and to modify
the dg-options and dg-finals of the test cases according to the
reviewer's advice.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649346.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isfinite for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/vsx.md (isfinite<mode>2 for SFDF): New expand.
(isfinite<mode>2 for IEEE128): New expand.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-4.c: New test.
* gcc.target/powerpc/pr97786-5.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f0cc02f7e7b..cbb538d6d86 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5333,6 +5333,28 @@ (define_expand "isinf<mode>2"
   DONE;
 })

+(define_expand "isfinite2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x70)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+(define_expand "isfinite2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x70)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-4.c b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
new file mode 100644
index 000..01faa962bd5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isfinite (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-5.c b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
new file mode 100644
index 000..5fc98084274
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */


[PATCH-3v2, rs6000] Implement optab_isnormal for SFDF and IEEE128

2024-05-19 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isnormal for SFDF and IEEE128 using
the test data class instructions.

  Compared with the previous version, the main changes are to stop
testing whether a pseudo can be created in the expander and to modify
the dg-options and dg-finals of the test cases according to the
reviewer's advice.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649368.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isnormal for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/vsx.md (isnormal<mode>2 for SFDF): New expand.
(isnormal<mode>2 for IEEE128): New expand.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-7.c: New test.
* gcc.target/powerpc/pr97786-8.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index ab17178e0a8..cae30dc431e 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5353,6 +5353,28 @@ (define_expand "isfinite<mode>2"
   DONE;
 })

+(define_expand "isnormal2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x7f)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+(define_expand "isnormal2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x7f)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-7.c b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
new file mode 100644
index 000..2df472e35d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isnormal (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isnormal (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-8.c b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
new file mode 100644
index 000..0416970b89b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isnormal (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */


Re: [PATCH] Optab: add isfinite_optab for __builtin_isfinite

2024-05-19 Thread HAO CHEN GUI
Hi Andrew,

On 2024/5/19 3:42, Andrew Pinski wrote:
> This is missing adding documentation for the new optab.
> It should be documented in md.texi under `Standard Pattern Names For
> Generation` section.

Thanks for the reminder. I will add documentation for all the patches.

Thanks
Gui Haochen


[PATCHv2] Optab: add isfinite_optab for __builtin_isfinite

2024-05-20 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isfinite.  The finite check can be
implemented on rs6000 with a single instruction, so an optab is needed to
expand the builtin to that instruction sequence.
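
  For context, without the optab the check is done with ordinary compares,
roughly like the following (an approximation, not the exact GCC lowering):

#include <float.h>
#include <math.h>

/* |x| <= DBL_MAX is false for +/-Inf, and also for NaN because the
   comparison is unordered, so this is the finite test.  */
static int isfinite_sketch (double x)
{
  return fabs (x) <= DBL_MAX;
}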

  The subsequent patches will implement the expand on rs6000.

  Compared to the previous version, the main change is to document
isfinite in md.texi.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isfinite_optab for isfinite builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
for isfinite builtin.
* optabs.def (isfinite_optab): New.
* doc/md.texi (isfinite): Document.


patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..b8432f84020 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
   errno_set = true; builtin_optab = ilogb_optab; break;
 CASE_FLT_FN (BUILT_IN_ISINF):
   builtin_optab = isinf_optab; break;
-case BUILT_IN_ISNORMAL:
 case BUILT_IN_ISFINITE:
+  builtin_optab = isfinite_optab; break;
+case BUILT_IN_ISNORMAL:
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..8ed70b3feea 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8557,6 +8557,11 @@ operand 2, greater than operand 2 or is unordered with 
operand 2.

 This pattern is not allowed to @code{FAIL}.

+@cindex @code{isfinite@var{m}2} instruction pattern
+@item @samp{isfinite@var{m}2}
+Set operand 0 to nonzero if operand 1 is a finite floating-point
+number and to 0 otherwise.
+
 @end table

 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..dcd77315c2a 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
 OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
+OPTAB_D (isfinite_optab, "isfinite$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")


[PATCHv2] Optab: add isnormal_optab for __builtin_isnormal

2024-05-20 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isnormal.  The normal check can be
implemented on rs6000 with a single instruction, so an optab is needed to
expand the builtin to that instruction sequence.
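
  For context, the normality test in plain C looks roughly like this (an
approximation, not the exact GCC lowering):

#include <float.h>
#include <math.h>

/* Normal means DBL_MIN <= |x| <= DBL_MAX; NaN fails both compares
   because they are unordered.  */
static int isnormal_sketch (double x)
{
  double ax = fabs (x);
  return ax >= DBL_MIN && ax <= DBL_MAX;
}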

  The subsequent patches will implement the expand on rs6000.

  Compared to the previous version, the main change is to document
isnormal in md.texi.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isnormal_optab for isnormal builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
for isnormal builtin.
* optabs.def (isnormal_optab): New.
* doc/md.texi (isnormal): Document.


patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index b8432f84020..ccd57fce522 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
 case BUILT_IN_ISFINITE:
   builtin_optab = isfinite_optab; break;
 case BUILT_IN_ISNORMAL:
+  builtin_optab = isnormal_optab; break;
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8ed70b3feea..b81b9dec18a 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8562,6 +8562,11 @@ This pattern is not allowed to @code{FAIL}.
 Set operand 0 to nonzero if operand 1 is a finite floating-point
 number and to 0 otherwise.

+@cindex @code{isnormal@var{m}2} instruction pattern
+@item @samp{isnormal@var{m}2}
+Set operand 0 to nonzero if operand 1 is a normal floating-point
+number and to 0 otherwise.
+
 @end table

 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index dcd77315c2a..3c401fc0b4c 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
 OPTAB_D (isfinite_optab, "isfinite$a2")
+OPTAB_D (isnormal_optab, "isnormal$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")


[PATCH-1v2] Value Range: Add range op for builtin isinf

2024-05-20 Thread HAO CHEN GUI
Hi,
  The builtin isinf is not folded at the front end if the corresponding
optab exists.  This causes range evaluation to fail on targets that have
optab_isinf.  For instance, range-sincos.c fails on such targets because
it calls __builtin_isinf.

  This patch fixes the problem by adding a range op for builtin isinf.

  Compared with the previous version, the main change is to set varying
when nothing is known about the range.
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
Value Range: Add range op for builtin isinf

The builtin isinf is not folded at the front end if the corresponding optab
exists, so a range op for isinf is needed for value range analysis.
This patch adds the range op for builtin isinf.

gcc/
* gimple-range-op.cc (class cfn_isinf): New.
(op_cfn_isinf): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_FLT_FN (BUILT_IN_ISINF).

gcc/testsuite/
* gcc.dg/tree-ssa/range-isinf.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 55dfbb23ce2..eb1b0aff77c 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1175,6 +1175,62 @@ private:
   bool m_is_pos;
 } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);

+// Implement range operator for CFN_BUILT_IN_ISINF
+class cfn_isinf : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isinf ())
+  {
+   r.set_nonzero (type);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || (!real_isinf (&op1.lower_bound ())
+	    && !real_isinf (&op1.upper_bound ())))
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   nan_state nan (true);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   // The range is [-INF,-INF][+INF,+INF], but it can't be represented.
+   // Set range to [-INF,+INF]
+   r.set_varying (type);
+   r.clear_nan ();
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isinf;

 // Implement range operator for CFN_BUILT_IN_PARITY
 class cfn_parity : public range_operator
@@ -1268,6 +1324,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_signbit;
   break;

+CASE_FLT_FN (BUILT_IN_ISINF):
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isinf;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
new file mode 100644
index 000..468f1bcf5c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include <math.h>
+void link_error();
+
+void
+test1 (double x)
+{
+  if (x > __DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test2 (float x)
+{
+  if (x > __FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test3 (double x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__)
+link_error ();
+}
+
+void
+test4 (float x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__)
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
+


[PATCH-2v3] Value Range: Add range op for builtin isfinite

2024-05-20 Thread HAO CHEN GUI
Hi,
  This patch adds the range op for builtin isfinite.

  Compared to the previous version, the main change is to set varying
when nothing is known about the range.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650857.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isfinite

A former patch adds an optab for builtin isfinite, so builtin isfinite
might not be folded at the front end and a range op for isfinite is needed
for value range analysis.  This patch adds the range op for builtin isfinite.

gcc/
* gimple-range-op.cc (class cfn_isfinite): New.
(op_cfn_isfinite): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CFN_BUILT_IN_ISFINITE.

gcc/testsuite/
* gcc.dg/tree-ssa/range-isfinite.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 922ee7bf0f7..49b6d7abde1 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1229,6 +1229,61 @@ public:
   }
 } op_cfn_isinf;

+// Implement range operator for CFN_BUILT_IN_ISFINITE
+class cfn_isfinite : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isfinite ())
+  {
+   r.set_nonzero (type);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || op1.known_isinf ())
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented.
+   // Set range to varying
+   r.set_varying (type);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   nan_state nan (false);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isfinite;
+
 // Implement range operator for CFN_BUILT_IN_PARITY
 class cfn_parity : public range_operator
 {
@@ -1326,6 +1381,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_isinf;
   break;

+case CFN_BUILT_IN_ISFINITE:
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isfinite;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
new file mode 100644
index 000..f5dce0a0486
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include <math.h>
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x))
+link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x))
+link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */


[PATCH-3] Value Range: Add range op for builtin isnormal

2024-05-20 Thread HAO CHEN GUI
Hi,
  This patch adds the range op for builtin isnormal. It also adds two
helper functions to frange that detect a range of normal floating-point
values and a range of subnormal-or-zero values.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isnormal

A former patch adds an optab for builtin isnormal, so builtin isnormal
might not be folded at the front end and a range op for isnormal is needed
for value range analysis.  This patch adds the range op for builtin isnormal.

gcc/
* gimple-range-op.cc (class cfn_isnormal): New.
(op_cfn_isnormal): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CFN_BUILT_IN_ISNORMAL.
* value-range.h (class frange): Declare known_isnormal and
known_isdenormal_or_zero.
(frange::known_isnormal): Define.
(frange::known_isdenormal_or_zero): Define.

gcc/testsuite/
* gcc.dg/tree-ssa/range-isnormal.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index d69900d1f56..4c3f9c98282 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1281,6 +1281,60 @@ public:
   }
 } op_cfn_isfinite;

+// Implement range operator for CFN_BUILT_IN_ISNORMAL
+class cfn_isnormal : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isnormal ())
+  {
+   r.set_nonzero (type);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || op1.known_isinf ()
+   || op1.known_isdenormal_or_zero ())
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   r.set_varying (type);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   nan_state nan (false);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isnormal;
+
 // Implement range operator for CFN_BUILT_IN_PARITY
 class cfn_parity : public range_operator
 {
@@ -1383,6 +1437,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_isfinite;
   break;

+case CFN_BUILT_IN_ISNORMAL:
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isnormal;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
new file mode 100644
index 000..c4df4d839b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include <math.h>
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > __DBL_MIN__ && !__builtin_isnormal (x))
+link_error ();
+
+  if (x < -__DBL_MIN__ && x > -__DBL_MAX__ && !__builtin_isnormal (x))
+link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > __FLT_MIN__ && !__builtin_isnormal (x))
+link_error ();
+
+  if (x < -__FLT_MIN__ && x > - __FLT_MAX__ && !__builtin_isnormal (x))
+link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isnormal (x) && __builtin_isinf (x))
+link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isnormal (x) && __builtin_isinf (x))
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 37ce91dc52d..1443d1906e5 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -588,6 +588,8 @@ public:
   bool maybe_isinf () const;
   bool signbit_p (bool &signbit) const;
   bool nan_signbit_p (bool &signbit) const;
+  bool known_isnormal () const;
+  bool known_isdenormal_or_zero () const;

 protected:
   virtual bool contains_p (tree cst) const override;
@@ -1650,6 +1652,33 @@ frange::known_isfinite () const
   return (!maybe_isnan () && !real_isinf (&m_min) && !real_isinf (&m_max));
 }

+// Return TRUE if range is known to be normal.
+
+inline bool
+frange::known_isnormal () const
+{
+  if (!known_isfinite ())
+return false;
+
+  machine_mode mode = TYPE_MODE (type ());
+  return (!real_isdenormal (&m_min, mode) && !real_isdenormal (&m_max, mode)
+ && !real_iszero (&m_min) && !real_iszero (&m_max)
+	  && (!real_isneg (&m_min) || real_isneg (&m_max)));
+}
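
For reference, a plausible shape for the companion helper, sketched under
the same conventions (an illustration, not the exact patch text):

// Return TRUE if range is known to be denormal or zero.  Denormals
// and zeros form one contiguous interval around zero, so if both
// finite bounds fall in it, the whole range does.

inline bool
frange::known_isdenormal_or_zero () const
{
  if (!known_isfinite ())
    return false;

  machine_mode mode = TYPE_MODE (type ());
  return ((real_isdenormal (&m_min, mode) || real_iszero (&m_min))
	  && (real_isdenormal (&m_max, mode) || real_iszero (&m_max)));
}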

Re: [PATCH-1v2, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-05-23 Thread HAO CHEN GUI
Hi Peter,
  Thanks for your comments.

On 2024/5/23 5:58, Peter Bergner wrote:
> Is there a reason not to use the vsx_register_operand predicate for op1
> which matches the predicate for the operand of the xststdcp pattern
> we're passing op1 to?

No, I will fix them.

Thanks
Gui Haochen

