Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]
Richard,

On 2023/9/28 21:39, Richard Sandiford wrote:
> That looks easily solvable though.  I've posted a potential fix as:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631595.html
>
> Is that the only blocker to doing this in generic code?

Thanks so much for your patch. It works, and I haven't found any other
blockers. I will run a regression test after I am back from holiday.

Thanks
Gui Haochen
[PATCH-1, expand] Enable vector mode for compare_by_pieces [PR111449]
Hi,
  Vector mode instructions are efficient on some targets (e.g. ppc64).
This patch enables vector mode for compare_by_pieces. The non-member
function widest_fixed_size_mode_for_size now takes a by_pieces_operation
as its second argument and decides whether vector mode is enabled based
on the type of operation. Currently only the set and compare operations
enable vector mode, and the optab checking is done accordingly.

The test case is in the second patch, which is rs6000 specific.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
Expand: Enable vector mode for pieces compare

Vector mode compare instructions are efficient for equality compare on
rs6000. This patch refactors the by-pieces operation code to enable
vector mode for compare.

gcc/
        PR target/111449
        * expr.cc (widest_fixed_size_mode_for_size): Enable vector mode
        for compare.  Replace the second argument with the type of pieces
        operation.  Add optab checks for vector mode used in compare.
        (by_pieces_ninsns): Pass the type of pieces operation to
        widest_fixed_size_mode_for_size.
        (class op_by_pieces_d): Add virtual function
        widest_fixed_size_mode_for_size.
        (op_by_pieces_d::op_by_pieces_d): Call outer function
        widest_fixed_size_mode_for_size.
        (op_by_pieces_d::get_usable_mode): Call class function
        widest_fixed_size_mode_for_size.
        (op_by_pieces_d::run): Likewise.
        (class move_by_pieces_d): Declare function
        widest_fixed_size_mode_for_size.
        (move_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
        (class store_by_pieces_d): Declare function
        widest_fixed_size_mode_for_size.
        (store_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
        (can_store_by_pieces): Pass the type of pieces operation to
        widest_fixed_size_mode_for_size.
        (class compare_by_pieces_d): Declare function
        widest_fixed_size_mode_for_size.
        (compare_by_pieces_d::compare_by_pieces_d): Set m_qi_vector_mode
        to true.
        (compare_by_pieces_d::widest_fixed_size_mode_for_size): Implement.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index d87346dc07f..9885404ee9c 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -992,8 +992,9 @@ alignment_for_piecewise_move (unsigned int max_pieces, unsigned int align)
    that is narrower than SIZE bytes.  */

 static fixed_size_mode
-widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
+widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
 {
+  bool qi_vector = ((op == COMPARE_BY_PIECES) || op == SET_BY_PIECES);
   fixed_size_mode result = NARROWEST_INT_MODE;

   gcc_checking_assert (size > 1);
@@ -1009,8 +1010,13 @@ widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
       {
        if (GET_MODE_SIZE (candidate) >= size)
          break;
-       if (optab_handler (vec_duplicate_optab, candidate)
-           != CODE_FOR_nothing)
+       if ((op == SET_BY_PIECES
+            && optab_handler (vec_duplicate_optab, candidate)
+               != CODE_FOR_nothing)
+           || (op == COMPARE_BY_PIECES
+               && optab_handler (mov_optab, mode)
+                  != CODE_FOR_nothing
+               && can_compare_p (EQ, mode, ccp_jump)))
          result = candidate;
       }

@@ -1061,8 +1067,7 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align,
     {
       /* NB: Round up L and ALIGN to the widest integer mode for
         MAX_SIZE.  */
-      mode = widest_fixed_size_mode_for_size (max_size,
-                                             op == SET_BY_PIECES);
+      mode = widest_fixed_size_mode_for_size (max_size, op);
       if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
        {
          unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode));
@@ -1076,8 +1081,7 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align,

   while (max_size > 1 && l > 0)
     {
-      mode = widest_fixed_size_mode_for_size (max_size,
-                                             op == SET_BY_PIECES);
+      mode = widest_fixed_size_mode_for_size (max_size, op);
       enum insn_code icode;

       unsigned int modesize = GET_MODE_SIZE (mode);
@@ -1327,6 +1331,8 @@ class op_by_pieces_d
   virtual void finish_mode (machine_mode)
   {
   }
+  virtual fixed_size_mode widest_fixed_size_mode_for_size (unsigned int size)
+    = 0;

  public:
   op_by_pieces_d (unsigned int, rtx, bool, rtx, bool, by_pieces_constfn,
@@ -1375,8 +1381,7 @@ op_by_pieces_d::op_by_pieces_d (unsigned int max_pieces, rtx to,
 {
   /* Find the mode of the largest comparison.  */
   fixed_size_mode mode
-    = widest_fixed_size_mode_for_size (m_max_size,
-                                      m_qi_vector_mode);
+    = ::widest_fixed_size_mode_for_size (m_max_size, COM
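To make the mode-selection logic above concrete, here is a small
self-contained model of it in C. Everything in it (the names, the
supported() stand-in for the optab checks, the example sizes) is
illustrative only and not GCC code; it just mimics the loop's behaviour
of keeping the widest supported candidate that is narrower than SIZE.

#include <stdbool.h>
#include <stdio.h>

enum op { MOVE_OP, SET_OP, COMPARE_OP };

/* Stand-in for the optab checks: pretend the target has scalar pieces
   up to 8 bytes and a 16-byte QI vector only for set and compare.  */
static bool supported (unsigned bytes, enum op op)
{
  if (bytes == 16)
    return op == SET_OP || op == COMPARE_OP;
  return bytes <= 8;
}

/* Walk power-of-two candidates narrower than SIZE and keep the widest
   one the target supports for this kind of by-pieces operation.  */
static unsigned widest_piece (unsigned size, enum op op)
{
  unsigned result = 1;
  for (unsigned cand = 2; cand < size; cand *= 2)
    if (supported (cand, op))
      result = cand;
  return result;
}

int main (void)
{
  /* The GCC callers pass a bound one larger than the remaining length,
     so a piece of exactly that length still counts as "narrower".  */
  printf ("compare 16 bytes -> %u-byte piece\n", widest_piece (17, COMPARE_OP));
  printf ("move 16 bytes    -> %u-byte piece\n", widest_piece (17, MOVE_OP));
  return 0;
}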
[PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]
Hi,
  This patch enables vector mode for memory equality compare by adding
a new expand pattern, cbranchv16qi4, and implementing it. The
corresponding CC reg and compare code is also set in
rs6000_generate_compare. With the patch, a 16-byte equality compare can
be implemented by one vector compare instruction instead of two 8-byte
compares with branches. The test case is included in this patch.

Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: Enable vector compare for memory equality compare

gcc/
        PR target/111449
        * config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
        * config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn
        sequence for V16QImode equality compare.
        * config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
        (COMPARE_MAX_PIECES): Define.

gcc/testsuite/
        PR target/111449
        * gcc.target/powerpc/pr111449.c: New.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e8a596fb7e9..c69bf266402 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2605,6 +2605,39 @@ (define_insn "altivec_vupklpx"
 }
   [(set_attr "type" "vecperm")])

+(define_expand "cbranchv16qi4"
+  [(use (match_operator 0 "equality_operator"
+        [(match_operand:V16QI 1 "gpc_reg_operand")
+         (match_operand:V16QI 2 "gpc_reg_operand")]))
+   (use (match_operand 3))]
+  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
+{
+  if (!TARGET_P9_VECTOR
+      && MEM_P (operands[1])
+      && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
+      && MEM_P (operands[2])
+      && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
+    {
+      /* Use direct move as the byte order doesn't matter for equality
+        compare.  */
+      rtx reg_op1 = gen_reg_rtx (V16QImode);
+      rtx reg_op2 = gen_reg_rtx (V16QImode);
+      rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
+      rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
+      operands[1] = reg_op1;
+      operands[2] = reg_op2;
+    }
+  else
+    {
+      operands[1] = force_reg (V16QImode, operands[1]);
+      operands[2] = force_reg (V16QImode, operands[2]);
+    }
+  rtx_code code = GET_CODE (operands[0]);
+  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]);
+  rs6000_emit_cbranch (V16QImode, operands);
+  DONE;
+})
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ<VI_char>_p"
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index efe9adce1f8..0087d786840 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15264,6 +15264,15 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
       else
        emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
     }
+  else if (mode == V16QImode)
+    {
+      gcc_assert (code == EQ || code == NE);
+
+      rtx result_vector = gen_reg_rtx (V16QImode);
+      compare_result = gen_rtx_REG (CCmode, CR6_REGNO);
+      emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
+      code = (code == NE) ? GE : LT;
+    }
   else
     emit_insn (gen_rtx_SET (compare_result,
                            gen_rtx_COMPARE (comp_mode, op0, op1)));
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..dc33bca0802 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1730,6 +1730,8 @@ typedef struct rs6000_args
    in one reasonably fast instruction.  */
 #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
 #define MAX_MOVE_MAX 8
+#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
+#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)

 /* Nonzero if access to memory by bytes is no faster than for words.
    Also nonzero if doing byte operations (specifically shifts) in registers
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449.c b/gcc/testsuite/gcc.target/powerpc/pr111449.c
new file mode 100644
index 000..a8c30b92a41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-maltivec -O2" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+/* Ensure vector comparison is used for 16-byte memory equality compare.  */
+
+int compare1 (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 16) == 0;
+}
+
+int compare2 (const char* s1)
+{
+  return __builtin_memcmp (s1, "0123456789012345", 16) == 0;
+}
+
+/* { dg-final { scan-assembler-times {\mvcmpequb\.} 2 } } */
+/* { dg-final { scan-assembler-not {\mcmpd\M} } } */
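For orientation, a hedged source-level sketch (mine, not part of the
patch) of what the new path does and does not cover: cbranchv16qi4 is
restricted to equality_operator, so only equality tests qualify, while
an ordered memcmp result depends on byte order and cannot use the
byte-order-agnostic vector trick.

#include <string.h>

int eq16 (const void *a, const void *b)
{
  return memcmp (a, b, 16) == 0;   /* candidate for one vcmpequb. */
}

int lt16 (const void *a, const void *b)
{
  return memcmp (a, b, 16) < 0;    /* still needs ordered, byte-wise
                                      big-endian semantics */
}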
Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]
Hi David,
  Thanks for your review comments.

On 2023/10/9 23:42, David Edelsohn wrote:
>     #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
>     #define MAX_MOVE_MAX 8
>     +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
>     +#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
>
> How are the definitions of MOVE_MAX_PIECES and COMPARE_MAX_PIECES
> determined?  The email does not provide any explanation for the
> implementation.  The rest of the patch is related to vector support,
> but vector support is not dependent on TARGET_POWERPC64.

By default, MOVE_MAX_PIECES and COMPARE_MAX_PIECES are set to the same
value as MOVE_MAX. Both move and compare instructions are required in
compare_by_pieces, so both macros are set to 16 bytes when vector mode
(V16QImode) is supported.

The problem is that rs6000 hasn't supported TImode for "-m32". We
discussed it in issue 1307. TImode will be used for moves when
MOVE_MAX_PIECES is set to 16, but TImode isn't supported with "-m32",
which might cause an ICE. So MOVE_MAX_PIECES and COMPARE_MAX_PIECES are
set to 4 for the 32-bit target in this patch. They could be changed to
16 after rs6000 supports TImode with "-m32".

Thanks
Gui Haochen
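As an illustration of what these limits mean in practice (the function
and the comments below are mine, not from the patch):

/* With COMPARE_MAX_PIECES == 16 this is one V16QImode piece; with the
   32-bit setting of 4 it is four SImode pieces, each followed by an
   early-exit branch.  */
int sixteen_eq (const char *a, const char *b)
{
  return __builtin_memcmp (a, b, 16) == 0;
}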
[PATCH-1v2, expand] Enable vector mode for compare_by_pieces [PR111449]
Hi,
  Vector mode instructions are efficient on some targets (e.g. ppc64).
This patch enables vector mode for compare_by_pieces. The non-member
function widest_fixed_size_mode_for_size now takes a by_pieces_operation
as its second argument and decides whether vector mode is enabled based
on the type of operation. Currently only the set and compare operations
enable vector mode, and the optab checking is done accordingly.

The test case is in the second patch, which is rs6000 specific.

Compared to the last version, the main change is to enable vector mode
for compare_by_pieces in smallest_fixed_size_mode_for_size, which is
used for overlapping compares.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
Expand: Enable vector mode for pieces compares

Vector mode compare instructions are efficient for equality compare on
rs6000. This patch refactors the by-pieces operation code to enable
vector mode for compare.

gcc/
        PR target/111449
        * expr.cc (widest_fixed_size_mode_for_size): Enable vector mode
        for compare.  Replace the second argument with the type of pieces
        operation.  Add optab checks for vector mode used in compare.
        (by_pieces_ninsns): Pass the type of pieces operation to
        widest_fixed_size_mode_for_size.
        (class op_by_pieces_d): Define virtual function
        widest_fixed_size_mode_for_size and optab_checking.
        (op_by_pieces_d::op_by_pieces_d): Call outer function
        widest_fixed_size_mode_for_size.
        (op_by_pieces_d::get_usable_mode): Call class function
        widest_fixed_size_mode_for_size.
        (op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
        optab_checking for different types of operations.
        (op_by_pieces_d::run): Call class function
        widest_fixed_size_mode_for_size.
        (class move_by_pieces_d): Declare function
        widest_fixed_size_mode_for_size.
        (move_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
        (class store_by_pieces_d): Declare function
        widest_fixed_size_mode_for_size and optab_checking.
        (store_by_pieces_d::optab_checking): Implement.
        (store_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
        (can_store_by_pieces): Pass the type of pieces operation to
        widest_fixed_size_mode_for_size.
        (class compare_by_pieces_d): Declare function
        widest_fixed_size_mode_for_size and optab_checking.
        (compare_by_pieces_d::compare_by_pieces_d): Set m_qi_vector_mode
        to true to enable vector mode.
        (compare_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
        (compare_by_pieces_d::optab_checking): Implement.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 9a37bff1fdd..e83c0a378ed 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -992,8 +992,9 @@ alignment_for_piecewise_move (unsigned int max_pieces, unsigned int align)
    that is narrower than SIZE bytes.  */

 static fixed_size_mode
-widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
+widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
 {
+  bool qi_vector = ((op == COMPARE_BY_PIECES) || op == SET_BY_PIECES);
   fixed_size_mode result = NARROWEST_INT_MODE;

   gcc_checking_assert (size > 1);
@@ -1009,8 +1010,13 @@ widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
       {
        if (GET_MODE_SIZE (candidate) >= size)
          break;
-       if (optab_handler (vec_duplicate_optab, candidate)
-           != CODE_FOR_nothing)
+       if ((op == SET_BY_PIECES
+            && optab_handler (vec_duplicate_optab, candidate)
+               != CODE_FOR_nothing)
+           || (op == COMPARE_BY_PIECES
+               && optab_handler (mov_optab, mode)
+                  != CODE_FOR_nothing
+               && can_compare_p (EQ, mode, ccp_jump)))
          result = candidate;
       }

@@ -1061,8 +1067,7 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align,
     {
       /* NB: Round up L and ALIGN to the widest integer mode for
         MAX_SIZE.  */
-      mode = widest_fixed_size_mode_for_size (max_size,
-                                             op == SET_BY_PIECES);
+      mode = widest_fixed_size_mode_for_size (max_size, op);
       if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
        {
          unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode));
@@ -1076,8 +1081,7 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align,

   while (max_size > 1 && l > 0)
     {
-      mode = widest_fixed_size_mode_for_size (max_size,
-                                             op == SET_BY_PIECES);
+      mode = widest_fixed_size_mode_for_size (max_size, op);
       enum insn_code icode;

       unsigned int modesize = GET_MODE_SIZE (mode);
@@ -1327,6 +1331,12 @@ class op_by_pieces_d
   virtual void finish_mode (machine_mode
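A worked example of the overlapping compare that
smallest_fixed_size_mode_for_size serves, written out by hand in plain
C (illustrative only; assumes 8-byte pieces): for 13 bytes, compare
bytes [0,8) and then bytes [5,13). The three-byte overlap is harmless
for an equality test and avoids narrower tail pieces.

#include <string.h>

static int eq13 (const unsigned char *a, const unsigned char *b)
{
  unsigned long long xa, xb;
  memcpy (&xa, a, 8);                   /* bytes [0,8)  */
  memcpy (&xb, b, 8);
  if (xa != xb)
    return 0;
  memcpy (&xa, a + 5, 8);               /* bytes [5,13), overlapping */
  memcpy (&xb, b + 5, 8);
  return xa == xb;
}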
[PATCH-2v2, rs6000] Enable vector mode for memory equality compare [PR111449]
Hi,
  This patch enables vector mode for memory equality compare by adding
a new expand pattern, cbranchv16qi4, and implementing it. The
corresponding CC reg and compare code is also set in
rs6000_generate_compare. With the patch, a 16-byte equality compare can
be implemented by one vector compare instruction instead of two 8-byte
compares with branches.

The vector mode compare is only enabled on powerpc64, as TImode hasn't
been supported on the 32-bit platform. By setting MOVE_MAX_PIECES to
16, a TImode compare might be generated.

Compared to the last version, the main change is to add the guard
"TARGET_VSX" to the expand pattern, as it is required by unaligned
vector load.

Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: Enable vector compare for memory equality compare

gcc/
        PR target/111449
        * config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
        * config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn
        sequence for V16QImode equality compare.
        * config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
        (COMPARE_MAX_PIECES): Define.
        (STORE_MAX_PIECES): Define.

gcc/testsuite/
        PR target/111449
        * gcc.target/powerpc/pr111449.c: New.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e8a596fb7e9..e4492ff9569 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2605,6 +2605,42 @@ (define_insn "altivec_vupklpx"
 }
   [(set_attr "type" "vecperm")])

+(define_expand "cbranchv16qi4"
+  [(use (match_operator 0 "equality_operator"
+        [(match_operand:V16QI 1 "reg_or_mem_operand")
+         (match_operand:V16QI 2 "reg_or_mem_operand")]))
+   (use (match_operand 3))]
+  "VECTOR_UNIT_ALTIVEC_P (V16QImode)
+   && TARGET_VSX"
+{
+  if (!TARGET_P9_VECTOR
+      && !BYTES_BIG_ENDIAN
+      && MEM_P (operands[1])
+      && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
+      && MEM_P (operands[2])
+      && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
+    {
+      /* Use direct move for P8 little endian to skip bswap, as the byte
+        order doesn't matter for equality compare.  */
+      rtx reg_op1 = gen_reg_rtx (V16QImode);
+      rtx reg_op2 = gen_reg_rtx (V16QImode);
+      rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
+      rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
+      operands[1] = reg_op1;
+      operands[2] = reg_op2;
+    }
+  else
+    {
+      operands[1] = force_reg (V16QImode, operands[1]);
+      operands[2] = force_reg (V16QImode, operands[2]);
+    }
+
+  rtx_code code = GET_CODE (operands[0]);
+  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]);
+  rs6000_emit_cbranch (V16QImode, operands);
+  DONE;
+})
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ<VI_char>_p"
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index efe9adce1f8..0087d786840 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15264,6 +15264,15 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
       else
        emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
     }
+  else if (mode == V16QImode)
+    {
+      gcc_assert (code == EQ || code == NE);
+
+      rtx result_vector = gen_reg_rtx (V16QImode);
+      compare_result = gen_rtx_REG (CCmode, CR6_REGNO);
+      emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
+      code = (code == NE) ? GE : LT;
+    }
   else
     emit_insn (gen_rtx_SET (compare_result,
                            gen_rtx_COMPARE (comp_mode, op0, op1)));
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..dd8565e3971 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1730,6 +1730,9 @@ typedef struct rs6000_args
    in one reasonably fast instruction.  */
 #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
 #define MAX_MOVE_MAX 8
+#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
+#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
+#define STORE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 8)

 /* Nonzero if access to memory by bytes is no faster than for words.
    Also nonzero if doing byte operations (specifically shifts) in registers
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449.c b/gcc/testsuite/gcc.target/powerpc/pr111449.c
new file mode 100644
index 000..a8c30b92a41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mvsx -O2" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+/* Ensure vector comparison is used for 16-byte memory equality compare.  */
+
+int compare1 (const char* s1, const char* s2
Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]
Hi David,

On 2023/10/10 20:44, David Edelsohn wrote:
> Are you stating that although PPC32 supports V16QImode in VSX, the
> move_by_pieces support also requires TImode, which is not available
> on PPC32?

Yes. By setting MOVE_MAX_PIECES to 16, a TImode compare might be
generated, as the code checks vector modes first and then falls back to
scalar modes by default.

Thanks
Gui Haochen
[PATCH-1v3, expand] Enable vector mode for compare_by_pieces [PR111449]
Hi,
  Vector mode instructions are efficient for compare on some targets.
This patch enables vector mode for compare_by_pieces. Currently, vector
mode is enabled for compare, set and clear. The helper function
"qi_vector_p" decides if vector mode is enabled for a given by-pieces
operation; "optabs_checking" checks if the optabs are available for the
mode and by-pieces operation. Both of them are called from the
fixed_size_mode finding functions. A member is added to class
op_by_pieces_d to record the type of by-pieces operation.

The test case is in the second patch, which is rs6000 specific.

Compared to the last version, the main change is to create the two
helper functions and call them in the mode finding functions.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
Expand: Enable vector mode for pieces compares

Vector mode compare instructions are efficient for equality compare on
rs6000. This patch refactors the by-pieces operation code to enable
vector mode for compare.

gcc/
        PR target/111449
        * expr.cc (qi_vector_p): New function to indicate if vector mode
        is enabled for certain by pieces operations.
        (optabs_checking): New function to check if optabs are available
        for certain by pieces operations.
        (widest_fixed_size_mode_for_size): Replace the second argument
        with the type of by pieces operations.  Call qi_vector_p to check
        if vector mode is enabled.  Call optabs_checking to check if
        optabs are available for the candidate vector mode.
        (by_pieces_ninsns): Pass the type of by pieces operation to
        widest_fixed_size_mode_for_size.
        (class op_by_pieces_d): Add a protected member m_op to record the
        type of by pieces operations.  Declare member function
        fixed_size_mode widest_fixed_size_mode_for_size.
        (op_by_pieces_d::op_by_pieces_d): Change last argument to the
        type of by pieces operations, initialize m_op with it.  Call
        non-member function widest_fixed_size_mode_for_size.
        (op_by_pieces_d::get_usable_mode): Call member function
        widest_fixed_size_mode_for_size.
        (op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
        qi_vector_p to check if vector mode is enabled.  Call
        optabs_checking to check if optabs are available for the
        candidate vector mode.
        (op_by_pieces_d::run): Call member function
        widest_fixed_size_mode_for_size.
        (op_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
        (move_by_pieces_d::move_by_pieces_d): Set m_op to MOVE_BY_PIECES.
        (store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
        (can_store_by_pieces): Pass the type of by pieces operations to
        widest_fixed_size_mode_for_size.
        (clear_by_pieces): Initialize class store_by_pieces_d with
        CLEAR_BY_PIECES.
        (compare_by_pieces_d::compare_by_pieces_d): Set m_op to
        COMPARE_BY_PIECES.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index d87346dc07f..8ec3f5465a9 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -988,18 +988,43 @@ alignment_for_piecewise_move (unsigned int max_pieces, unsigned int align)
   return align;
 }

-/* Return the widest QI vector, if QI_MODE is true, or integer mode
-   that is narrower than SIZE bytes.  */
+/* Return true if vector mode is enabled for the op.  */
+static bool
+qi_vector_p (by_pieces_operation op)
+{
+  return (op == COMPARE_BY_PIECES
+         || op == SET_BY_PIECES
+         || op == CLEAR_BY_PIECES);
+}
+
+/* Return true if optabs are available for the mode and by pieces
+   operations.  */
+static bool
+optabs_checking (fixed_size_mode mode, by_pieces_operation op)
+{
+  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
+      && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
+    return true;
+  else if (op == COMPARE_BY_PIECES
+          && optab_handler (mov_optab, mode) != CODE_FOR_nothing
+          && can_compare_p (EQ, mode, ccp_jump))
+    return true;
+
+  return false;
+}
+
+/* Return the widest QI vector, if vector mode is enabled for the op,
+   or integer mode that is narrower than SIZE bytes.  */

 static fixed_size_mode
-widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
+widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
 {
   fixed_size_mode result = NARROWEST_INT_MODE;

   gcc_checking_assert (size > 1);

   /* Use QI vector only if size is wider than a WORD.  */
-  if (qi_vector && size > UNITS_PER_WORD)
+  if (qi_vector_p (op) && size > UNITS_PER_WORD)
     {
       machine_mode mode;
       fixed_size_mode candidate;
@@ -1009,8 +1034,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
       {
        if (GET_MODE_SIZE (candidate) >= size)
          break;
-       if (optab_handler (vec_duplicate_optab, candidate)
-           != CODE_FOR_nothing)
+
Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]
Kewen & David,
  Thanks for your comments.

On 2023/10/17 10:19, Kewen.Lin wrote:
> I think David raised a good question, it sounds to me that the current
> handling simply consider that if MOVE_MAX_PIECES is set to 16, the
> required operations for this optimization on TImode are always available,
> but unfortunately on rs6000 the assumption doesn't hold, so could we
> teach generic code instead?

Finally I found that the generic code doesn't check whether the scalar
mode used in by-pieces operations is enabled by the target. TImode is
not enabled on 32-bit ppc, so it should be checked before TImode is used
for by-pieces operations. I made a patch for the generic code and am
testing it. With the patch, the 16-byte comparison could be enabled on
both ppc64 and ppc.

Thanks
Gui Haochen
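A rough source-level analogy of the problem (mine; by-pieces works on
machine modes rather than C types, so this is only an analogy): the
widest scalar candidate for a 16-byte piece is a 128-bit integer mode,
which the 32-bit target simply does not provide, so the candidate must
be filtered out rather than assumed.

#ifdef __SIZEOF_INT128__
typedef unsigned __int128 u128;         /* exists with -m64 */
int eq16 (const u128 *a, const u128 *b) { return *a == *b; }
#else
/* With -m32 there is no 128-bit integer type: a 16-byte piece would
   have to be a vector mode or be split into narrower scalar pieces.  */
int eq16_unavailable;
#endif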
[PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]
Hi,
  Vector mode instructions are efficient for compare on some targets.
This patch enables vector mode for compare_by_pieces. Two helper
functions are added: one checks if vector mode is available for certain
by-pieces operations, the other checks if optabs exist for the mode and
certain by-pieces operations. One member is added in class
op_by_pieces_d to record the type of operations.

The test case is in the second patch, which is rs6000 specific.

Compared to the last version, the main change is to add a target hook
check - scalar_mode_supported_p - when retrieving the available scalar
modes. A mode which is not supported by the target (e.g. TImode on
32-bit ppc) should be skipped. Also some function names and comments are
refined according to the reviewer's advice.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
Expand: Enable vector mode for by pieces compares

Vector mode compare instructions are efficient for equality compare on
rs6000. This patch refactors the by-pieces operation code to enable
vector mode for compare.

gcc/
        PR target/111449
        * expr.cc (can_use_qi_vectors): New function to return true if
        we know how to implement OP using vectors of bytes.
        (qi_vector_mode_supported_p): New function to check if optabs
        exists for the mode and certain by pieces operations.
        (widest_fixed_size_mode_for_size): Replace the second argument
        with the type of by pieces operations.  Call can_use_qi_vectors
        and qi_vector_mode_supported_p to do the check.  Call
        scalar_mode_supported_p to check if the scalar mode is supported.
        (by_pieces_ninsns): Pass the type of by pieces operation to
        widest_fixed_size_mode_for_size.
        (class op_by_pieces_d): Remove m_qi_vector_mode.  Add m_op to
        record the type of by pieces operations.
        (op_by_pieces_d::op_by_pieces_d): Change last argument to the
        type of by pieces operations, initialize m_op with it.  Pass
        m_op to function widest_fixed_size_mode_for_size.
        (op_by_pieces_d::get_usable_mode): Pass m_op to function
        widest_fixed_size_mode_for_size.
        (op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
        can_use_qi_vectors and qi_vector_mode_supported_p to do the
        check.
        (op_by_pieces_d::run): Pass m_op to function
        widest_fixed_size_mode_for_size.
        (move_by_pieces_d::move_by_pieces_d): Set m_op to MOVE_BY_PIECES.
        (store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
        (can_store_by_pieces): Pass the type of by pieces operations to
        widest_fixed_size_mode_for_size.
        (clear_by_pieces): Initialize class store_by_pieces_d with
        CLEAR_BY_PIECES.
        (compare_by_pieces_d::compare_by_pieces_d): Set m_op to
        COMPARE_BY_PIECES.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 2c9930ec674..ad5f9dd8ec2 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -988,18 +988,44 @@ alignment_for_piecewise_move (unsigned int max_pieces, unsigned int align)
   return align;
 }

-/* Return the widest QI vector, if QI_MODE is true, or integer mode
-   that is narrower than SIZE bytes.  */
+/* Return true if we know how to implement OP using vectors of bytes.  */
+static bool
+can_use_qi_vectors (by_pieces_operation op)
+{
+  return (op == COMPARE_BY_PIECES
+         || op == SET_BY_PIECES
+         || op == CLEAR_BY_PIECES);
+}
+
+/* Return true if optabs exists for the mode and certain by pieces
+   operations.  */
+static bool
+qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
+{
+  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
+      && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
+    return true;
+
+  if (op == COMPARE_BY_PIECES
+      && optab_handler (mov_optab, mode) != CODE_FOR_nothing
+      && can_compare_p (EQ, mode, ccp_jump))
+    return true;
+  return false;
+}
+
+/* Return the widest mode that can be used to perform part of an
+   operation OP on SIZE bytes.  Try to use QI vector modes where
+   possible.  */

 static fixed_size_mode
-widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
+widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
 {
   fixed_size_mode result = NARROWEST_INT_MODE;

   gcc_checking_assert (size > 1);

   /* Use QI vector only if size is wider than a WORD.  */
-  if (qi_vector && size > UNITS_PER_WORD)
+  if (can_use_qi_vectors (op) && size > UNITS_PER_WORD)
     {
       machine_mode mode;
       fixed_size_mode candidate;
@@ -1009,8 +1035,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
       {
        if (GET_MODE_SIZE (candidate) >= size)
          break;
-       if (optab_handler (vec_duplicate_optab, candidate)
-           != CODE_FOR_nothing)
+       if (qi_vector_mode_supported_p (candidate, op))
          result = candidate;
       }
Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]
Committed as r14-4835.

https://gcc.gnu.org/g:f08ca5903c7a02b450b93143467f70b9fd8e0085

Thanks
Gui Haochen

On 2023/10/20 16:49, Richard Sandiford wrote:
> HAO CHEN GUI writes:
>> Hi,
>>   Vector mode instructions are efficient for compare on some targets.
>> This patch enables vector mode for compare_by_pieces. Two helper
>> functions are added to check if vector mode is available for certain
>> by pieces operations and if optabs exist for the mode and certain by
>> pieces operations. One member is added in class op_by_pieces_d to
>> record the type of operations.
>>
>> The test case is in the second patch which is rs6000 specific.
>>
>> Compared to last version, the main change is to add a target hook
>> check - scalar_mode_supported_p when retrieving the available scalar
>> modes. The mode which is not supported for a target should be skipped.
>> (e.g. TImode on ppc). Also some function names and comments are refined
>> according to reviewer's advice.
>>
>> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions.
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> Expand: Enable vector mode for by pieces compares
>>
>> Vector mode compare instructions are efficient for equality compare on
>> rs6000. This patch refactors the codes of by pieces operation to enable
>> vector mode for compare.
>>
>> gcc/
>>      PR target/111449
>>      * expr.cc (can_use_qi_vectors): New function to return true if
>>      we know how to implement OP using vectors of bytes.
>>      (qi_vector_mode_supported_p): New function to check if optabs
>>      exists for the mode and certain by pieces operations.
>>      (widest_fixed_size_mode_for_size): Replace the second argument
>>      with the type of by pieces operations.  Call can_use_qi_vectors
>>      and qi_vector_mode_supported_p to do the check.  Call
>>      scalar_mode_supported_p to check if the scalar mode is supported.
>>      (by_pieces_ninsns): Pass the type of by pieces operation to
>>      widest_fixed_size_mode_for_size.
>>      (class op_by_pieces_d): Remove m_qi_vector_mode.  Add m_op to
>>      record the type of by pieces operations.
>>      (op_by_pieces_d::op_by_pieces_d): Change last argument to the
>>      type of by pieces operations, initialize m_op with it.  Pass
>>      m_op to function widest_fixed_size_mode_for_size.
>>      (op_by_pieces_d::get_usable_mode): Pass m_op to function
>>      widest_fixed_size_mode_for_size.
>>      (op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
>>      can_use_qi_vectors and qi_vector_mode_supported_p to do the
>>      check.
>>      (op_by_pieces_d::run): Pass m_op to function
>>      widest_fixed_size_mode_for_size.
>>      (move_by_pieces_d::move_by_pieces_d): Set m_op to MOVE_BY_PIECES.
>>      (store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
>>      (can_store_by_pieces): Pass the type of by pieces operations to
>>      widest_fixed_size_mode_for_size.
>>      (clear_by_pieces): Initialize class store_by_pieces_d with
>>      CLEAR_BY_PIECES.
>>      (compare_by_pieces_d::compare_by_pieces_d): Set m_op to
>>      COMPARE_BY_PIECES.
>
> OK, thanks.  And thanks for your patience.
>
> Richard
>
>> patch.diff
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index 2c9930ec674..ad5f9dd8ec2 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -988,18 +988,44 @@ alignment_for_piecewise_move (unsigned int max_pieces,
>> unsigned int align)
>>    return align;
>> }
>>
>> -/* Return the widest QI vector, if QI_MODE is true, or integer mode
>> -   that is narrower than SIZE bytes.  */
>> +/* Return true if we know how to implement OP using vectors of bytes.  */
>> +static bool
>> +can_use_qi_vectors (by_pieces_operation op)
>> +{
>> +  return (op == COMPARE_BY_PIECES
>> +       || op == SET_BY_PIECES
>> +       || op == CLEAR_BY_PIECES);
>> +}
>> +
>> +/* Return true if optabs exists for the mode and certain by pieces
>> +   operations.  */
>> +static bool
>> +qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
>> +{
>> +  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
>> +      && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
>> +    return true;
>> +
>> +  if (op == COMPARE_BY_PIECES
>> +      && optab_handler (mov_optab, mode) != CODE_FOR_nothing
>> +      && can_compare_p (EQ, mode, ccp_jump))
>> +    return true;
>&
Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]
OK, I will take it.

Thanks
Gui Haochen

On 2023/10/24 16:49, Jiang, Haochen wrote:
> It seems that the mail got caught elsewhere and did not send into
> gcc-patches mailing thread. Resending that.
>
> Thx,
> Haochen
>
> -----Original Message-----
> From: Jiang, Haochen
> Sent: Tuesday, October 24, 2023 4:43 PM
> To: HAO CHEN GUI ; Richard Sandiford
> Cc: gcc-patches
> Subject: RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces
> [PR111449]
>
> Hi Haochen Gui,
>
> It seems that the commit caused lots of test case fails on x86 platforms:
>
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078379.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078380.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078381.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078382.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078383.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078384.html
>
> Please help verify whether we need some testcase change or we have a bug
> here.
>
> A simple reproducer under the build folder is:
>
> make check RUNTESTFLAGS="i386.exp=g++.target/i386/pr80566-2.C
> --target_board='unix{-m64\ -march=cascadelake,-m32\
> -march=cascadelake,-m32,-m64}'"
>
> Thx,
> Haochen
>
>> -----Original Message-----
>> From: HAO CHEN GUI
>> Sent: Monday, October 23, 2023 9:30 AM
>> To: Richard Sandiford
>> Cc: gcc-patches
>> Subject: Re: [PATCH-1v4, expand] Enable vector mode for
>> compare_by_pieces [PR111449]
>>
>> Committed as r14-4835.
>>
>> https://gcc.gnu.org/g:f08ca5903c7a02b450b93143467f70b9fd8e0085
>>
>> Thanks
>> Gui Haochen
>>
>> On 2023/10/20 16:49, Richard Sandiford wrote:
>>> HAO CHEN GUI writes:
>>>> Hi,
>>>>   Vector mode instructions are efficient for compare on some targets.
>>>> This patch enables vector mode for compare_by_pieces. Two helper
>>>> functions are added to check if vector mode is available for
>>>> certain by pieces operations and if optabs exist for the mode
>>>> and certain by pieces operations. One member is added in class
>>>> op_by_pieces_d to record the type of operations.
>>>>
>>>> The test case is in the second patch which is rs6000 specific.
>>>>
>>>> Compared to last version, the main change is to add a target hook
>>>> check - scalar_mode_supported_p when retrieving the available
>>>> scalar modes. The mode which is not supported for a target should be
>>>> skipped.
>>>> (e.g. TImode on ppc). Also some function names and comments are
>>>> refined according to reviewer's advice.
>>>>
>>>> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with
>>>> no regressions.
>>>>
>>>> Thanks
>>>> Gui Haochen
>>>>
>>>> ChangeLog
>>>> Expand: Enable vector mode for by pieces compares
>>>>
>>>> Vector mode compare instructions are efficient for equality compare
>>>> on rs6000. This patch refactors the codes of by pieces operation to
>>>> enable vector mode for compare.
>>>>
>>>> gcc/
>>>>    PR target/111449
>>>>    * expr.cc (can_use_qi_vectors): New function to return true if
>>>>    we know how to implement OP using vectors of bytes.
>>>>    (qi_vector_mode_supported_p): New function to check if optabs
>>>>    exists for the mode and certain by pieces operations.
>>>>    (widest_fixed_size_mode_for_size): Replace the second argument
>>>>    with the type of by pieces operations.  Call can_use_qi_vectors
>>>>    and qi_vector_mode_supported_p to do the check.  Call
>>>>    scalar_mode_supported_p to check if the scalar mode is supported.
>>>>    (by_pieces_ninsns): Pass the type of by pieces operation to
>>>>    widest_fixed_size_mode_for_size.
>>>>    (class op_by_pieces_d): Remove m_qi_vector_mode.  Add m_op to
>>>>    record the type of by pieces operations.
>>>>    (op_by_pieces_d::op_by_pieces_d): Change last argument to the
>>>>    type of by pieces operations, initialize m_op with it.  Pass
>>>>    m_op to function widest_fixed_size_mode_for_size.
>>>>    (op_by_pieces_d::get_usable_mode): Pass m_op to function
>>>>    widest_fixed_size_mode_for_size.
>>>>    (op_by_pieces_d::smallest_fixed_size_mode_for_siz
Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]
Hi Haochen,
  The regression cases are caused by the "targetm.scalar_mode_supported_p"
check I added for scalar modes. XImode, OImode and TImode (with -m32) are
not enabled in ix86_scalar_mode_supported_p, so they're excluded from
by-pieces operations on i386.

The original code doesn't do any check on scalar modes. I think that
might be incorrect, as not all scalar modes support the move and compare
optabs (e.g. TImode with -m32 on rs6000). I drafted a new patch to check
the optabs for scalar modes manually. Now both vector and scalar modes
are checked for optabs in the same way.

I did a simple test and all of the former regression cases pass again.
Could you help do a full regression test? I am worried about the
coverage of my CI system.

Thanks
Gui Haochen

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 7aac575eff8..2af9fcbed18 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation op)
 /* Return true if optabs exists for the mode and certain by pieces
    operations.  */
 static bool
-qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
+mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
+  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+    return false;
+
   if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
-      && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
-    return true;
+      && VECTOR_MODE_P (mode)
+      && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
+    return false;

   if (op == COMPARE_BY_PIECES
-      && optab_handler (mov_optab, mode) != CODE_FOR_nothing
-      && can_compare_p (EQ, mode, ccp_jump))
-    return true;
+      && !can_compare_p (EQ, mode, ccp_jump))
+    return false;

-  return false;
+  return true;
 }

 /* Return the widest mode that can be used to perform part of an
@@ -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
       {
        if (GET_MODE_SIZE (candidate) >= size)
          break;
-       if (qi_vector_mode_supported_p (candidate, op))
+       if (mode_supported_p (candidate, op))
          result = candidate;
       }

@@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
     {
       mode = tmode.require ();
       if (GET_MODE_SIZE (mode) < size
-         && targetm.scalar_mode_supported_p (mode))
+         && mode_supported_p (mode, op))
        result = mode;
     }

@@ -1454,7 +1457,7 @@ op_by_pieces_d::smallest_fixed_size_mode_for_size (unsigned int size)
          break;

        if (GET_MODE_SIZE (candidate) >= size
-           && qi_vector_mode_supported_p (candidate, m_op))
+           && mode_supported_p (candidate, m_op))
          return candidate;
      }
 }
[PATCH, expand] Checking available optabs for scalar modes in by pieces operations
Hi,
  This patch checks the available optabs for scalar modes used in
by-pieces operations. It fixes the regression cases caused by the
previous patch. Now both scalar and vector modes are examined by the
same approach.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
Expand: Checking available optabs for scalar modes in by pieces operations

The former patch (f08ca5903c7) examines the scalar modes via the target
hook scalar_mode_supported_p. It causes some i386 regression cases, as
XImode and OImode are not enabled in the i386 target hook. This patch
examines a scalar mode by checking if the corresponding optabs are
available for the mode.

gcc/
        PR target/111449
        * expr.cc (qi_vector_mode_supported_p): Rename to...
        (by_pieces_mode_supported_p): ...this, and extends it to do
        the checking for both scalar and vector mode.
        (widest_fixed_size_mode_for_size): Call
        by_pieces_mode_supported_p to examine the mode.
        (op_by_pieces_d::smallest_fixed_size_mode_for_size): Likewise.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 7aac575eff8..2af9fcbed18 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation op)
 /* Return true if optabs exists for the mode and certain by pieces
    operations.  */
 static bool
-qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
+by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
+  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+    return false;
+
   if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
-      && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
-    return true;
+      && VECTOR_MODE_P (mode)
+      && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
+    return false;

   if (op == COMPARE_BY_PIECES
-      && optab_handler (mov_optab, mode) != CODE_FOR_nothing
-      && can_compare_p (EQ, mode, ccp_jump))
-    return true;
+      && !can_compare_p (EQ, mode, ccp_jump))
+    return false;

-  return false;
+  return true;
 }

 /* Return the widest mode that can be used to perform part of an
@@ -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
       {
        if (GET_MODE_SIZE (candidate) >= size)
          break;
-       if (qi_vector_mode_supported_p (candidate, op))
+       if (by_pieces_mode_supported_p (candidate, op))
          result = candidate;
       }

@@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
     {
       mode = tmode.require ();
       if (GET_MODE_SIZE (mode) < size
-         && targetm.scalar_mode_supported_p (mode))
+         && by_pieces_mode_supported_p (mode, op))
        result = mode;
     }

@@ -1454,7 +1457,7 @@ op_by_pieces_d::smallest_fixed_size_mode_for_size (unsigned int size)
          break;

        if (GET_MODE_SIZE (candidate) >= size
-           && qi_vector_mode_supported_p (candidate, m_op))
+           && by_pieces_mode_supported_p (candidate, m_op))
          return candidate;
      }
 }
Re: [PATCH, expand] Checking available optabs for scalar modes in by pieces operations
Committed as r14-5001.

Thanks
Gui Haochen

On 2023/10/27 17:29, Richard Sandiford wrote:
> HAO CHEN GUI writes:
>> Hi,
>>   This patch checks available optabs for scalar modes used in by
>> pieces operations. It fixes the regression cases caused by previous
>> patch. Now both scalar and vector modes are examined by the same
>> approach.
>>
>> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions. Is this OK for trunk?
>>
>> Thanks
>> Gui Haochen
>>
>>
>> ChangeLog
>> Expand: Checking available optabs for scalar modes in by pieces operations
>>
>> The former patch (f08ca5903c7) examines the scalar modes by target
>> hook scalar_mode_supported_p. It causes some i386 regression cases
>> as XImode and OImode are not enabled in i386 target function. This
>> patch examines the scalar mode by checking if the corresponding optabs
>> are available for the mode.
>>
>> gcc/
>>      PR target/111449
>>      * expr.cc (qi_vector_mode_supported_p): Rename to...
>>      (by_pieces_mode_supported_p): ...this, and extends it to do
>>      the checking for both scalar and vector mode.
>>      (widest_fixed_size_mode_for_size): Call
>>      by_pieces_mode_supported_p to examine the mode.
>>      (op_by_pieces_d::smallest_fixed_size_mode_for_size): Likewise.
>
> OK, thanks.
>
> Richard
>
>> patch.diff
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index 7aac575eff8..2af9fcbed18 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation op)
>>  /* Return true if optabs exists for the mode and certain by pieces
>>     operations.  */
>>  static bool
>> -qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
>> +by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
>>  {
>> +  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
>> +    return false;
>> +
>>    if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
>> -      && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
>> -    return true;
>> +      && VECTOR_MODE_P (mode)
>> +      && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
>> +    return false;
>>
>>    if (op == COMPARE_BY_PIECES
>> -      && optab_handler (mov_optab, mode) != CODE_FOR_nothing
>> -      && can_compare_p (EQ, mode, ccp_jump))
>> -    return true;
>> +      && !can_compare_p (EQ, mode, ccp_jump))
>> +    return false;
>>
>> -  return false;
>> +  return true;
>>  }
>>
>>  /* Return the widest mode that can be used to perform part of an
>> @@ -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int size,
>> by_pieces_operation op)
>>    {
>>      if (GET_MODE_SIZE (candidate) >= size)
>>        break;
>> -    if (qi_vector_mode_supported_p (candidate, op))
>> +    if (by_pieces_mode_supported_p (candidate, op))
>>      result = candidate;
>>    }
>>
>> @@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int size,
>> by_pieces_operation op)
>>      {
>>        mode = tmode.require ();
>>        if (GET_MODE_SIZE (mode) < size
>> -         && targetm.scalar_mode_supported_p (mode))
>> +         && by_pieces_mode_supported_p (mode, op))
>>        result = mode;
>>      }
>>
>> @@ -1454,7 +1457,7 @@ op_by_pieces_d::smallest_fixed_size_mode_for_size
>> (unsigned int size)
>>        break;
>>
>>      if (GET_MODE_SIZE (candidate) >= size
>> -        && qi_vector_mode_supported_p (candidate, m_op))
>> +        && by_pieces_mode_supported_p (candidate, m_op))
>>        return candidate;
>>    }
>>  }
[PATCH-2, rs6000] Enable vector mode for by pieces equality compare [PR111449]
Hi,
  This patch enables vector mode for by-pieces equality compare. It adds
a new expand pattern - cbranchv16qi4 - and sets MOVE_MAX_PIECES and
COMPARE_MAX_PIECES to 16 bytes when P8 vector is enabled. The compare
relies on both move and compare instructions, so both macros are
changed. The vector load/store might be unaligned, so the 16-byte move
and compare are only enabled when P8 vector is enabled (TARGET_VSX +
TARGET_EFFICIENT_UNALIGNED_VSX).

This patch enables the 16-byte by-pieces move. As vector mode is not
enabled for the by-pieces move, TImode is used for the move. That caused
some regression cases; I drafted the third patch to fix them.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable vector mode for by pieces equality compare

This patch adds a new expand pattern - cbranchv16qi4 to enable vector
mode by-pieces equality compare on rs6000. The macro MOVE_MAX_PIECES
(COMPARE_MAX_PIECES) is set to 16 bytes when P8 vector is enabled,
otherwise it keeps its former value. The macro STORE_MAX_PIECES is set
to the same value as MOVE_MAX_PIECES by default, so it is now explicitly
defined to keep it unchanged.

gcc/
        PR target/111449
        * config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
        * config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn
        sequence for V16QImode equality compare.
        * config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
        (STORE_MAX_PIECES): Define.

gcc/testsuite/
        PR target/111449
        * gcc.target/powerpc/pr111449-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e8a596fb7e9..d0937f192d6 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2605,6 +2605,45 @@ (define_insn "altivec_vupklpx"
 }
   [(set_attr "type" "vecperm")])

+/* The cbranch_optabs doesn't allow FAIL, so altivec load/store
+   instructions are disabled as the cost is high for unaligned
+   load/store.  */
+(define_expand "cbranchv16qi4"
+  [(use (match_operator 0 "equality_operator"
+        [(match_operand:V16QI 1 "reg_or_mem_operand")
+         (match_operand:V16QI 2 "reg_or_mem_operand")]))
+   (use (match_operand 3))]
+  "VECTOR_MEM_VSX_P (V16QImode)
+   && TARGET_EFFICIENT_UNALIGNED_VSX"
+{
+  if (!TARGET_P9_VECTOR
+      && !BYTES_BIG_ENDIAN
+      && MEM_P (operands[1])
+      && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
+      && MEM_P (operands[2])
+      && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
+    {
+      /* Use direct move for P8 little endian to skip bswap, as the byte
+        order doesn't matter for equality compare.  */
+      rtx reg_op1 = gen_reg_rtx (V16QImode);
+      rtx reg_op2 = gen_reg_rtx (V16QImode);
+      rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
+      rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
+      operands[1] = reg_op1;
+      operands[2] = reg_op2;
+    }
+  else
+    {
+      operands[1] = force_reg (V16QImode, operands[1]);
+      operands[2] = force_reg (V16QImode, operands[2]);
+    }
+
+  rtx_code code = GET_CODE (operands[0]);
+  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]);
+  rs6000_emit_cbranch (V16QImode, operands);
+  DONE;
+})
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ<VI_char>_p"
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index cc24dd5301e..10279052636 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15472,6 +15472,18 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
       else
        emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
     }
+  else if (mode == V16QImode)
+    {
+      gcc_assert (code == EQ || code == NE);
+
+      rtx result_vector = gen_reg_rtx (V16QImode);
+      rtx cc_bit = gen_reg_rtx (SImode);
+      emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
+      emit_insn (gen_cr6_test_for_lt (cc_bit));
+      emit_insn (gen_rtx_SET (compare_result,
+                             gen_rtx_COMPARE (comp_mode, cc_bit,
+                                              const1_rtx)));
+    }
   else
     emit_insn (gen_rtx_SET (compare_result,
                            gen_rtx_COMPARE (comp_mode, op0, op1)));
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 22595f6ebd7..51441825e20 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1730,6 +1730,8 @@ typedef struct rs6000_args
    in one reasonably fast instruction.  */
 #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
 #define MAX_MOVE_MAX 8
+#define MOVE_MAX_PIECES (TARGET_P8_VECTOR ? 16 : (TARGET_POWERPC64 ? 8 : 4))
+#define STORE_MAX_PIECES (TARGET_POWERPC64 ? 8 : 4)

 /* Nonzero if acces
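Why the doubleword-swapped images that the P8 LE loads produce can be
compared directly: equality is invariant under applying the same
permutation to both operands. Below is a self-contained sanity check of
that claim (mine, not part of the patch), using an ordinary byte
reversal as the stand-in permutation.

#include <assert.h>
#include <string.h>

static void rev16 (unsigned char *d, const unsigned char *s)
{
  for (int i = 0; i < 16; i++)
    d[i] = s[15 - i];
}

int main (void)
{
  unsigned char a[16] = "0123456789abcde";
  unsigned char b[16] = "0123456789abcde";
  unsigned char ra[16], rb[16];
  rev16 (ra, a);
  rev16 (rb, b);
  /* Permuting both operands the same way preserves (in)equality.  */
  assert ((memcmp (a, b, 16) == 0) == (memcmp (ra, rb, 16) == 0));
  return 0;
}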
[PATCH-3, rs6000] Enable 16-byte by pieces move [PR111449]
Hi,
  Patch 2 enables the 16-byte by-pieces move on rs6000. This patch fixes
the regression cases caused by that patch.

For sra-17/18, the long array with 4 elements can be loaded by one
16-byte by-pieces move on the 32-bit platform, so the array is not
constructed in LC0 and the SRA optimization cannot take place. The
"-mno-vsx" option is added for the 32-bit platform: it sets
MOVE_MAX_PIECES back to 4 bytes on 32-bit, so the array can't be loaded
by one by-pieces move.

Another regression is on P8 LE. The 16-byte memory-to-memory move is
implemented by two TImode load/store pairs. The TImode load/store is
finally split to two DImode load/store pairs on P8 LE, as it has no
unaligned vector load/store instructions. Actually, a 16-byte
memory-to-memory move can be implemented by a pair of V2DI reversed
load/store on P8 LE. The patch creates an insn_and_split pattern for
this optimization.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable 16-byte by pieces move

This patch enables the 16-byte by-pieces move. The 16-byte move is
generated with TImode and finally implemented by vector instructions.
There are several regression cases after the enablement. A 16-byte
TImode memory-to-memory move is originally implemented by two pairs of
DImode load/store on P8 LE, as there is no unaligned vsx load/store on
it. The patch fixes the problem by creating an insn_and_split pattern
and converting it to one pair of reversed load/store. Two SRA cases
lost the SRA optimization, as the array can be loaded by one 16-byte
move and thus is not initialized in LC0 on the 32-bit platform. Fix
them by adding the -mno-vsx option.

gcc/
        PR target/111449
        * config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New.

gcc/testsuite/
        PR target/111449
        * gcc.dg/tree-ssa/sra-17.c: Add no-vsx option for powerpc ilp32.
        * gcc.dg/tree-ssa/sra-18.c: Likewise.
        * gcc.target/powerpc/pr111449-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f3b40229094..9f6bc49998a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -414,6 +414,27 @@ (define_mode_attr VM3_char [(V2DI "d")

 ;; VSX moves

+;; TImode memory to memory move optimization on LE with p8vector
+(define_insn_and_split "*vsx_le_mem_to_mem_mov_ti"
+  [(set (match_operand:TI 0 "indexed_or_indirect_operand" "=Z")
+       (match_operand:TI 1 "indexed_or_indirect_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
+   && !MEM_VOLATILE_P (operands[0])
+   && !MEM_VOLATILE_P (operands[1])
+   && !reload_completed"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp = gen_reg_rtx (V2DImode);
+  rtx src = adjust_address (operands[1], V2DImode, 0);
+  emit_insn (gen_vsx_ld_elemrev_v2di (tmp, src));
+  rtx dest = adjust_address (operands[0], V2DImode, 0);
+  emit_insn (gen_vsx_st_elemrev_v2di (dest, tmp));
+  DONE;
+}
+  [(set_attr "length" "16")])
+
 ;; The patterns for LE permuted loads and stores come before the general
 ;; VSX moves so they match first.
 (define_insn_and_split "*vsx_le_perm_load_<mode>"
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
index 221d96b6cd9..36d72c9256b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
 /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
+/* { dg-additional-options "-mno-vsx" { target powerpc*-*-* && ilp32 } } */

 extern void abort (void);

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
index f5e6a21c2ae..3682a9a8c29 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
 /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
+/* { dg-additional-options "-mno-vsx" { target powerpc*-*-* && ilp32 } } */

 extern void abort (void);
 struct foo { long x; };

diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-2.c b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
new file mode 100644
index 000..7003bdc0208
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { has_arch_pwr8 } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mvsx -O2" } */
+
+/* Ensure 16-byte by pieces move is enabled.  */
+
+void move1 (void *s1, void *s2)
+{
+  __builtin_memcpy (s1, s2, 16);
+}
+
+void move2 (void *s1)
+{
+  __builtin_memcpy (s1, "0123456789012345", 16);
+}
+
+/
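Why a pair of element-reversed V2DI operations implements a plain
16-byte copy: the store applies the inverse of the doubleword swap the
load introduced, so the swaps cancel and memory is copied exactly. A
small C model of the cancellation (mine, purely illustrative):

#include <assert.h>
#include <string.h>

typedef struct { unsigned long long dw[2]; } v2di;

static v2di load_elemrev (const unsigned char *p)
{
  v2di r;
  memcpy (&r.dw[1], p, 8);       /* doublewords swapped on load  */
  memcpy (&r.dw[0], p + 8, 8);
  return r;
}

static void store_elemrev (unsigned char *p, v2di v)
{
  memcpy (p, &v.dw[1], 8);       /* ... and swapped back on store */
  memcpy (p + 8, &v.dw[0], 8);
}

int main (void)
{
  unsigned char src[16] = "fedcba987654321";
  unsigned char dst[16];
  store_elemrev (dst, load_elemrev (src));
  assert (memcmp (dst, src, 16) == 0);
  return 0;
}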
[PATCH-3v2, rs6000] Enable 16-byte by pieces move [PR111449]
Hi,
  The patch 2 enables the 16-byte by pieces move on rs6000.  This patch fixes
the regression cases caused by that patch.

For sra-17/18, the long array with 4 elements can be loaded by one 16-byte
by pieces move on a 32-bit platform, so the array is not constructed in LC0
and the SRA optimization cannot be applied.  The "no-vsx" option is added
for the 32-bit platform, as it sets MOVE_MAX_PIECES to 4 bytes on the 32-bit
platform and the array can't be loaded by one by pieces move.

Another regression is on P8 LE.  The 16-byte memory-to-memory move is
implemented by two TImode load/store pairs.  The TImode load/store is
finally split to two DImode load/store pairs on P8 LE, as it doesn't have
unaligned vector load/store instructions.  Actually, a 16-byte
memory-to-memory move can be implemented by two V2DI reversed load/store
instructions on P8 LE.  The patch creates an insn_and_split pattern for
this optimization.

Compared to the previous version, it fixes the syntax errors in the test
cases.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable 16-byte by pieces move

This patch enables the 16-byte by pieces move.  The 16-byte move is
generated with TImode and finally implemented by vector instructions.
There are several regression cases after the enablement.  The 16-byte
TImode memory-to-memory move is originally implemented by two pairs of
DImode load/store on P8 LE, as there is no unaligned vsx load/store on it.
The patch fixes the problem by creating an insn_and_split pattern and
converting it to one pair of reversed load/store.  Two SRA cases lost the
SRA optimization, as the array can be loaded by one 16-byte move and thus
is not initialized in LC0 on the 32-bit platform.  So fix them by adding
the no-vsx option.

gcc/
	PR target/111449
	* config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New.

gcc/testsuite/
	PR target/111449
	* gcc.dg/tree-ssa/sra-17.c: Add no-vsx option for powerpc ilp32.
	* gcc.dg/tree-ssa/sra-18.c: Likewise.
	* gcc.target/powerpc/pr111449-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f3b40229094..9f6bc49998a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -414,6 +414,27 @@ (define_mode_attr VM3_char [(V2DI "d")

 ;; VSX moves

+;; TImode memory to memory move optimization on LE with p8vector
+(define_insn_and_split "*vsx_le_mem_to_mem_mov_ti"
+  [(set (match_operand:TI 0 "indexed_or_indirect_operand" "=Z")
+	(match_operand:TI 1 "indexed_or_indirect_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
+   && !MEM_VOLATILE_P (operands[0])
+   && !MEM_VOLATILE_P (operands[1])
+   && !reload_completed"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp = gen_reg_rtx (V2DImode);
+  rtx src = adjust_address (operands[1], V2DImode, 0);
+  emit_insn (gen_vsx_ld_elemrev_v2di (tmp, src));
+  rtx dest = adjust_address (operands[0], V2DImode, 0);
+  emit_insn (gen_vsx_st_elemrev_v2di (dest, tmp));
+  DONE;
+}
+  [(set_attr "length" "16")])
+
 ;; The patterns for LE permuted loads and stores come before the general
 ;; VSX moves so they match first.
(define_insn_and_split "*vsx_le_perm_load_" diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c index 221d96b6cd9..b0d4811e77b 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c @@ -1,6 +1,7 @@ /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */ /* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */ /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */ +/* { dg-additional-options "-mno-vsx" { target { powerpc*-*-* && ilp32 } } } */ extern void abort (void); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c index f5e6a21c2ae..2cdeae6e9e7 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c @@ -1,6 +1,7 @@ /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */ /* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */ /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */ +/* { dg-additional-options "-mno-vsx" { target { powerpc*-*-* && ilp32 } } } */ extern void abort (void); struct foo { long x; }; diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-2.c b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c new file mode 100644 index 000..7003bdc0208 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target { has_arch_pwr8 } } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-mvsx -O2" } */ + +/* Ensure 16-byte by pieces move is enabled. */ + +void move1 (void *s1, void *s2) +{ + __builtin_memcpy (s1, s2, 16); +} + +
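To illustrate what the new insn_and_split buys on P8 LE, here is a minimal
sketch (hypothetical codegen, assuming the elemrev patterns emit
lxvd2x/stxvd2x on LE; the two element reversals cancel out for a pure copy,
so no xxpermdi fixup is left behind):

void copy16 (void *s1, void *s2)
{
  __builtin_memcpy (s1, s2, 16);  /* expected to become roughly:
				       lxvd2x  0,0,4   <- vsx_ld_elemrev_v2di
				       stxvd2x 0,0,3   <- vsx_st_elemrev_v2di
				     instead of two DImode load/store pairs.  */
}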
[PATCH, rs6000] Add subreg patterns for SImode rotate and mask insert
Hi,
  This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c.  In
the combine pass, an SImode (subreg from DImode) lshiftrt is converted to a
DImode lshiftrt with an outer AND.  It matches a DImode rotate and mask
insert on rs6000.

Trying 2 -> 7:
    2: r122:DI=r129:DI
      REG_DEAD r129:DI
    7: r125:SI=r122:DI#0 0>>0x1f
      REG_DEAD r122:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
    (zero_extract:DI (reg:DI 129)
        (const_int 32 [0x20])
        (const_int 1 [0x1])))
Successfully matched this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
    (and:DI (lshiftrt:DI (reg:DI 129)
            (const_int 31 [0x1f]))
        (const_int 4294967295 [0xffffffff])))

This conversion blocks the further combination which combines to a SImode
rotate and mask insert insn.

Trying 9, 7 -> 10:
    9: r127:SI=r130:DI#0&0xfffffffffffffffe
      REG_DEAD r130:DI
    7: r125:SI#0=r129:DI 0>>0x1f&0xffffffff
      REG_DEAD r129:DI
   10: r124:SI=r127:SI|r125:SI
      REG_DEAD r125:SI
      REG_DEAD r127:SI
Failed to match this instruction:
(set (reg:SI 124)
    (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
            (const_int -2 [0xfffffffffffffffe]))
        (subreg:SI (zero_extract:DI (reg:DI 129)
                (const_int 32 [0x20])
                (const_int 1 [0x1])) 0)))
Failed to match this instruction:
(set (reg:SI 124)
    (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
            (const_int -2 [0xfffffffffffffffe]))
        (subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
                    (const_int 31 [0x1f]))
                (const_int 4294967295 [0xffffffff])) 0)))

The root cause of the issue is whether it's necessary to widen the mode for
lshiftrt when the target already has the narrow-mode lshiftrt and its cost
is not high.  My former patch tried to fix the problem but hasn't been
accepted yet.
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

As it's stage 4 now, I drafted this patch to fix the regression by adding
subreg patterns of the SImode rotate and mask insert.  It actually does the
reverse and narrows the mode for lshiftrt so that it matches the SImode
rotate and mask insert.

The case "rlwimi-2.c" is fixed and the corresponding number of insns is
restored to the original one.  The case "rlwinm-0.c" also changes: 9
"rlwinm" are replaced with 9 "rldicl" as the combine sequence changes.
It's not a regression as the total number of insns isn't changed.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Add subreg patterns for SImode rotate and mask insert

In the combine pass, an SImode (subreg from DImode) lshiftrt is converted
to a DImode lshiftrt with an AND.  The new pattern matches the DImode
rotate and mask insert on rs6000.  Thus it blocks the pattern from being
further combined to a SImode rotate and mask insert pattern.  This patch
fixes the problem by adding two subreg patterns for the SImode rotate and
mask insert patterns.

gcc/
	PR target/93738
	* config/rs6000/rs6000.md (*rotlsi3_insert_9): New.
	(*rotlsi3_insert_8): New.

gcc/testsuite/
	PR target/93738
	* gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and
	32bit rotate instructions.
	* gcc.target/powerpc/rlwinm-0.c: Likewise.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bc8bc6ab060..b0b40f91e3e 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4253,6 +4253,36 @@ (define_insn "*rotl<mode>3_insert"
 ; difference between rlwimi and rldimi.  We also might want dot forms,
 ; but not for rlwimi on POWER4 and similar processors.
+; Subreg pattern of insn "*rotlsi3_insert"
+(define_insn_and_split "*rotlsi3_insert_9"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
+	(ior:SI (and:SI
+		  (match_operator:SI 8 "lowpart_subreg_operator"
+		   [(and:DI (match_operator:DI 4 "rotate_mask_operator"
+			     [(match_operand:DI 1 "gpc_reg_operand" "r")
+			      (match_operand:SI 2 "const_int_operand" "n")])
+			    (match_operand:DI 3 "const_int_operand" "n"))])
+		  (match_operand:SI 5 "const_int_operand" "n"))
+		(and:SI (match_operand:SI 6 "gpc_reg_operand" "0")
+			(match_operand:SI 7 "const_int_operand" "n"))))]
+  "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode)
+   && GET_CODE (operands[4]) == LSHIFTRT
+   && INTVAL (operands[3]) == 0xffffffff
+   && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(ior:SI (and:SI (lshiftrt:SI (match_dup 9)
+				     (match_dup 2))
+			(match_dup 5))
+		(and:SI (match_dup 6)
+			(match_dup 7))))]
+{
+  int offset = BYTES_BIG_ENDIAN ? 4 : 0;
+  operands[9] = gen_rtx_SUBREG (SImode, operand
Re: [PATCH] fwprop: Avoid volatile defines to be propagated
Hi Jeff,
  Thanks for your comments.

On 2024/3/4 6:02, Jeff Law wrote:
> Why specifically are you worried here?  Propagation of a volatile shouldn't
> in and of itself cause a problem.  We're not changing the number of volatile
> accesses or anything like that -- we're just moving them around a bit.

If the volatile asm operand is in a parallel set, it can't be eliminated
after the propagation.  So the define insn and the use insn together
execute the volatile asm block twice.  That's the problem.

Here is a real case from sanitizer_linux.cpp.  Insn 62 has a volatile asm
operand and it is propagated into insn 60.  After the propagation both insn
60 and insn 62 have the volatile asm operand, so the asm block will be
executed twice.  It causes the sanitizer to behave abnormally in my test.

propagating insn 62 into insn 60, replacing:
(set (reg/v:DI 119 [ res ])
    (reg:DI 133 [ res ]))
successfully matched this instruction:
(set (reg/v:DI 119 [ res ])
    (asm_operands/v:DI ("mr 28, %5
	mr 27, %8
	mr 3, %7
	mr 5, %9
	mr 6, %10
	mr 7, %11
	li 0, %3
	sc
	cmpdi cr1, 3, 0
	crandc cr1*4+eq, cr1*4+eq, cr0*4+so
	bne- cr1, 1f
	li 29, 0
	stdu 29, -8(1)
	stdu 1, -%12(1)
	std 2, %13(1)
	mr 12, 28
	mtctr 12
	mr 3, 27
	bctrl
	ld 2, %13(1)
	li 0, %4
	sc
	1:
	mr %0, 3
	") ("=r") 0 [
        (reg:SI 134)
        (const_int 22 [0x16])
        (const_int 120 [0x78])
        (const_int 1 [0x1])
        (reg/v:DI 3 3 [ __fn ])
        (reg/v:DI 4 4 [ __cstack ])
        (reg/v:SI 5 5 [ __flags ])
        (reg/v:DI 6 6 [ __arg ])
        (reg/v:DI 7 7 [ __ptidptr ])
        (reg/v:DI 8 8 [ __newtls ])
        (reg/v:DI 9 9 [ __ctidptr ])
        (const_int 32 [0x20])
        (const_int 24 [0x18])
    ]
    [
        (asm_input:SI ("0") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
        (asm_input:SI ("i") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
        (asm_input:SI ("i") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
        (asm_input:SI ("i") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
        (asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
        (asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
        (asm_input:SI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
        (asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
        (asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
        (asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
        (asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
    ] []
    /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591))
rescanning insn with uid = 60.
updating insn 60 in-place
(insn 62 61 60 6 (parallel [
            (set (reg:DI 133 [ res ])
                (asm_operands/v:DI ("mr 28, %5
	mr 27, %8
	mr 3, %7
	mr 5, %9
	mr 6, %10
	mr 7, %11
	li 0, %3
	sc
	cmpdi cr1, 3, 0
	crandc cr1*4+eq, cr1*4+eq, cr0*4+so
	bne- cr1, 1f
	li 29, 0
	stdu 29, -8(1)
	stdu 1, -%12(1)
	std 2, %13(1)
	mr 12, 28
	mtctr 12
	mr 3, 27
	bctrl
	ld 2, %13(1)
	li 0, %4
	sc
	1:
	mr %0, 3
	") ("=r") 0 [
                    (reg:SI 134)
                    (const_int 22 [0x16])
                    (const_int 120 [0x78])
                    (const_int 1 [0x1])
                    (reg/v:DI 3 3 [ __fn ])
                    (reg/v:DI 4 4 [ __cstack ])
                    (reg/v:SI 5 5 [ __flags ])
                    (reg/v:DI 6 6 [ __arg ])
                    (reg/v:DI 7 7 [ __ptidptr ])
                    (reg/v:DI 8 8 [ __newtls ])
                    (reg/v:DI 9 9 [ __ctidptr ])
                    (const_int 32 [0x20])
                    (const_int 24 [0x18])
                ]
                [
                    (asm_input:SI ("0") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
                    (asm_input:SI ("i") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
                    (asm_input:SI ("i") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_li
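To make the double-execution hazard concrete, here is a minimal hypothetical
example (not from the PR; the comment-only asm stands in for the clone asm
above):

long probe (long *p)
{
  long t;
  /* Volatile asm: must execute exactly once.  */
  __asm__ __volatile__ ("# syscall-like side effect" : "=r" (t) : : "memory");
  *p = t;	/* single-set use; propagating the asm into this insn...  */
  return t;	/* ...while this use keeps the def alive runs it twice.  */
}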
Re: [PATCH] fwprop: Avoid volatile defines to be propagated
Hi Jeff,

On 2024/3/4 11:37, Jeff Law wrote:
> Can the same thing happen with a volatile memory load?  I don't think that
> will be caught by the volatile_insn_p check.

Yes, I think so.  If the define rtx contains volatile memory references,
it may hit the same problem.  We may use volatile_refs_p instead of
volatile_insn_p?

Thanks
Gui Haochen
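A sketch of the volatile-memory variant Jeff asks about (hypothetical
source, not from the thread): volatile_insn_p is false for a volatile MEM,
while volatile_refs_p catches it.

int twice (volatile int *p)
{
  int v = *p;		/* volatile load: exactly one access allowed */
  return v + v;		/* propagating *p into both uses would read twice */
}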
[PATCHv2] fwprop: Avoid volatile defines to be propagated
Hi,
  This patch tries to fix a potential problem which is raised by the patch
for PR111267.  With the patch for PR111267, a volatile asm operand can be
propagated to a single-set insn.  The volatile asm operand might then be
executed multiple times if the define insn isn't eliminated after the
propagation.  Currently the set_src_cost comparison can reject such a
propagation, but it has the chance to be taken after replacing set_src_cost
with insn cost.  Actually I found the problem while testing my patch which
replaces set_src_cost with insn_cost in the fwprop pass.

Compared to the last version, the check volatile_insn_p is replaced with
volatile_refs_p in order to also check volatile memory references.
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646482.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
fwprop: Avoid volatile defines to be propagated

The patch for PR111267 (commit 86de9b66480b710202a2898cf513db105d8c432f)
introduces an exception for propagation on a single-set insn.  The
propagation which might not be profitable (checked by profitable_p) is
still allowed to be propagated to a single-set insn.  It has a potential
problem that a volatile operand might be propagated to a single-set insn.
If the define insn is not eliminated after the propagation, the volatile
operand will be executed multiple times.  This patch fixes the problem by
skipping a volatile set source rtx in the propagation.

gcc/
	* fwprop.cc (forward_propagate_into): Return false for volatile set
	source rtx.

gcc/testsuite/
	* gcc.target/powerpc/fwprop-1.c: New.

patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 7872609b336..cb6fd6700ca 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -854,6 +854,8 @@ forward_propagate_into (use_info *use, bool reg_prop_only = false)
   rtx dest = SET_DEST (def_set);
   rtx src = SET_SRC (def_set);

+  if (volatile_refs_p (src))
+    return false;

   /* Allow propagations into a loop only for reg-to-reg copies, since
      replacing one register by another shouldn't increase the cost.

diff --git a/gcc/testsuite/gcc.target/powerpc/fwprop-1.c b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
new file mode 100644
index 000..07b207f980c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-rtl-fwprop1-details" } */
+/* { dg-final { scan-rtl-dump-not "propagating insn" "fwprop1" } } */
+
+/* Verify that volatile asm operands don't get propagated.  */
+long long foo ()
+{
+  long long res;
+  __asm__ __volatile__(
+    ""
+    : "=r" (res)
+    :
+    : "memory");
+  return res;
+}
[PATCHv2, rs6000] Add subreg patterns for SImode rotate and mask insert
Hi,
  This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c.  In
the combine pass, an SImode (subreg from DImode) lshiftrt is converted to a
DImode lshiftrt with an outer AND.  It matches a DImode rotate and mask
insert on rs6000.

Trying 2 -> 7:
    2: r122:DI=r129:DI
      REG_DEAD r129:DI
    7: r125:SI=r122:DI#0 0>>0x1f
      REG_DEAD r122:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
    (zero_extract:DI (reg:DI 129)
        (const_int 32 [0x20])
        (const_int 1 [0x1])))
Successfully matched this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
    (and:DI (lshiftrt:DI (reg:DI 129)
            (const_int 31 [0x1f]))
        (const_int 4294967295 [0xffffffff])))

This conversion blocks the further combination which combines to a SImode
rotate and mask insert insn.

Trying 9, 7 -> 10:
    9: r127:SI=r130:DI#0&0xfffffffffffffffe
      REG_DEAD r130:DI
    7: r125:SI#0=r129:DI 0>>0x1f&0xffffffff
      REG_DEAD r129:DI
   10: r124:SI=r127:SI|r125:SI
      REG_DEAD r125:SI
      REG_DEAD r127:SI
Failed to match this instruction:
(set (reg:SI 124)
    (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
            (const_int -2 [0xfffffffffffffffe]))
        (subreg:SI (zero_extract:DI (reg:DI 129)
                (const_int 32 [0x20])
                (const_int 1 [0x1])) 0)))
Failed to match this instruction:
(set (reg:SI 124)
    (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
            (const_int -2 [0xfffffffffffffffe]))
        (subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
                    (const_int 31 [0x1f]))
                (const_int 4294967295 [0xffffffff])) 0)))

The root cause of the issue is whether it's necessary to widen the mode for
lshiftrt when the target already has the lshiftrt for the narrow mode and
its cost is not high.  My former patch tried to fix the problem but hasn't
been accepted yet.
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

As it's stage 4 now, I drafted this patch to fix the regression by adding
subreg patterns of the SImode rotate and mask insert.  It actually does the
reverse and narrows the mode for lshiftrt so that it matches the SImode
rotate and mask insert.  The case "rlwimi-2.c" is fixed and the
corresponding number of insns is restored to the original one.

Compared with the last version, the main change is to remove the changes
for a testcase which was already fixed in another patch.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Add subreg patterns for SImode rotate and mask insert

In the combine pass, an SImode (subreg from DImode) lshiftrt is converted
to a DImode lshiftrt with an AND.  The new pattern matches the rotate and
mask insert on rs6000.  Thus it blocks the pattern from being further
combined to a SImode rotate and mask insert pattern.  This patch fixes the
problem by adding two subreg patterns for the SImode rotate and mask insert
patterns.

gcc/
	PR target/93738
	* config/rs6000/rs6000.md (*rotlsi3_insert_subreg): New.
	(*rotlsi3_insert_4_subreg): New.

gcc/testsuite/
	PR target/93738
	* gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and
	32bit rotate instructions.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bc8bc6ab060..996d0740faf 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4253,6 +4253,36 @@ (define_insn "*rotl<mode>3_insert"
 ; difference between rlwimi and rldimi.  We also might want dot forms,
 ; but not for rlwimi on POWER4 and similar processors.
+; Subreg pattern of insn "*rotlsi3_insert"
+(define_insn_and_split "*rotlsi3_insert_subreg"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
+	(ior:SI (and:SI
+		  (match_operator:SI 8 "lowpart_subreg_operator"
+		   [(and:DI (match_operator:DI 4 "rotate_mask_operator"
+			     [(match_operand:DI 1 "gpc_reg_operand" "r")
+			      (match_operand:SI 2 "const_int_operand" "n")])
+			    (match_operand:DI 3 "const_int_operand" "n"))])
+		  (match_operand:SI 5 "const_int_operand" "n"))
+		(and:SI (match_operand:SI 6 "gpc_reg_operand" "0")
+			(match_operand:SI 7 "const_int_operand" "n"))))]
+  "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode)
+   && GET_CODE (operands[4]) == LSHIFTRT
+   && INTVAL (operands[3]) == 0xffffffff
+   && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(ior:SI (and:SI (lshiftrt:SI (match_dup 9)
+				     (match_dup 2))
+			(match_dup 5))
+		(and:SI (match_dup 6)
+			(match_dup 7))))]
+{
+  int offset = BYTES_BIG_ENDIAN ? 4 : 0;
+  operands[9] = gen_rtx_SUBREG (SImode, operands[1], offset);
+}
+  [(set_attr "type" "insert")])
+
 (define_insn "*rotl<mode>3_insert_2"
 [(set (ma
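For reference, a source shape that should end up as a single rlwimi once
the subreg pattern fires (a hypothetical reduction in the spirit of
rlwimi-2.c, not the exact testcase):

unsigned int
insert_bit (unsigned int x, unsigned int y)
{
  /* (x >> 31) supplies bit 0; y supplies bits 31..1: one rotate-and-mask
     insert (rlwimi) can do both.  */
  return (x >> 31) | (y & ~1U);
}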
[PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits
Hi,
  This patch tries to fix the problem when a canonical form doesn't benefit
on a specific target.  The const operand of an AND is ANDed with the
nonzero bits of the other operand in the combine pass.  It's a canonical
form, but it brings no benefit on targets which have rotate-and-mask insns.
As the mask is truncated, it can't match the insn conditions which it
originally matched.  For example, the following insn condition checks the
sum of two AND masks.  When one of the masks is truncated, the condition
breaks.

(define_insn "*rotlsi3_insert_5"
  [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
	(ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
			(match_operand:SI 2 "const_int_operand" "n,n"))
		(and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
			(match_operand:SI 4 "const_int_operand" "n,n"))))]
  "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
   && UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
   && UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
...

This patch tries to fix the problem by comparing the rtx costs.  If the
other operand (varop) is not changed and the rtx cost with the new mask is
not less than the original one, the mask is restored to the original one.

I'm not sure if the comparison of rtx costs here is proper.  The outer code
is unknown and I assume it is a "SET".  Also the rtx cost might not be
accurate.  From my understanding, the canonical forms should always
benefit, as they can't be undone in the combine pass.  Do we have a perfect
solution for this kind of issue?  Looking forward to your advice.

Another similar issue for canonical forms: is widening the mode for
lshiftrt always good?
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

Thanks
Gui Haochen

ChangeLog
Combine: Don't truncate const operand of AND if it's no benefits

In the combine pass, the canonical form is to turn off all bits in the
constant that are known to already be zero for AND.

/* Turn off all bits in the constant that are known to already be zero.
   Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
   which is tested below.  */

constop &= nonzero;

But it doesn't benefit when the target has rotate-and-mask-insert insns.
The AND mask is truncated and loses its information.  Thus it can't match
the insn conditions.  For example, the following insn condition checks the
sum of two AND masks.

(define_insn "*rotlsi3_insert_5"
  [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
	(ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
			(match_operand:SI 2 "const_int_operand" "n,n"))
		(and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
			(match_operand:SI 4 "const_int_operand" "n,n"))))]
  "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
   && UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
   && UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
...

This patch restores the const operand of the AND if the other operand is
not optimized and the truncated const operand doesn't save rtx cost.

gcc/
	* combine.cc (simplify_and_const_int_1): Restore the const operand
	of AND if varop is not optimized and the rtx cost of the new const
	operand is not reduced.

gcc/testsuite/
	* gcc.target/powerpc/rlwimi-0.c: Reduced total number of insns and
	adjust the number of rotate and mask insns.
	* gcc.target/powerpc/rlwimi-1.c: Likewise.
	* gcc.target/powerpc/rlwimi-2.c: Likewise.
patch.diff
diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..16ff09ea854 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -10161,8 +10161,23 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx varop,
   if (constop == nonzero)
     return varop;

-  if (varop == orig_varop && constop == orig_constop)
-    return NULL_RTX;
+  if (varop == orig_varop)
+    {
+      if (constop == orig_constop)
+	return NULL_RTX;
+      else
+	{
+	  rtx tmp = simplify_gen_binary (AND, mode, varop,
+					 gen_int_mode (constop, mode));
+	  rtx orig = simplify_gen_binary (AND, mode, varop,
+					  gen_int_mode (orig_constop, mode));
+	  if (set_src_cost (tmp, mode, optimize_this_for_speed_p)
+	      < set_src_cost (orig, mode, optimize_this_for_speed_p))
+	    return tmp;
+	  else
+	    return NULL_RTX;
+	}
+    }

   /* Otherwise, return an AND.  */
   return simplify_gen_binary (AND, mode, varop, gen_int_mode (constop, mode));

diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c b/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c
index 961be199901..d9dd4419f1d 100644
--- a/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c
+++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c
@@ -2,15 +2,15 @@
 /* { dg-options "-O2" } */
 /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]
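A worked example of the truncation discussed above (hypothetical mask
values, not taken from the testcases):

void
mask_truncation_example (void)
{
  unsigned int m2 = 0xfff00000;	/* mask on operand 1 (bits 31..20) */
  unsigned int m4 = 0x000fffff;	/* mask on operand 3 (bits 19..0) */
  /* Condition of "*rotlsi3_insert_5": m2 + m4 + 1 == 0 (mod 2^32),
     i.e. the two masks exactly partition the 32 bits -- holds here.  */
  /* If nonzero_bits shows bits 31..28 of operand 1 are already zero,
     combine truncates m2 to 0x0ff00000, and then
     0x0ff00000 + 0x000fffff + 1 == 0x10000000 != 0,
     so the insn no longer matches.  */
  (void) m2; (void) m4;
}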
Re: [PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits
Hi,
  Gently ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html

Thanks
Gui Haochen

On 2024/3/11 13:41, HAO CHEN GUI wrote:
> Hi,
>   This patch tries to fix the problem when a canonical form doesn't benefit
> on a specific target.  The const operand of an AND is ANDed with the nonzero
> bits of the other operand in the combine pass.  It's a canonical form, but it
> brings no benefit on targets which have rotate-and-mask insns.  As the mask
> is truncated, it can't match the insn conditions which it originally matched.
> For example, the following insn condition checks the sum of two AND masks.
> When one of the masks is truncated, the condition breaks.
>
> (define_insn "*rotlsi3_insert_5"
>   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
> 	(ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
> 			(match_operand:SI 2 "const_int_operand" "n,n"))
> 		(and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
> 			(match_operand:SI 4 "const_int_operand" "n,n"))))]
>   "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
>    && UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
>    && UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
> ...
>
> This patch tries to fix the problem by comparing the rtx costs.  If the
> other operand (varop) is not changed and the rtx cost with the new mask is
> not less than the original one, the mask is restored to the original one.
>
> I'm not sure if the comparison of rtx costs here is proper.  The outer code
> is unknown and I assume it is a "SET".  Also the rtx cost might not be
> accurate.  From my understanding, the canonical forms should always benefit,
> as they can't be undone in the combine pass.  Do we have a perfect solution
> for this kind of issue?  Looking forward to your advice.
>
> Another similar issue for canonical forms: is widening the mode for
> lshiftrt always good?
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html
>
> Thanks
> Gui Haochen
>
> ChangeLog
> Combine: Don't truncate const operand of AND if it's no benefits
>
> In the combine pass, the canonical form is to turn off all bits in the
> constant that are known to already be zero for AND.
>
> /* Turn off all bits in the constant that are known to already be zero.
>    Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
>    which is tested below.  */
>
> constop &= nonzero;
>
> But it doesn't benefit when the target has rotate-and-mask-insert insns.
> The AND mask is truncated and loses its information.  Thus it can't match
> the insn conditions.  For example, the following insn condition checks the
> sum of two AND masks.
>
> (define_insn "*rotlsi3_insert_5"
>   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
> 	(ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
> 			(match_operand:SI 2 "const_int_operand" "n,n"))
> 		(and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
> 			(match_operand:SI 4 "const_int_operand" "n,n"))))]
>   "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
>    && UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
>    && UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
> ...
>
> This patch restores the const operand of the AND if the other operand is
> not optimized and the truncated const operand doesn't save rtx cost.
>
> gcc/
> 	* combine.cc (simplify_and_const_int_1): Restore the const operand
> 	of AND if varop is not optimized and the rtx cost of the new const
> 	operand is not reduced.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/rlwimi-0.c: Reduced total number of insns and
> 	adjust the number of rotate and mask insns.
> 	* gcc.target/powerpc/rlwimi-1.c: Likewise.
> 	* gcc.target/powerpc/rlwimi-2.c: Likewise.
>
> patch.diff
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index a4479f8d836..16ff09ea854 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -10161,8 +10161,23 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx
> varop,
>    if (constop == nonzero)
>      return varop;
>
> -  if (varop == orig_varop && constop == orig_constop)
> -    return NULL_RTX;
> +  if (varop == orig_varop)
> +    {
> +      if (constop == orig_constop)
> +	return NULL_RTX;
[PATCH] Value Range: Add range op for builtin isinf
Hi,
  The builtin isinf is not folded at the front end if the corresponding
optab exists.  It causes the range evaluation to fail on targets which have
optab_isinf.  For instance, range-sincos.c will fail on targets which have
optab_isinf, as it calls builtin_isinf.

This patch fixes the problem by adding a range op for builtin isinf.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isinf

The builtin isinf is not folded at the front end if the corresponding
optab exists.  So the range op for isinf is needed for value range
analysis.  This patch adds the range op for builtin isinf.

gcc/
	* gimple-range-op.cc (class cfn_isinf): New.
	(op_cfn_isinf): New variable.
	(gimple_range_op_handler::maybe_builtin_call): Handle
	CASE_FLT_FN (BUILT_IN_ISINF).

gcc/testsuite/
	* gcc.dg/tree-ssa/range-isinf.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index a98f7db62a7..9de130b4022 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1140,6 +1140,57 @@ private:
   bool m_is_pos;
 } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);

+// Implement range operator for CFN_BUILT_IN_ISINF
+class cfn_isinf : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+			   const irange &, relation_trio) const override
+  {
+    if (op1.undefined_p ())
+      return false;
+
+    if (op1.known_isinf ())
+      {
+	r.set_nonzero (type);
+	return true;
+      }
+
+    if (op1.known_isnan ()
+	|| (!real_isinf (&op1.lower_bound ())
+	    && !real_isinf (&op1.upper_bound ())))
+      {
+	r.set_zero (type);
+	return true;
+      }
+
+    return false;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+			  const frange &, relation_trio) const override
+  {
+    if (lhs.zero_p ())
+      {
+	nan_state nan (true);
+	r.set (type, real_min_representable (type),
+	       real_max_representable (type), nan);
+	return true;
+      }
+
+    if (!range_includes_zero_p (&lhs))
+      {
+	// The range is [-INF,-INF][+INF,+INF], but it can't be represented.
+	// Set range to [-INF,+INF]
+	r.set_varying (type);
+	r.clear_nan ();
+	return true;
+      }
+
+    return false;
+  }
+} op_cfn_isinf;

 // Implement range operator for CFN_BUILT_IN_PARITY
 class cfn_parity : public range_operator
@@ -1232,6 +1283,11 @@ gimple_range_op_handler::maybe_builtin_call ()
       m_operator = &op_cfn_signbit;
       break;

+    CASE_FLT_FN (BUILT_IN_ISINF):
+      m_op1 = gimple_call_arg (call, 0);
+      m_operator = &op_cfn_isinf;
+      break;
+
     CASE_CFN_COPYSIGN_ALL:
       m_op1 = gimple_call_arg (call, 0);
       m_op2 = gimple_call_arg (call, 1);

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
new file mode 100644
index 000..468f1bcf5c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include <math.h>
+void link_error();
+
+void
+test1 (double x)
+{
+  if (x > __DBL_MAX__ && !__builtin_isinf (x))
+    link_error ();
+  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
+    link_error ();
+}
+
+void
+test2 (float x)
+{
+  if (x > __FLT_MAX__ && !__builtin_isinf (x))
+    link_error ();
+  if (x < -__FLT_MAX__ && !__builtin_isinf (x))
+    link_error ();
+}
+
+void
+test3 (double x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__)
+    link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__)
+    link_error ();
+}
+
+void
+test4 (float x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)
+    link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__)
+    link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
+
[patch, rs6000] Implement optab_isinf for SFmode, DFmode and TFmode [PR97786]
Hi,
  This patch implements optab_isinf for SFmode, DFmode and TFmode by the
rs6000 test data class instructions.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
Is it OK for next stage 1?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isinf for SFmode, DFmode and TFmode

gcc/
	PR target/97786
	* config/rs6000/vsx.md (isinf<mode>2): New expand for SFmode and
	DFmode.
	(isinf<mode>2): New expand for TFmode.

gcc/testsuite/
	PR target/97786
	* gcc.target/powerpc/pr97786-1.c: New test.
	* gcc.target/powerpc/pr97786-2.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..f0cc02f7e7b 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5313,6 +5313,26 @@ (define_expand "xststdc<sd>p"
   operands[4] = CONST0_RTX (SImode);
 })

+(define_expand "isinf<mode>2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+   && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdc<sd>p (operands[0], operands[1], GEN_INT (0x30)));
+  DONE;
+})
+
+(define_expand "isinf<mode>2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+   && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdcqp_<mode> (operands[0], operands[1], GEN_INT (0x30)));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_<mode>"
   [(set (match_dup 2)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
new file mode 100644
index 000..1b1e6d642de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */
+
+int test1 (double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isinf (x);
+}
+
+int test3 (float x)
+{
+  return __builtin_isinff (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdc[sd]p\M} 3 } } */

diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
new file mode 100644
index 000..de7f2d67c4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (long double x)
+{
+  return __builtin_isinfl (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */
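For readers decoding GEN_INT (0x30): the DCMX mask bits of the VSX
test-data-class instructions select which data classes to test.  As I read
ISA 3.0 (a sketch for orientation, not part of the patch), the bits are:

enum dcmx_bits
{
  DCMX_NAN	  = 0x40,
  DCMX_POS_INF	  = 0x20,
  DCMX_NEG_INF	  = 0x10,
  DCMX_POS_ZERO	  = 0x08,
  DCMX_NEG_ZERO	  = 0x04,
  DCMX_POS_DENORM = 0x02,
  DCMX_NEG_DENORM = 0x01
};	/* 0x30 == DCMX_POS_INF | DCMX_NEG_INF, i.e. test for +/-Inf */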
[Patch] Builtin: Fold builtin_isinf on IBM long double to builtin_isinf on double [PR97786]
Hi,
  This patch folds builtin_isinf on IBM long double to builtin_isinf on the
double type.  The former patch
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html
implemented the DFmode isinf_optab.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
Is it OK for next stage 1?

Thanks
Gui Haochen

ChangeLog
Builtin: Fold builtin_isinf on IBM long double to builtin_isinf on double

For IBM long double, Inf is encoded in the high-order double value only.
So builtin_isinf on IBM long double can be folded to builtin_isinf on the
double type.  As the former patch implemented the DFmode isinf_optab, this
patch converts builtin_isinf on IBM long double to builtin_isinf on the
double type if the DFmode isinf_optab exists.

gcc/
	PR target/97786
	* builtins.cc (fold_builtin_interclass_mathfn): Fold IBM long
	double isinf call to double isinf call when DFmode isinf_optab
	exists.

gcc/testsuite/
	PR target/97786
	* gcc.target/powerpc/pr97786-3.c: New test.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index eda8bea9c4b..d2786f207b8 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -9574,6 +9574,12 @@ fold_builtin_interclass_mathfn (location_t loc, tree fndecl, tree arg)
 	  type = double_type_node;
 	  mode = DFmode;
 	  arg = fold_build1_loc (loc, NOP_EXPR, type, arg);
+	  tree const isinf_fn = builtin_decl_explicit (BUILT_IN_ISINF);
+	  if (interclass_mathfn_icode (arg, isinf_fn) != CODE_FOR_nothing)
+	    {
+	      result = build_call_expr (isinf_fn, 1, arg);
+	      return result;
+	    }
 	}
       get_max_float (REAL_MODE_FORMAT (mode), buf, sizeof (buf), false);
       real_from_string (&r, buf);

diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-3.c b/gcc/testsuite/gcc.target/powerpc/pr97786-3.c
new file mode 100644
index 000..1c816921e1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ibmlongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (long double x)
+{
+  return __builtin_isinfl (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 2 } } */
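A minimal user-level sketch of why the fold is sound, assuming the IBM
double-double layout stores the high-order double as the first of the pair
(hypothetical helper, not part of the patch):

#include <string.h>

int
isinf_ibm128 (long double x)
{
  double hi;
  memcpy (&hi, &x, sizeof hi);	/* high-order double of the pair */
  return __builtin_isinf (hi);	/* Inf lives only in the high part */
}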
[Patch, rs6000] Enable overlap memory store for block memory clear
Hi,
  This patch enables overlapping memory stores for block memory clear,
which reduces the number of store instructions.  The expander calls
widest_fixed_size_mode_for_block_clear to get the mode for the looped block
clear and calls smallest_fixed_size_mode_for_block_clear to get the mode
for the last, overlapping clear.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is it OK for the trunk or next stage 1?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable overlap memory store for block memory clear

gcc/
	* config/rs6000/rs6000-string.cc
	(widest_fixed_size_mode_for_block_clear): New.
	(smallest_fixed_size_mode_for_block_clear): New.
	(expand_block_clear): Call widest_fixed_size_mode_for_block_clear
	to get the mode for looped memory stores and call
	smallest_fixed_size_mode_for_block_clear to get the mode for the
	last overlapped memory store.

gcc/testsuite
	* gcc.target/powerpc/block-clear-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 133e5382af2..c2a6095a586 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -38,6 +38,49 @@
 #include "profile-count.h"
 #include "predict.h"

+/* Return the widest mode whose size is less than or equal to the
+   size.  */
+static fixed_size_mode
+widest_fixed_size_mode_for_block_clear (unsigned int size, unsigned int align,
+					bool unaligned_vsx_ok)
+{
+  machine_mode mode;
+
+  if (TARGET_ALTIVEC
+      && size >= 16
+      && (align >= 128
+	  || unaligned_vsx_ok))
+    mode = V4SImode;
+  else if (size >= 8
+	   && TARGET_POWERPC64
+	   && (align >= 64
+	       || !STRICT_ALIGNMENT))
+    mode = DImode;
+  else if (size >= 4
+	   && (align >= 32
+	       || !STRICT_ALIGNMENT))
+    mode = SImode;
+  else if (size >= 2
+	   && (align >= 16
+	       || !STRICT_ALIGNMENT))
+    mode = HImode;
+  else
+    mode = QImode;
+
+  return as_a <fixed_size_mode> (mode);
+}
+
+/* Return the smallest mode whose size is greater than or equal to
+   the size.  */
+static fixed_size_mode
+smallest_fixed_size_mode_for_block_clear (unsigned int size)
+{
+  if (size > UNITS_PER_WORD)
+    return as_a <fixed_size_mode> (V4SImode);
+
+  return smallest_int_mode_for_size (size * BITS_PER_UNIT);
+}
+
 /* Expand a block clear operation, and return 1 if successful.  Return 0
    if we should let the compiler generate normal code.

@@ -55,7 +98,6 @@ expand_block_clear (rtx operands[])
   HOST_WIDE_INT align;
   HOST_WIDE_INT bytes;
   int offset;
-  int clear_bytes;
   int clear_step;

   /* If this is not a fixed size move, just call memcpy */
@@ -89,62 +131,36 @@ expand_block_clear (rtx operands[])
   bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);

-  for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
+  auto mode = widest_fixed_size_mode_for_block_clear (bytes, align,
+						      unaligned_vsx_ok);
+  offset = 0;
+  rtx dest;
+
+  do
     {
-      machine_mode mode = BLKmode;
-      rtx dest;
+      unsigned int size = GET_MODE_SIZE (mode);

-      if (TARGET_ALTIVEC
-	  && (bytes >= 16 && (align >= 128 || unaligned_vsx_ok)))
+      while (bytes >= size)
	{
-	  clear_bytes = 16;
-	  mode = V4SImode;
-	}
-      else if (bytes >= 8 && TARGET_POWERPC64
-	       && (align >= 64 || !STRICT_ALIGNMENT))
-	{
-	  clear_bytes = 8;
-	  mode = DImode;
-	  if (offset == 0 && align < 64)
-	    {
-	      rtx addr;
+	  dest = adjust_address (orig_dest, mode, offset);
+	  emit_move_insn (dest, CONST0_RTX (mode));

-	      /* If the address form is reg+offset with offset not a
-		 multiple of four, reload into reg indirect form here
-		 rather than waiting for reload.  This way we get one
-		 reload, not one per store.
-		 */
-	      addr = XEXP (orig_dest, 0);
-	      if ((GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
-		  && CONST_INT_P (XEXP (addr, 1))
-		  && (INTVAL (XEXP (addr, 1)) & 3) != 0)
-		{
-		  addr = copy_addr_to_reg (addr);
-		  orig_dest = replace_equiv_address (orig_dest, addr);
-		}
-	    }
-	}
-      else if (bytes >= 4 && (align >= 32 || !STRICT_ALIGNMENT))
-	{			/* move 4 bytes */
-	  clear_bytes = 4;
-	  mode = SImode;
-	}
-      else if (bytes >= 2 && (align >= 16 || !STRICT_ALIGNMENT))
-	{			/* move 2 bytes */
-	  clear_bytes = 2;
-	  mode = HImode;
-	}
-      else /* move 1 byte at a time */
-	{
-	  clear_bytes = 1;
-	  mode = QImode;
+	  offset += size;
+	  bytes -= size;
	}

-      dest = adjust_
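A worked example of the overlapping tail (hypothetical size; the final
store offset assumes the expander backs up so the last store ends exactly
at the end of the block):

void
clear22 (char *p)
{
  /* With V4SImode as the loop mode: one 16-byte store at offset 0 leaves
     6 bytes; smallest_fixed_size_mode_for_block_clear (6) returns DImode,
     so a single 8-byte store at offset 14 finishes the job, overlapping
     two already-cleared bytes -- instead of 4+2 byte-wise tail stores.  */
  __builtin_memset (p, 0, 22);
}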
[PATCH] fwprop: Avoid volatile defines to be propagated
Hi,
  This patch tries to fix a potential problem which is raised by the patch
for PR111267.  With the patch for PR111267, a volatile asm operand can be
propagated to a single-set insn.  That is risky, as the resulting behavior
is wrong.  Currently the set_src_cost comparison can reject such a
propagation, but the propagation might be taken after replacing
set_src_cost with insn cost.  Actually I found the problem while testing my
patch which replaces set_src_cost with insn cost for fwprop.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
fwprop: Avoid volatile defines to be propagated

The patch for PR111267 (commit 86de9b66480b710202a2898cf513db105d8c432f)
introduces an exception for propagation on a single-set insn.  The
propagation which might not be profitable (checked by profitable_p) is
still allowed to be propagated to a single-set insn.  It has a potential
problem that a volatile asm operand will try to be propagated to a
single-set insn.  The volatile asm operand is originally banned in
profitable_p.  This patch fixes the problem by skipping a volatile set
source in the define-set finding.

gcc/
	* fwprop.cc (forward_propagate_into): Return false for volatile set
	source.

gcc/testsuite/
	* gcc.target/powerpc/fwprop-1.c: New.

patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 7872609b336..89dce88b43d 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -854,6 +854,8 @@ forward_propagate_into (use_info *use, bool reg_prop_only = false)
   rtx dest = SET_DEST (def_set);
   rtx src = SET_SRC (def_set);

+  if (volatile_insn_p (src))
+    return false;

   /* Allow propagations into a loop only for reg-to-reg copies, since
      replacing one register by another shouldn't increase the cost.

diff --git a/gcc/testsuite/gcc.target/powerpc/fwprop-1.c b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
new file mode 100644
index 000..07b207f980c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-rtl-fwprop1-details" } */
+/* { dg-final { scan-rtl-dump-not "propagating insn" "fwprop1" } } */
+
+/* Verify that volatile asm operands don't get propagated.  */
+long long foo ()
+{
+  long long res;
+  __asm__ __volatile__(
+    ""
+    : "=r" (res)
+    :
+    : "memory");
+  return res;
+}
[PATCH, rs6000] Refactor expand_compare_loop and split it to two functions
Hi,
  This patch refactors the function expand_compare_loop and splits it into
two functions: one for fixed length and another for variable length.
These two functions share some low-level common helper functions.

Besides the above changes, the patch also does:
1. Don't generate the load and compare loop when max_bytes is less than the
loop bytes.
2. Remove do_load_mask_compare, as it's not needed.  All sub-targets
entering the function should support efficient overlapping load and
compare.
3. Implement a variable-length overlapping load and compare for the case in
which the remaining bytes are less than the loop bytes in the
variable-length compare.  The 4k boundary test and the one-byte load and
compare loop are removed, as they're not needed now.
4. Remove the code for "bytes > max_bytes" with fixed length, as the case
is already excluded by pre-checking.
5. Remove the run-time code for "bytes > max_bytes" with variable length,
as it should jump to the library call at the beginning.
6. Enhance do_overlap_load_compare to avoid an overlapping load and compare
when the remaining bytes can be loaded and compared by a smaller unit.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Refactor expand_compare_loop and split it to two functions

The original expand_compare_loop has complicated logic, as it's designed
for both fixed and variable length.  This patch splits it into two
functions and makes these two functions share common helper functions.
Also the 4K boundary test and the corresponding one-byte load and compare
are replaced by a variable-length overlapping load and compare.
do_load_mask_compare is removed, as all sub-targets entering the function
have efficient overlapping load and compare, so a mask load is not needed.

gcc/
	* config/rs6000/rs6000-string.cc (do_isel): Remove.
	(do_load_mask_compare): Remove.
	(do_reg_compare): New.
	(do_load_and_compare): New.
	(do_overlap_load_compare): Do load and compare with a small unit
	other than overlapping load and compare when the remaining bytes
	can be done by one instruction.
	(expand_compare_loop): Remove.
	(get_max_inline_loop_bytes): New.
	(do_load_compare_rest_of_loop): New.
	(generate_6432_conversion): Set it to a static function and move
	ahead of gen_diff_handle.
	(gen_diff_handle): New.
	(gen_load_compare_loop): New.
	(gen_library_call): New.
	(expand_compare_with_fixed_length): New.
	(expand_compare_with_variable_length): New.
	(expand_block_compare): Call expand_compare_with_variable_length
	to expand block compare for variable length.  Call
	expand_compare_with_fixed_length to expand block compare loop for
	fixed length.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-5.c: New.
	* gcc.target/powerpc/block-cmp-6.c: New.
	* gcc.target/powerpc/block-cmp-7.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index f707bb2727e..018b87f2501 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -404,21 +404,6 @@ do_ifelse (machine_mode cmpmode, rtx_code comparison,
   LABEL_NUSES (true_label) += 1;
 }

-/* Emit an isel of the proper mode for DEST.
-
-   DEST is the isel destination register.
-   SRC1 is the isel source if CR is true.
-   SRC2 is the isel source if CR is false.
-   CR is the condition for the isel.
-   */
-static void
-do_isel (rtx dest, rtx cmp, rtx src_t, rtx src_f, rtx cr)
-{
-  if (GET_MODE (dest) == DImode)
-    emit_insn (gen_isel_cc_di (dest, cmp, src_t, src_f, cr));
-  else
-    emit_insn (gen_isel_cc_si (dest, cmp, src_t, src_f, cr));
-}
-
 /* Emit a subtract of the proper mode for DEST.

    DEST is the destination register for the subtract.
@@ -499,65 +484,61 @@ do_rotl3 (rtx dest, rtx src1, rtx src2)
     emit_insn (gen_rotlsi3 (dest, src1, src2));
 }

-/* Generate rtl for a load, shift, and compare of less than a full word.
-
-   LOAD_MODE is the machine mode for the loads.
-   DIFF is the reg for the difference.
-   CMP_REM is the reg containing the remaining bytes to compare.
-   DCOND is the CCUNS reg for the compare if we are doing P9 code with setb.
-   SRC1_ADDR is the first source address.
-   SRC2_ADDR is the second source address.
-   ORIG_SRC1 is the original first source block's address rtx.
-   ORIG_SRC2 is the original second source block's address rtx.  */
+/* Do the compare for two registers.  */
 static void
-do_load_mask_compare (const machine_mode load_mode, rtx diff, rtx cmp_rem, rtx dcond,
-		      rtx src1_addr, rtx src2_addr, rtx orig_src1, rtx orig_src2)
+do_reg_compare (bool use_vec, rtx vec_result, rtx diff, rtx *dcond, rtx d1,
+		rtx d2)
 {
-  HOST_WIDE_INT load_mode_size = GET_MODE_SIZE (load_mode);
-  rtx shift_amount = gen_reg_rtx (word_mode);
-  rtx d1 = gen_reg_rtx (wor
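An example of the do_overlap_load_compare enhancement (hypothetical size):
for a 12-byte compare with an 8-byte unit, 4 bytes remain after one DImode
compare; they can be loaded with SImode directly instead of backing up for
an overlapping 8-byte load.

int
cmp12 (const void *a, const void *b)
{
  return __builtin_memcmp (a, b, 12);	/* one 8-byte compare plus one
					   4-byte compare, no overlap needed */
}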
[Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]
Hi,
  This patch eliminates unnecessary byte swaps for block clear on P8 LE.
For block clear, all the bytes are set to zero, so the byte order doesn't
matter.  The alignment of the destination can therefore be set to the store
mode size instead of 1 byte in order to eliminate unnecessary byte swap
instructions on P8 LE.  The test case shows the problem.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Eliminate unnecessary byte swaps for block clear on P8 LE

gcc/
	PR target/113325
	* config/rs6000/rs6000-string.cc (expand_block_clear): Set the
	alignment of destination to the size of mode.

gcc/testsuite/
	PR target/113325
	* gcc.target/powerpc/pr113325.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 7f777666ba9..4c9b2cbeefc 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
 	}

       dest = adjust_address (orig_dest, mode, offset);
-
+      /* Set the alignment of dest to the size of mode in order to
+	 avoid unnecessary byte swaps on LE.  */
+      set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
       emit_move_insn (dest, CONST0_RTX (mode));
     }

diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
new file mode 100644
index 000..4a3cae019c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
+
+void* foo (void* s1)
+{
+  return __builtin_memset (s1, 0, 32);
+}
Re: [Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]
Hi Richard,
  Thanks so much for your comments.

>> patch.diff
>> diff --git a/gcc/config/rs6000/rs6000-string.cc
>> b/gcc/config/rs6000/rs6000-string.cc
>> index 7f777666ba9..4c9b2cbeefc 100644
>> --- a/gcc/config/rs6000/rs6000-string.cc
>> +++ b/gcc/config/rs6000/rs6000-string.cc
>> @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
>> 	}
>>
>>       dest = adjust_address (orig_dest, mode, offset);
>> -
>> +      /* Set the alignment of dest to the size of mode in order to
>> +	 avoid unnecessary byte swaps on LE.  */
>> +      set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
>
> but the alignment is now wrong which might cause ripple-down
> wrong-code effects, no?
>
> It's probably bad to hide the byte-swapping in the move patterns (I'm
> just guessing you do that)

Here I just change the alignment of "dest", which is temporarily used for
the move.  The orig_dest is untouched and keeps the original alignment.
The subsequent insns which use orig_dest are not affected.  I am not sure
if it causes ripple-down effects.  Do you mean the dest might be reused
later?  But I think the alignment is different even though the mode and
offset are the same.  Looking forward to your advice.

Thanks
Gui Haochen
[PATCH, rs6000] Enable block compare expand on P9 with m32 and mpowerpc64
Hi, On P9 "setb" is used to set the result of block compare. So it works with m32 and mpowerpc64. On P8, carry bit is used. So it can't work with m32 and mpowerpc64. This patch enables block compare expand for m32 and mpowerpc64 on P9. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Enable block compare expand on P9 with m32 and mpowerpc64 gcc/ * config/rs6000/rs6000-string.cc (expand_block_compare): Enable P9 with m32 and mpowerpc64. gcc/testsuite/ * gcc.target/powerpc/block-cmp-1.c: Exclude m32 and mpowerpc64. * gcc.target/powerpc/block-cmp-4.c: Likewise. * gcc.target/powerpc/block-cmp-8.c: New. patch.diff diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc index 018b87f2501..346708071b5 100644 --- a/gcc/config/rs6000/rs6000-string.cc +++ b/gcc/config/rs6000/rs6000-string.cc @@ -1677,11 +1677,12 @@ expand_block_compare (rtx operands[]) /* TARGET_POPCNTD is already guarded at expand cmpmemsi. */ gcc_assert (TARGET_POPCNTD); - /* This case is complicated to handle because the subtract - with carry instructions do not generate the 64-bit - carry and so we must emit code to calculate it ourselves. - We choose not to implement this yet. */ - if (TARGET_32BIT && TARGET_POWERPC64) + /* For P8, this case is complicated to handle because the subtract + with carry instructions do not generate the 64-bit carry and so + we must emit code to calculate it ourselves. We skip it on P8 + but setb works well on P9. */ + if (TARGET_32BIT && TARGET_POWERPC64 + && !TARGET_P9_MISC) return false; /* Allow this param to shut off all expansion. */ diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c index bcf0cb2ab4f..cd076cf1dce 100644 --- a/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-vsx" } */ +/* { dg-skip-if "" { has_arch_ppc64 && ilp32 } } */ /* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } } */ /* Test that it still can do expand for memcmpsi instead of calling library diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c index c86febae68a..9373b53a3a4 100644 --- a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c @@ -1,5 +1,6 @@ /* { dg-do compile { target be } } */ /* { dg-options "-O2 -mdejagnu-cpu=power7" } */ +/* { dg-skip-if "" { has_arch_ppc64 && ilp32 } } */ /* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } } */ /* Test that it does expand for memcmpsi instead of calling library on diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c new file mode 100644 index 000..b470f873973 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c @@ -0,0 +1,8 @@ +/* { dg-do run { target ilp32 } } */ +/* { dg-options "-O2 -m32 -mpowerpc64" } */ +/* { dg-require-effective-target has_arch_ppc64 } */ +/* { dg-timeout-factor 2 } */ + +/* Verify memcmp on m32 mpowerpc64 */ + +#include "../../gcc.dg/memcmp-1.c"
Re: [PATCH, rs6000] Refactor expand_compare_loop and split it to two functions
Hi Kewen,

On 2024/1/15 14:16, Kewen.Lin wrote:
> Considering it's stage 4 now and the impact of this patch, let's defer
> this to next stage 1, if possible could you organize the above changes
> into patches:
>
> 1) Refactor expand_compare_loop by splitting into two functions without
>    any functional changes.
> 2) Remove some useless codes like 2, 4, 5.
> 3) Some more enhancements like 1, 3, 6.
>
> ?  It would be helpful for the review.  Thanks!

Thanks for your review comments.  I will re-organize it in the next
stage 1.
[PATCH, expand] Add const0 move checking for CLEAR_BY_PIECES optabs
Hi,

This patch adds const0 move checking for CLEAR_BY_PIECES. The original vec_duplicate check handles duplicates of non-constant inputs, but zero is a constant. So even if a platform doesn't support vec_duplicate, it can still clear by pieces if it supports a const0 move in that mode.

The test cases will be added in a subsequent target-specific patch.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
expand: Add const0 move checking for CLEAR_BY_PIECES optabs

vec_duplicate handles duplicates of non-constant inputs, but zero is a
constant.  So even if a platform doesn't support vec_duplicate, it can
still clear by pieces if it supports a const0 move.  This patch adds
the checking.

gcc/
	* expr.cc (by_pieces_mode_supported_p): Add const0 move checking
	for CLEAR_BY_PIECES.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 34f5ff90a9f..cd960349a53 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1006,14 +1006,21 @@ can_use_qi_vectors (by_pieces_operation op)
 static bool
 by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
-  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+  enum insn_code icode = optab_handler (mov_optab, mode);
+  if (icode == CODE_FOR_nothing)
     return false;

-  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
+  if (op == SET_BY_PIECES
       && VECTOR_MODE_P (mode)
       && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
     return false;

+  if (op == CLEAR_BY_PIECES
+      && VECTOR_MODE_P (mode)
+      && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing
+      && !insn_operand_matches (icode, 1, CONST0_RTX (mode)))
+    return false;
+
   if (op == COMPARE_BY_PIECES
       && !can_compare_p (EQ, mode, ccp_jump))
     return false;
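For illustration, a sketch of the shape this change enables (the flags are an assumption, e.g. -O2 -mcpu=power8): a 32-byte clear can be done with two plain vector stores of zero, because moving CONST0_RTX only needs the mov optab and a matching const0 operand, not vec_duplicate.

/* 32-byte clear; with the new check this can use vector zero stores
   even on a target without vec_duplicate support.  */
void
clear32 (void *p)
{
  __builtin_memset (p, 0, 32);
}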
[PATCH-1] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]
Hi,

This patch replaces rtx_cost with insn_cost in forward propagation. In the PR, a constant vector should be propagated into a store insn, replacing the source pseudo, if we know it's a duplicated constant vector. Doing so reduces the insn cost but not the rtx cost. Here the kind of destination operand (memory or pseudo) decides the cost, and the rtx cost can't reflect that.

The test case is added in the second, target-specific patch.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for next stage 1?

Thanks
Gui Haochen

ChangeLog
fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern

gcc/
	PR target/113325
	* fwprop.cc (try_fwprop_subst_pattern): Replace rtx_cost with
	insn_cost.

patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 0707a234726..b05b2538edc 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -467,20 +467,17 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
       redo_changes (0);
     }

-  /* ??? In theory, it should be better to use insn costs rather than
-     set_src_costs here.  That would involve replacing this code with
-     change_is_worthwhile.  */
   bool ok = recog (attempt, use_change);
   if (ok && !prop.changed_mem_p () && !use_insn->is_asm ())
-    if (rtx use_set = single_set (use_rtl))
+    if (single_set (use_rtl))
       {
	bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_rtl));
+	auto new_cost = insn_cost (use_rtl, speed);
	temporarily_undo_changes (0);
-	auto old_cost = set_src_cost (SET_SRC (use_set),
-				      GET_MODE (SET_DEST (use_set)), speed);
+	/* Invalidate recog data.  */
+	INSN_CODE (use_rtl) = -1;
+	auto old_cost = insn_cost (use_rtl, speed);
	redo_changes (0);
-	auto new_cost = set_src_cost (SET_SRC (use_set),
-				      GET_MODE (SET_DEST (use_set)), speed);
	if (new_cost > old_cost)
	  {
	    if (dump_file)
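A minimal sketch of the PR113325 shape (hypothetical, not the committed testcase): whether the constant vector stays in a pseudo or is propagated into the store changes the cost of the whole store insn, and set_src_cost of the SET_SRC alone cannot see the memory destination.

typedef unsigned char v16qi __attribute__ ((vector_size (16)));

/* Store of a duplicated constant vector; with this patch fwprop
   decides whether to propagate the constant based on insn_cost.  */
void
store_splat (v16qi *p)
{
  *p = (v16qi) { 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7 };
}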
[Patch-2, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]
Hi,

This patch creates an insn_and_split pattern which lets the fwprop pass replace the source pseudo of a store insn with a duplicated constant vector. Thus the store can be implemented by a single stxvd2x, which eliminates the unnecessary byte-swap insn on P8 LE. The test case shows the optimization.

The patch depends on the first, generic patch, which uses insn cost in fwprop.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: Eliminate unnecessary byte swaps for duplicated constant vector store

gcc/
	PR target/113325
	* config/rs6000/predicates.md (duplicate_easy_altivec_constant): New.
	* config/rs6000/vsx.md (vsx_stxvd2x4_le_const_<mode>): New.

gcc/testsuite/
	PR target/113325
	* gcc.target/powerpc/pr113325.c: New.

patch.diff
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index ef7d3f214c4..8ab6db630b7 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -759,6 +759,14 @@ (define_predicate "easy_vector_constant"
   return false;
 })

+;; Return 1 if it's a duplicated easy_altivec_constant.
+(define_predicate "duplicate_easy_altivec_constant"
+  (and (match_code "const_vector")
+       (match_test "easy_altivec_constant (op, mode)"))
+{
+  return const_vec_duplicate_p (op);
+})
+
 ;; Same as easy_vector_constant but only for EASY_VECTOR_15_ADD_SELF.
 (define_predicate "easy_vector_constant_add_self"
   (and (match_code "const_vector")
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 26fa32829af..98e4be26f64 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3362,6 +3362,29 @@ (define_insn "*vsx_stxvd2x4_le_<mode>"
   "stxvd2x %x1,%y0"
   [(set_attr "type" "vecstore")])

+(define_insn_and_split "vsx_stxvd2x4_le_const_<mode>"
+  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+	(match_operand:VSX_W 1 "duplicate_easy_altivec_constant" "W"))]
+  "!BYTES_BIG_ENDIAN
+   && VECTOR_MEM_VSX_P (<MODE>mode)
+   && !TARGET_P9_VECTOR"
+  "#"
+  "&& 1"
+  [(set (match_dup 2)
+	(match_dup 1))
+   (set (match_dup 0)
+	(vec_select:VSX_W
+	  (match_dup 2)
+	  (parallel [(const_int 2) (const_int 3)
+		     (const_int 0) (const_int 1)])))]
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+				       : operands[1];
+}
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
 (define_insn "*vsx_stxvd2x8_le_V8HI"
   [(set (match_operand:V8HI 0 "memory_operand" "=Z")
	(vec_select:V8HI
diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
new file mode 100644
index 000..dff68ac0a51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mvsx" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
+
+void* foo (void* s1)
+{
+  return __builtin_memset (s1, 0, 32);
+}
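Another duplicated easy AltiVec constant the new insn_and_split should catch on P8 LE (assumed flags: -O2 -mcpu=power8, little endian): every element is equal, so the doubleword swap that stxvd2x performs cannot change the stored bytes.

typedef short v8hi __attribute__ ((vector_size (16)));

/* All elements equal, so a single stxvd2x stores the right bytes
   without a preceding xxpermdi.  */
void
store_ones (v8hi *p)
{
  *p = (v8hi) { 1, 1, 1, 1, 1, 1, 1, 1 };
}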
[Patchv2, rs6000] Correct definition of macro of fixed point efficient unaligned
Hi,

The patch corrects the definition of TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and replaces it with calls to slow_unaligned_access.

Compared with the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640076.html
the main change is to replace the macro with slow_unaligned_access.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Correct definition of macro of fixed point efficient unaligned

Macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in rs6000-string.cc
to guard the platforms which are efficient on fixed point unaligned
load/store.  It's originally defined by TARGET_EFFICIENT_UNALIGNED_VSX,
which is enabled from P8 and can be disabled by the -mno-vsx option.  So
the definition is wrong.  This patch corrects the problem and calls
slow_unaligned_access to judge whether fixed point unaligned load/store
is efficient or not.

gcc/
	* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
	Remove.
	* config/rs6000/rs6000-string.cc (select_block_compare_mode):
	Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
	targetm.slow_unaligned_access.
	(expand_block_compare_gpr): Likewise.
	(expand_block_compare): Likewise.
	(expand_strncmp_gpr_sequence): Likewise.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-1.c: New.
	* gcc.target/powerpc/block-cmp-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 44a946cd453..cb9eeef05d8 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -305,7 +305,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
   else if (bytes == GET_MODE_SIZE (QImode))
     return QImode;
   else if (bytes < GET_MODE_SIZE (SImode)
-	   && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+	   && !targetm.slow_unaligned_access (SImode, align)
	   && offset >= GET_MODE_SIZE (SImode) - bytes)
     /* This matches the case were we have SImode and 3 bytes
	and offset >= 1 and permits us to move back one and overlap
@@ -313,7 +313,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
	unwanted bytes off of the input.  */
     return SImode;
   else if (word_mode_ok && bytes < UNITS_PER_WORD
-	   && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+	   && !targetm.slow_unaligned_access (word_mode, align)
	   && offset >= UNITS_PER_WORD-bytes)
     /* Similarly, if we can use DImode it will get matched here and
	can do an overlapping read that ends at the end of the block.  */
@@ -1749,7 +1749,7 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
       load_mode_size = GET_MODE_SIZE (load_mode);
       if (bytes >= load_mode_size)
	cmp_bytes = load_mode_size;
-      else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+      else if (!targetm.slow_unaligned_access (load_mode, align))
	{
	  /* Move this load back so it doesn't go past the end.  P8/P9
	     can do this efficiently.  */
@@ -2026,7 +2026,7 @@ expand_block_compare (rtx operands[])
   /* The code generated for p7 and older is not faster than glibc
      memcmp if alignment is small and length is not short, so bail
      out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  if (targetm.slow_unaligned_access (word_mode, UINTVAL (align_rtx))
       && ((base_align == 1 && bytes > 16)
	   || (base_align == 2 && bytes > 32)))
     return false;
@@ -2168,7 +2168,7 @@ expand_strncmp_gpr_sequence (unsigned HOST_WIDE_INT bytes_to_compare,
       load_mode_size = GET_MODE_SIZE (load_mode);
       if (bytes_to_compare >= load_mode_size)
	cmp_bytes = load_mode_size;
-      else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+      else if (!targetm.slow_unaligned_access (load_mode, align))
	{
	  /* Move this load back so it doesn't go past the end.  P8/P9
	     can do this efficiently.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 326c45221e9..3971a56c588 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -483,10 +483,6 @@ extern int rs6000_vector_align[];
 #define TARGET_NO_SF_SUBREG	TARGET_DIRECT_MOVE_64BIT
 #define TARGET_ALLOW_SF_SUBREG	(!TARGET_DIRECT_MOVE_64BIT)

-/* This wants to be set for p8 and newer.  On p7, overlapping unaligned
-   loads are slow. */
-#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX
-
 /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
    in power7, so conditionalize them on p8 features.  TImode syncs need quad
    memory support.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-
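A plain-C sketch (hypothetical helper, assumes len >= 4) of the overlap trick these guards protect: to handle a 3-byte tail, move a 4-byte load back so it ends exactly at the block end, then shift the already-compared byte off. This only pays off when unaligned SImode loads are not slow; the shift direction shown assumes a little-endian view.

#include <stdint.h>
#include <string.h>

static uint32_t
tail3 (const unsigned char *p, unsigned len)
{
  uint32_t w;
  memcpy (&w, p + len - 4, 4);   /* overlapping load ending at the block end */
  return w >> 8;                 /* drop the one byte compared earlier (LE) */
}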
[Patchv2, rs6000] Clean up pre-checks of expand_block_compare
Hi,

This patch cleans up the pre-checks of expand_block_compare. It does:

1. Assert that only P7 and above can enter this function, as that is already guarded by the expander.
2. Return false when optimizing for size.
3. Remove the P7 processor test, as only P7 and above can enter this function and P7 LE is excluded by the targetm.slow_unaligned_access check. On P7 BE, the performance of the expansion is better than that of the library when the length is long.

Compared to the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640082.html
the main change is to add some comments and move the variable definitions close to their uses.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up the pre-checks of expand_block_compare

gcc/
	* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
	that only P7 and above can enter this function.  Return false (call
	the library) when optimizing for size.  Remove the P7 CPU test as
	only P7 and above can enter this function and P7 LE is excluded by
	the check of targetm.slow_unaligned_access on word_mode.  Also a
	performance test shows the expansion of block compare with 16- to
	64-byte lengths is better than the library on P7 BE.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-3.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index cb9eeef05d8..49670cef4d7 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1946,36 +1946,32 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
-  rtx target = operands[0];
-  rtx orig_src1 = operands[1];
-  rtx orig_src2 = operands[2];
-  rtx bytes_rtx = operands[3];
-  rtx align_rtx = operands[4];
+  /* TARGET_POPCNTD is already guarded at expand cmpmemsi.  */
+  gcc_assert (TARGET_POPCNTD);

-  /* This case is complicated to handle because the subtract
-     with carry instructions do not generate the 64-bit
-     carry and so we must emit code to calculate it ourselves.
-     We choose not to implement this yet.  */
-  if (TARGET_32BIT && TARGET_POWERPC64)
+  if (optimize_insn_for_size_p ())
     return false;

-  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
-
   /* Allow this param to shut off all expansion.  */
   if (rs6000_block_compare_inline_limit == 0)
     return false;

-  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
-     However slow_unaligned_access returns true on P7 even though the
-     performance of this code is good there.  */
-  if (!isP7
-      && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
-	  || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2))))
+  /* This case is complicated to handle because the subtract
+     with carry instructions do not generate the 64-bit
+     carry and so we must emit code to calculate it ourselves.
+     We choose not to implement this yet.  */
+  if (TARGET_32BIT && TARGET_POWERPC64)
     return false;

-  /* Unaligned l*brx traps on P7 so don't do this.  However this should
-     not affect much because LE isn't really supported on P7 anyway.  */
-  if (isP7 && !BYTES_BIG_ENDIAN)
+  rtx target = operands[0];
+  rtx orig_src1 = operands[1];
+  rtx orig_src2 = operands[2];
+  rtx bytes_rtx = operands[3];
+  rtx align_rtx = operands[4];
+
+  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
+  if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
+      || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
     return false;

   /* If this is not a fixed size compare, try generating loop code and
@@ -2023,14 +2019,6 @@ expand_block_compare (rtx operands[])
   if (!IN_RANGE (bytes, 1, max_bytes))
     return expand_compare_loop (operands);

-  /* The code generated for p7 and older is not faster than glibc
-     memcmp if alignment is small and length is not short, so bail
-     out to avoid those conditions.  */
-  if (targetm.slow_unaligned_access (word_mode, UINTVAL (align_rtx))
-      && ((base_align == 1 && bytes > 16)
-	  || (base_align == 2 && bytes > 32)))
-    return false;
-
   rtx final_label = NULL;

   if (use_vec)
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
new file mode 100644
index 000..c7e853ad593
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } } */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 4);
+}
[Patchv3, rs6000] Correct definition of macro of fixed point efficient unaligned
Hi,

The patch corrects the definition of TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and replaces it with calls to slow_unaligned_access.

Compared with the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640832.html
the main change is to pass the alignment measured in bits to slow_unaligned_access.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Correct definition of macro of fixed point efficient unaligned

Macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in rs6000-string.cc
to guard the platforms which are efficient on fixed point unaligned
load/store.  It's originally defined by TARGET_EFFICIENT_UNALIGNED_VSX,
which is enabled from P8 and can be disabled by the -mno-vsx option.  So
the definition is wrong.  This patch corrects the problem and calls
slow_unaligned_access to judge whether fixed point unaligned load/store
is efficient or not.

gcc/
	* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
	Remove.
	* config/rs6000/rs6000-string.cc (select_block_compare_mode):
	Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
	targetm.slow_unaligned_access.
	(expand_block_compare_gpr): Likewise.
	(expand_block_compare): Likewise.
	(expand_strncmp_gpr_sequence): Likewise.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-1.c: New.
	* gcc.target/powerpc/block-cmp-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 44a946cd453..05dc41622f4 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -305,7 +305,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
   else if (bytes == GET_MODE_SIZE (QImode))
     return QImode;
   else if (bytes < GET_MODE_SIZE (SImode)
-	   && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+	   && !targetm.slow_unaligned_access (SImode, align * BITS_PER_UNIT)
	   && offset >= GET_MODE_SIZE (SImode) - bytes)
     /* This matches the case were we have SImode and 3 bytes
	and offset >= 1 and permits us to move back one and overlap
@@ -313,7 +313,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
	unwanted bytes off of the input.  */
     return SImode;
   else if (word_mode_ok && bytes < UNITS_PER_WORD
-	   && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+	   && !targetm.slow_unaligned_access (word_mode, align * BITS_PER_UNIT)
	   && offset >= UNITS_PER_WORD-bytes)
     /* Similarly, if we can use DImode it will get matched here and
	can do an overlapping read that ends at the end of the block.  */
@@ -1749,7 +1749,8 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
       load_mode_size = GET_MODE_SIZE (load_mode);
       if (bytes >= load_mode_size)
	cmp_bytes = load_mode_size;
-      else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+      else if (!targetm.slow_unaligned_access (load_mode,
+					       align * BITS_PER_UNIT))
	{
	  /* Move this load back so it doesn't go past the end.  P8/P9
	     can do this efficiently.  */
@@ -2026,7 +2027,7 @@ expand_block_compare (rtx operands[])
   /* The code generated for p7 and older is not faster than glibc
      memcmp if alignment is small and length is not short, so bail
      out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  if (targetm.slow_unaligned_access (word_mode, base_align * BITS_PER_UNIT)
       && ((base_align == 1 && bytes > 16)
	   || (base_align == 2 && bytes > 32)))
     return false;
@@ -2168,7 +2169,8 @@ expand_strncmp_gpr_sequence (unsigned HOST_WIDE_INT bytes_to_compare,
       load_mode_size = GET_MODE_SIZE (load_mode);
       if (bytes_to_compare >= load_mode_size)
	cmp_bytes = load_mode_size;
-      else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+      else if (!targetm.slow_unaligned_access (load_mode,
+					       align * BITS_PER_UNIT))
	{
	  /* Move this load back so it doesn't go past the end.  P8/P9
	     can do this efficiently.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 326c45221e9..3971a56c588 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -483,10 +483,6 @@ extern int rs6000_vector_align[];
 #define TARGET_NO_SF_SUBREG	TARGET_DIRECT_MOVE_64BIT
 #define TARGET_ALLOW_SF_SUBREG	(!TARGET_DIRECT_MOVE_64BIT)

-/* This wants to be set for p8 and newer.  On p7, overlapping unaligned
-   loads are slow. */
-#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX
-
 /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
    in power7, so conditionalize them on p8 features.  TImode syncs need quad
    memory support.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/bl
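The point of the v3 change in one line: the hook takes the alignment in bits, while the string expansion code tracks it in bytes. A trivial stand-in for the conversion (BITS_PER_UNIT is 8 on rs6000):

/* Stand-in for align * BITS_PER_UNIT in the calls above.  */
unsigned
align_in_bits (unsigned align_in_bytes)
{
  return align_in_bytes * 8;
}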
[Patch, rs6000] Call library for block memory compare when optimizing for size
Hi,

This patch makes block memory compare call the library function when optimizing for size.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Call library for block memory compare when optimizing for size

gcc/
	* config/rs6000/rs6000-string.cc (expand_block_compare): Return
	false when optimizing for size.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-3.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 05dc41622f4..5149273b80e 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1947,6 +1947,9 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
+  if (optimize_insn_for_size_p ())
+    return false;
+
   rtx target = operands[0];
   rtx orig_src1 = operands[1];
   rtx orig_src2 = operands[2];
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
new file mode 100644
index 000..c7e853ad593
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } } */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 4);
+}
[Patchv3, rs6000] Clean up pre-checks of expand_block_compare
Hi,

This patch cleans up the pre-checks of expand_block_compare. It does:

1. Assert that only P7 and above can enter this function, as that is already guarded by the expander.
2. Remove the P7 processor test, as only P7 and above can enter this function and P7 LE is excluded by targetm.slow_unaligned_access. On P7 BE, the performance of the expansion is better than that of the library when the length is long.

Compared to the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640833.html
the main change is to split the optimization for size into a separate patch and add a testcase for P7 BE.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up the pre-checks of expand_block_compare

Remove the P7 CPU test as only P7 and above can enter this function and
P7 LE is excluded by the check of targetm.slow_unaligned_access on
word_mode.  Also a performance test shows the expansion of block compare
is better than the library on P7 BE when the length is from 16 bytes to
64 bytes.

gcc/
	* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
	that only P7 and above can enter this function.  Remove the P7 CPU
	test and let P7 BE do the expansion.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-4.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 5149273b80e..09db57255fa 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1947,15 +1947,12 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
+  /* TARGET_POPCNTD is already guarded at expand cmpmemsi.  */
+  gcc_assert (TARGET_POPCNTD);
+
   if (optimize_insn_for_size_p ())
     return false;

-  rtx target = operands[0];
-  rtx orig_src1 = operands[1];
-  rtx orig_src2 = operands[2];
-  rtx bytes_rtx = operands[3];
-  rtx align_rtx = operands[4];
-
   /* This case is complicated to handle because the subtract
      with carry instructions do not generate the 64-bit
      carry and so we must emit code to calculate it ourselves.
@@ -1963,23 +1960,19 @@ expand_block_compare (rtx operands[])
   if (TARGET_32BIT && TARGET_POWERPC64)
     return false;

-  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
-
   /* Allow this param to shut off all expansion.  */
   if (rs6000_block_compare_inline_limit == 0)
     return false;

-  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
-     However slow_unaligned_access returns true on P7 even though the
-     performance of this code is good there.  */
-  if (!isP7
-      && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
-	  || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2))))
-    return false;
+  rtx target = operands[0];
+  rtx orig_src1 = operands[1];
+  rtx orig_src2 = operands[2];
+  rtx bytes_rtx = operands[3];
+  rtx align_rtx = operands[4];

-  /* Unaligned l*brx traps on P7 so don't do this.  However this should
-     not affect much because LE isn't really supported on P7 anyway.  */
-  if (isP7 && !BYTES_BIG_ENDIAN)
+  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
+  if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
+      || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
     return false;

   /* If this is not a fixed size compare, try generating loop code and
@@ -2027,14 +2020,6 @@ expand_block_compare (rtx operands[])
   if (!IN_RANGE (bytes, 1, max_bytes))
     return expand_compare_loop (operands);

-  /* The code generated for p7 and older is not faster than glibc
-     memcmp if alignment is small and length is not short, so bail
-     out to avoid those conditions.  */
-  if (targetm.slow_unaligned_access (word_mode, base_align * BITS_PER_UNIT)
-      && ((base_align == 1 && bytes > 16)
-	  || (base_align == 2 && bytes > 32)))
-    return false;
-
   rtx final_label = NULL;

   if (use_vec)
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
new file mode 100644
index 000..c86febae68a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target be } } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+/* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } } */
+
+/* Test that it does expand for memcmpsi instead of calling library on
+   P7 BE when length is less than 32 bytes.  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 31);
+}
[patch-2, rs6000] guard fctid on PPC64 and powerpc 476 [PR112707]
Hi,

The "fctid" instruction is supported on 64-bit Power processors and PowerPC 476. It needs a guard to check for that. The patch fixes the issue.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: guard fctid on PPC64 and powerpc 476

fctid is supported on 64-bit Power processors and PowerPC 476.  It
should be guarded by this condition.  The patch fixes the issue.

gcc/
	PR target/112707
	* config/rs6000/rs6000.h (TARGET_FCTID): Define.
	* config/rs6000/rs6000.md (lrint<mode>di2): Add guard TARGET_FCTID.

gcc/testsuite/
	PR target/112707
	* gcc.target/powerpc/pr112707.h: New.
	* gcc.target/powerpc/pr112707-2.c: New.
	* gcc.target/powerpc/pr112707-3.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 22595f6..497ae3d 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -467,6 +467,8 @@ extern int rs6000_vector_align[];
 #define TARGET_FCFIDUS	TARGET_POPCNTD
 #define TARGET_FCTIDUZ	TARGET_POPCNTD
 #define TARGET_FCTIWUZ	TARGET_POPCNTD
+/* Enable fctid on ppc64 and powerpc476.  */
+#define TARGET_FCTID	(TARGET_POWERPC64 | TARGET_FPRND)
 #define TARGET_CTZ	TARGET_MODULO
 #define TARGET_EXTSWSLI	(TARGET_MODULO && TARGET_POWERPC64)
 #define TARGET_MADDLD	TARGET_MODULO
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index d4337ce..4a5e63c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6718,7 +6718,7 @@ (define_insn "lrint<mode>di2"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
	(unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
		   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT && TARGET_FCTID"
   "fctid %0,%1"
   [(set_attr "type" "fp")])

diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-2.c b/gcc/testsuite/gcc.target/powerpc/pr112707-2.c
new file mode 100644
index 000..ae91913
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-2.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { powerpc*-*-* && be } } } */
+/* { dg-options "-O2 -mdejagnu-cpu=7450 -m32 -fno-math-errno" } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-final { scan-assembler-not {\mfctid\M} } } */
+
+#include "pr112707.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-3.c b/gcc/testsuite/gcc.target/powerpc/pr112707-3.c
new file mode 100644
index 000..e47ce20
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-3.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { powerpc*-*-* && be } } } */
+/* { dg-options "-O2 -m32 -fno-math-errno -mdejagnu-cpu=476fp" } */
+/* { dg-require-effective-target ilp32 } */
+
+/* powerpc 476fp has hard float enabled which is required by fctid */
+
+#include "pr112707.h"
+
+/* { dg-final { scan-assembler-times {\mfctid\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707.h b/gcc/testsuite/gcc.target/powerpc/pr112707.h
new file mode 100644
index 000..e427dc6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707.h
@@ -0,0 +1,10 @@
+long long test1 (double a)
+{
+  return __builtin_llrint (a);
+}
+
+long long test2 (float a)
+{
+  return __builtin_llrint (a);
+}
+
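A runnable illustration (independent of the patch) of the semantics the lrint insn implements: llrint rounds in the current rounding mode, ties-to-even by default, and returns long long. Link with -lm.

#include <math.h>
#include <stdio.h>

int main (void)
{
  /* Under the default rounding mode this prints "2 4" (ties to even).  */
  printf ("%lld %lld\n", llrint (2.5), llrint (3.5));
  return 0;
}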
[patch-1, rs6000] enable fctiw on old archs [PR112707]
Hi,

SImode in a float register is supported on P7 and above. As a result, "fctiw" can't be generated on old 32-bit processors, since the output operand of the fctiw insn is an SImode value in a float/double register. This patch fixes the problem by adding an expand and an insn pattern for fctiw. The output of the new pattern is SFmode. When the target doesn't support SImode in float registers, the expander calls the new pattern and converts the SFmode result to SImode via the stack.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: enable fctiw on old archs

The powerpc 32-bit processors (e.g. 7450) support the "fctiw"
instruction, but the instruction can't be generated on such platforms
as the insn is guarded by TARGET_POPCNTD.  The root cause is that
SImode in float registers is supported from Power7.  Actually the
implementation of "fctiw" only needs stfiwx, which is supported by the
old 32-bit processors.  This patch enables the "fctiw" expand for
these processors.

gcc/
	PR target/112707
	* config/rs6000/rs6000.md (UNSPEC_STFIWX_SF, UNSPEC_FCTIW_SF): New.
	(expand lrint<mode>si2): New.
	(insn lrint<mode>si2): Rename to...
	(lrint<mode>si_internal): ...this, and remove guard TARGET_POPCNTD.
	(lrint<mode>si_internal2): New.
	(stfiwx_sf): New.

gcc/testsuite/
	PR target/112707
	* gcc.target/powerpc/pr112707-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index d4337ce42a9..1b207522ad5 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -90,6 +90,7 @@ (define_c_enum "unspec"
    UNSPEC_TLSTLS_PCREL
    UNSPEC_FIX_TRUNC_TF		; fadd, rounding towards zero
    UNSPEC_STFIWX
+   UNSPEC_STFIWX_SF
    UNSPEC_POPCNTB
    UNSPEC_FRES
    UNSPEC_SP_SET
@@ -111,6 +112,7 @@ (define_c_enum "unspec"
    UNSPEC_PARITY
    UNSPEC_CMPB
    UNSPEC_FCTIW
+   UNSPEC_FCTIW_SF
    UNSPEC_FCTID
    UNSPEC_LFIWAX
    UNSPEC_LFIWZX
@@ -6722,11 +6724,39 @@ (define_insn "lrint<mode>di2"
   "fctid %0,%1"
   [(set_attr "type" "fp")])

-(define_insn "lrint<mode>si2"
+(define_expand "lrint<mode>si2"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
	(unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
		   UNSPEC_FCTIW))]
-  "TARGET_HARD_FLOAT && TARGET_POPCNTD"
+  "TARGET_HARD_FLOAT && TARGET_STFIWX"
+{
+  /* For those old archs in which SImode can't be held in float registers,
+     call lrint<mode>si_internal2 to put the result in SFmode then
+     convert it via the stack.  */
+  if (!TARGET_POPCNTD)
+    {
+      rtx tmp = gen_reg_rtx (SFmode);
+      emit_insn (gen_lrint<mode>si_internal2 (tmp, operands[1]));
+      rtx stack = rs6000_allocate_stack_temp (SImode, false, true);
+      emit_insn (gen_stfiwx_sf (stack, tmp));
+      emit_move_insn (operands[0], stack);
+      DONE;
+    }
+})
+
+(define_insn "lrint<mode>si_internal"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
+	(unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
+		   UNSPEC_FCTIW))]
+  "TARGET_HARD_FLOAT"
+  "fctiw %0,%1"
+  [(set_attr "type" "fp")])
+
+(define_insn "lrint<mode>si_internal2"
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=d")
+	(unspec:SF [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
+		   UNSPEC_FCTIW_SF))]
+  "TARGET_HARD_FLOAT"
   "fctiw %0,%1"
   [(set_attr "type" "fp")])

@@ -6801,6 +6831,14 @@ (define_insn "stfiwx"
   [(set_attr "type" "fpstore")
    (set_attr "isa" "*,p8v")])

+(define_insn "stfiwx_sf"
+  [(set (match_operand:SI 0 "memory_operand" "=Z")
+	(unspec:SI [(match_operand:SF 1 "gpc_reg_operand" "d")]
+		   UNSPEC_STFIWX_SF))]
+  "TARGET_STFIWX"
+  "stfiwx %1,%y0"
+  [(set_attr "type" "fpstore")])
+
 ;; If we don't have a direct conversion to single precision, don't enable this
 ;; conversion for 32-bit without fast math, because we don't have the insn to
 ;; generate the fixup swizzle to avoid double rounding problems.
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-1.c b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c
new file mode 100644
index 000..32f708c5402
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc*-*-* && be } } } */
+/* { dg-options "-O2 -mdejagnu-cpu=7450 -m32 -fno-math-errno" } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-final { scan-assembler-times {\mfctiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstfiwx\M} 2 } } */
+
+int test1 (double a)
+{
+  return __builtin_irint (a);
+}
+
+int test2 (float a)
+{
+  return __builtin_irint (a);
+}
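The same kind of semantics check for the SImode path: __builtin_irint converts to int in the current rounding mode, which is exactly what fctiw provides.

#include <stdio.h>

int main (void)
{
  /* Under the default ties-to-even mode this prints "2 -2".  */
  printf ("%d %d\n", __builtin_irint (2.5), __builtin_irint (-2.5));
  return 0;
}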
[patch-1v2, rs6000] enable fctiw on old archs [PR112707]
Hi,

SImode in a float register is supported on P7 and above. As a result, "fctiw" can't be generated on old 32-bit processors, since the output operand of the fctiw insn is an SImode value in a float/double register. This patch fixes the problem by adding one expand and one insn pattern for fctiw. The output of the new insn pattern is DImode. When the target doesn't support SImode in float registers, the expander calls the new insn pattern and converts the DImode result to SImode via the stack.

Compared to the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/638860.html
the main change is to change the mode of the output operand of the new insn from SFmode to DImode so that it can feed the stfiwx pattern directly. No additional unspecs are needed.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: enable fctiw on old archs

The powerpc 32-bit processors (e.g. 7450) support the "fctiw"
instruction, but the instruction can't be generated on such platforms
as the insn is guarded by TARGET_POPCNTD.  The root cause is that
SImode in float registers is supported from Power7.  Actually the
implementation of "fctiw" only needs stfiwx, which is supported by the
old 32-bit processors.  This patch enables the "fctiw" expand for
these processors.

gcc/
	PR target/112707
	* config/rs6000/rs6000.md (expand lrint<mode>si2): New.
	(insn lrint<mode>si2): Rename to...
	(*lrint<mode>si): ...this.
	(lrint<mode>si_di): New.

gcc/testsuite/
	PR target/112707
	* gcc.target/powerpc/pr112707-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2a1b5ecfaee..dfb7f19c6ad 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6722,7 +6722,27 @@ (define_insn "lrint<mode>di2"
   "fctid %0,%1"
   [(set_attr "type" "fp")])

-(define_insn "lrint<mode>si2"
+(define_expand "lrint<mode>si2"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
+	(unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
+		   UNSPEC_FCTIW))]
+  "TARGET_HARD_FLOAT && TARGET_STFIWX"
+{
+  /* For those old archs in which SImode can't be held in float registers,
+     call lrint<mode>si_di to put the result in DImode, then store it with
+     stfiwx and convert it via the stack.  */
+  if (!TARGET_POPCNTD)
+    {
+      rtx tmp = gen_reg_rtx (DImode);
+      emit_insn (gen_lrint<mode>si_di (tmp, operands[1]));
+      rtx stack = rs6000_allocate_stack_temp (SImode, false, true);
+      emit_insn (gen_stfiwx (stack, tmp));
+      emit_move_insn (operands[0], stack);
+      DONE;
+    }
+})
+
+(define_insn "*lrint<mode>si"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
	(unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
		   UNSPEC_FCTIW))]
@@ -6730,6 +6750,14 @@ (define_insn "lrint<mode>si2"
   "fctiw %0,%1"
   [(set_attr "type" "fp")])

+(define_insn "lrint<mode>si_di"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
+	(unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
+		   UNSPEC_FCTIW))]
+  "TARGET_HARD_FLOAT && !TARGET_POPCNTD"
+  "fctiw %0,%1"
+  [(set_attr "type" "fp")])
+
 (define_insn "btrunc<mode>2"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
	(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-1.c b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c
new file mode 100644
index 000..cce6bd7f690
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=7450 -fno-math-errno" } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-skip-if "" { has_arch_ppc64 } } */
+/* { dg-final { scan-assembler-times {\mfctiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstfiwx\M} 2 } } */
+
+int test1 (double a)
+{
+  return __builtin_irint (a);
+}
+
+int test2 (float a)
+{
+  return __builtin_irint (a);
+}
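A plain-C model (an assumption about the data flow, not the generated code) of the fctiw + stfiwx + load round trip the v2 expander emits when SImode can't live in floating-point registers: fctiw leaves the 32-bit result in the low word of the 64-bit register image, stfiwx stores exactly that word, and an integer load picks it up from the stack slot.

#include <stdint.h>

/* fpr_image is a hypothetical name for the 64-bit FPR contents after
   fctiw; only its low 32 bits are meaningful.  */
static int32_t
round_trip (uint64_t fpr_image)
{
  int32_t slot = (int32_t) (uint32_t) fpr_image;  /* stfiwx to a stack slot */
  return slot;                                    /* SImode reload */
}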
[patch-2v2, rs6000] guard fctid on PPC64 and powerpc 476 [PR112707]
Hi,

The "fctid" instruction is supported on 64-bit Power processors and PowerPC 476. It needs a guard to check for that. The patch fixes the issue.

Compared with the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/638859.html
the main change is to define TARGET_FCTID as POWERPC64 or PPC476. Also "lrint<mode>di2" is guarded by TARGET_FCTID as it generates fctid.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: guard fctid on PPC64 and powerpc 476.

fctid is supported on 64-bit Power processors and PowerPC 476.  It
should be guarded by this condition.  The patch fixes the issue.

gcc/
	PR target/112707
	* config/rs6000/rs6000.h (TARGET_FCTID): Define.
	* config/rs6000/rs6000.md (lrint<mode>di2): Add guard TARGET_FCTID.
	(lround<mode>di2): Replace TARGET_FPRND with TARGET_FCTID.

gcc/testsuite/
	PR target/112707
	* gcc.target/powerpc/pr112707.h: New.
	* gcc.target/powerpc/pr112707-2.c: New.
	* gcc.target/powerpc/pr112707-3.c: New.
	* gcc.target/powerpc/pr88558-p7.c: Remove fctid for ilp32 as it's
	now guarded by powerpc64.
	* gcc.target/powerpc/pr88558-p8.c: Likewise.
	* gfortran.dg/nint_p7.f90: Add powerpc64 target requirement as
	lround<mode>di2 is now guarded by powerpc64.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 22595f6ebd7..8c29ca68ccf 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -467,6 +467,8 @@ extern int rs6000_vector_align[];
 #define TARGET_FCFIDUS	TARGET_POPCNTD
 #define TARGET_FCTIDUZ	TARGET_POPCNTD
 #define TARGET_FCTIWUZ	TARGET_POPCNTD
+/* Enable fctid on ppc64 and powerpc476.  */
+#define TARGET_FCTID	(TARGET_POWERPC64 || rs6000_cpu == PROCESSOR_PPC476)
 #define TARGET_CTZ	TARGET_MODULO
 #define TARGET_EXTSWSLI	(TARGET_MODULO && TARGET_POWERPC64)
 #define TARGET_MADDLD	TARGET_MODULO
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2a1b5ecfaee..3be79d49dc0 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6718,7 +6718,7 @@ (define_insn "lrint<mode>di2"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
	(unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
		   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT && TARGET_FCTID"
   "fctid %0,%1"
   [(set_attr "type" "fp")])

@@ -6784,7 +6784,7 @@ (define_expand "lround<mode>di2"
    (set (match_operand:DI 0 "gpc_reg_operand")
	(unspec:DI [(match_dup 2)]
		   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT && TARGET_VSX && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_VSX && TARGET_FCTID"
 {
   operands[2] = gen_reg_rtx (<MODE>mode);
 })
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-2.c b/gcc/testsuite/gcc.target/powerpc/pr112707-2.c
new file mode 100644
index 000..672e00691ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=7450 -fno-math-errno" } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-skip-if "" { has_arch_ppc64 } } */
+/* { dg-final { scan-assembler-not {\mfctid\M} } } */
+
+/* powerpc 7450 doesn't support ppc64 (-m32 -mpowerpc64), so skips it.  */
+
+#include "pr112707.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-3.c b/gcc/testsuite/gcc.target/powerpc/pr112707-3.c
new file mode 100644
index 000..924338fd390
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-3.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-math-errno -mdejagnu-cpu=476fp" } */
+/* { dg-require-effective-target ilp32 } */
+
+/* powerpc 476fp has hard float enabled which is required by fctid */
+
+#include "pr112707.h"
+
+/* { dg-final { scan-assembler-times {\mfctid\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707.h b/gcc/testsuite/gcc.target/powerpc/pr112707.h
new file mode 100644
index 000..e427dc6a72e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707.h
@@ -0,0 +1,10 @@
+long long test1 (double a)
+{
+  return __builtin_llrint (a);
+}
+
+long long test2 (float a)
+{
+  return __builtin_llrint (a);
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
index 3932656c5fd..13d433c4bdb 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
@@ -6,7 +6,6 @@
 #include "pr88558.h"

 /* { dg-final { scan-assembler-times {\mfctid\M} 4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mfctid\M} 2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mfctiw\M} 2 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mfctiw\M} 4 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mstfiwx\M} 2 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p8.c b/gcc/t
Re: [patch-2v3, rs6000] Guard fctid on PowerPC64 and PowerPC476 [PR112707]
Hi,

The "fctid" instruction is supported on 64-bit Power processors and PowerPC 476. It needs a guard to check for that. The patch fixes the issue.

Compared with the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639536.html
the main change is to change the target requirement in pr88558*.c.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

ChangeLog
rs6000: Guard fctid on PowerPC64 and PowerPC476.

fctid is supported on 64-bit Power processors and PowerPC 476.  It
should be guarded by this condition.  The patch fixes the issue.

gcc/
	PR target/112707
	* config/rs6000/rs6000.h (TARGET_FCTID): Define.
	* config/rs6000/rs6000.md (lrint<mode>di2): Add guard TARGET_FCTID.
	(lround<mode>di2): Replace TARGET_FPRND with TARGET_FCTID.

gcc/testsuite/
	PR target/112707
	* gcc.target/powerpc/pr112707.h: New.
	* gcc.target/powerpc/pr112707-2.c: New.
	* gcc.target/powerpc/pr112707-3.c: New.
	* gcc.target/powerpc/pr88558-p7.c: Check fctid on ilp32 and
	has_arch_ppc64 as it's now guarded by powerpc64.
	* gcc.target/powerpc/pr88558-p8.c: Likewise.
	* gfortran.dg/nint_p7.f90: Add powerpc64 target requirement as
	lround<mode>di2 is now guarded by powerpc64.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 22595f6ebd7..8c29ca68ccf 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -467,6 +467,8 @@ extern int rs6000_vector_align[];
 #define TARGET_FCFIDUS	TARGET_POPCNTD
 #define TARGET_FCTIDUZ	TARGET_POPCNTD
 #define TARGET_FCTIWUZ	TARGET_POPCNTD
+/* Enable fctid on ppc64 and powerpc476.  */
+#define TARGET_FCTID	(TARGET_POWERPC64 || rs6000_cpu == PROCESSOR_PPC476)
 #define TARGET_CTZ	TARGET_MODULO
 #define TARGET_EXTSWSLI	(TARGET_MODULO && TARGET_POWERPC64)
 #define TARGET_MADDLD	TARGET_MODULO
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2a1b5ecfaee..3be79d49dc0 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6718,7 +6718,7 @@ (define_insn "lrint<mode>di2"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
	(unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
		   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT && TARGET_FCTID"
   "fctid %0,%1"
   [(set_attr "type" "fp")])

@@ -6784,7 +6784,7 @@ (define_expand "lround<mode>di2"
    (set (match_operand:DI 0 "gpc_reg_operand")
	(unspec:DI [(match_dup 2)]
		   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT && TARGET_VSX && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_VSX && TARGET_FCTID"
 {
   operands[2] = gen_reg_rtx (<MODE>mode);
 })
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-2.c b/gcc/testsuite/gcc.target/powerpc/pr112707-2.c
new file mode 100644
index 000..672e00691ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=7450 -fno-math-errno" } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-skip-if "" { has_arch_ppc64 } } */
+/* { dg-final { scan-assembler-not {\mfctid\M} } } */
+
+/* powerpc 7450 doesn't support ppc64 (-m32 -mpowerpc64), so skips it.  */
+
+#include "pr112707.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-3.c b/gcc/testsuite/gcc.target/powerpc/pr112707-3.c
new file mode 100644
index 000..924338fd390
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-3.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-math-errno -mdejagnu-cpu=476fp" } */
+/* { dg-require-effective-target ilp32 } */
+
+/* powerpc 476fp has hard float enabled which is required by fctid */
+
+#include "pr112707.h"
+
+/* { dg-final { scan-assembler-times {\mfctid\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707.h b/gcc/testsuite/gcc.target/powerpc/pr112707.h
new file mode 100644
index 000..e427dc6a72e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707.h
@@ -0,0 +1,10 @@
+long long test1 (double a)
+{
+  return __builtin_llrint (a);
+}
+
+long long test2 (float a)
+{
+  return __builtin_llrint (a);
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
index 3932656c5fd..2fa0b997e52 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
@@ -6,7 +6,7 @@
 #include "pr88558.h"

 /* { dg-final { scan-assembler-times {\mfctid\M} 4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mfctid\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mfctid\M} 2 { target { ilp32 && has_arch_ppc64 } } } } */
 /* { dg-final { scan-assembler-times {\mfctiw\M} 2 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mfctiw\M} 4 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mstfiwx\M} 2 { target lp64 } } } */
diff --git a/gcc/testsuite/g
[Patch, rs6000] Correct definition of macro of fixed point efficient unaligned
Hi,

The patch corrects the definition of TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and renames it to a more descriptive name.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Correct definition of macro of fixed point efficient unaligned

Macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in rs6000-string.cc
to guard whether a platform is efficient on fixed point unaligned
load/store.  It's originally defined by TARGET_EFFICIENT_UNALIGNED_VSX,
which is enabled from P8 and can be disabled by the -mno-vsx option.  So
the definition is wrong.  This patch corrects the problem and defines it
by "!STRICT_ALIGNMENT", which is true on P7 BE and P8 and above.

gcc/
	* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
	Rename to...
	(TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT): ...this, set it to
	!STRICT_ALIGNMENT.
	* config/rs6000/rs6000-string.cc (select_block_compare_mode):
	Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
	TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT.
	(select_block_compare_mode): Likewise.
	(expand_block_compare_gpr): Likewise.
	(expand_block_compare): Likewise.
	(expand_strncmp_gpr_sequence): Likewise.

gcc/testsuite/
	* gcc.target/powerpc/target_efficient_unaligned_fixedpoint-1.c: New.
	* gcc.target/powerpc/target_efficient_unaligned_fixedpoint-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 44a946cd453..d4030854b2a 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -305,7 +305,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
   else if (bytes == GET_MODE_SIZE (QImode))
     return QImode;
   else if (bytes < GET_MODE_SIZE (SImode)
-	   && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+	   && TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
	   && offset >= GET_MODE_SIZE (SImode) - bytes)
     /* This matches the case were we have SImode and 3 bytes
	and offset >= 1 and permits us to move back one and overlap
@@ -313,7 +313,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
	unwanted bytes off of the input.  */
     return SImode;
   else if (word_mode_ok && bytes < UNITS_PER_WORD
-	   && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+	   && TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
	   && offset >= UNITS_PER_WORD-bytes)
     /* Similarly, if we can use DImode it will get matched here and
	can do an overlapping read that ends at the end of the block.  */
@@ -1749,7 +1749,7 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
       load_mode_size = GET_MODE_SIZE (load_mode);
       if (bytes >= load_mode_size)
	cmp_bytes = load_mode_size;
-      else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+      else if (TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT)
	{
	  /* Move this load back so it doesn't go past the end.  P8/P9
	     can do this efficiently.  */
@@ -2026,7 +2026,7 @@ expand_block_compare (rtx operands[])
   /* The code generated for p7 and older is not faster than glibc
      memcmp if alignment is small and length is not short, so bail
      out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  if (!TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
       && ((base_align == 1 && bytes > 16)
	   || (base_align == 2 && bytes > 32)))
     return false;
@@ -2168,7 +2168,7 @@ expand_strncmp_gpr_sequence (unsigned HOST_WIDE_INT bytes_to_compare,
       load_mode_size = GET_MODE_SIZE (load_mode);
       if (bytes_to_compare >= load_mode_size)
	cmp_bytes = load_mode_size;
-      else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+      else if (TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT)
	{
	  /* Move this load back so it doesn't go past the end.  P8/P9
	     can do this efficiently.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 326c45221e9..2f3a82942c1 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -483,9 +483,9 @@ extern int rs6000_vector_align[];
 #define TARGET_NO_SF_SUBREG	TARGET_DIRECT_MOVE_64BIT
 #define TARGET_ALLOW_SF_SUBREG	(!TARGET_DIRECT_MOVE_64BIT)

-/* This wants to be set for p8 and newer.  On p7, overlapping unaligned
-   loads are slow. */
-#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX
+/* Like TARGET_EFFICIENT_UNALIGNED_VSX, indicates if unaligned fixed point
+   loads/stores are efficient.  */
+#define TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT (!STRICT_ALIGNMENT)

 /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
    in power7, so conditionalize them on p8 features.  TImode syncs need quad
diff --git a/gcc/testsuite/gcc.target/powerpc/target_efficient_unaligned_fixedpoint-1.c b/gcc/testsuite/gcc.target/powerpc/targ
[Patch, rs6000] Clean up pre-checking of expand_block_compare
Hi,

This patch cleans up the pre-checks of expand_block_compare. It does:

1. Assert that only P7 and above can enter this function, as that is already guarded by the expander.
2. Return false when optimizing for size.
3. Remove the P7 CPU test, as only P7 and above can enter this function and P7 LE is excluded by targetm.slow_unaligned_access.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up pre-checks of expand_block_compare

gcc/
	* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
	that only P7 and above can enter this function.  Return false when
	optimizing for size.  Remove the P7 CPU test as only P7 and above
	can enter this function and P7 LE is excluded by the check of
	targetm.slow_unaligned_access on word_mode.

gcc/testsuite/
	* gcc.target/powerpc/memcmp_for_size.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index d4030854b2a..dff69e90d0c 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1946,6 +1946,15 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
+  gcc_assert (TARGET_POPCNTD);
+
+  if (optimize_insn_for_size_p ())
+    return false;
+
+  /* Allow this param to shut off all expansion.  */
+  if (rs6000_block_compare_inline_limit == 0)
+    return false;
+
   rtx target = operands[0];
   rtx orig_src1 = operands[1];
   rtx orig_src2 = operands[2];
@@ -1959,23 +1968,9 @@ expand_block_compare (rtx operands[])
   if (TARGET_32BIT && TARGET_POWERPC64)
     return false;

-  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
-
-  /* Allow this param to shut off all expansion.  */
-  if (rs6000_block_compare_inline_limit == 0)
-    return false;
-
-  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
-     However slow_unaligned_access returns true on P7 even though the
-     performance of this code is good there.  */
-  if (!isP7
-      && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
-	  || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2))))
-    return false;
-
-  /* Unaligned l*brx traps on P7 so don't do this.  However this should
-     not affect much because LE isn't really supported on P7 anyway.  */
-  if (isP7 && !BYTES_BIG_ENDIAN)
+  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
+  if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
+      || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
     return false;

   /* If this is not a fixed size compare, try generating loop code and
@@ -2023,14 +2018,6 @@ expand_block_compare (rtx operands[])
   if (!IN_RANGE (bytes, 1, max_bytes))
     return expand_compare_loop (operands);

-  /* The code generated for p7 and older is not faster than glibc
-     memcmp if alignment is small and length is not short, so bail
-     out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
-      && ((base_align == 1 && bytes > 16)
-	  || (base_align == 2 && bytes > 32)))
-    return false;
-
   rtx final_label = NULL;

   if (use_vec)
diff --git a/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c
new file mode 100644
index 000..c7e853ad593
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } } */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 4);
+}
Re: [PATCH-2, rs6000] Enable vector mode for by pieces equality compare [PR111449]
Hi Kewen,

Thanks for your review comments. Just one question on the following comment.

On 2023/11/7 10:40, Kewen.Lin wrote:
> Nit: has_arch_pwr8 would make it un-tested on Power7 default env, I'd prefer
> to remove this "has_arch_pwr8" and append "-mdejagnu-cpu=power8" to
> dg-options.

My original proposal was to test the case on p8/p9/p10; each of them generates a different instruction sequence. If "-mdejagnu-cpu=power8" is assigned, only the p8 instruction sequence is generated. Does it lose the coverage?

Thanks
Gui Haochen
[PATCH-2v2, rs6000] Enable vector mode for by pieces equality compare [PR111449]
Hi, This patch enables vector mode for by pieces equality compare. It adds a new expand pattern - cbrnachv16qi4 and set MOVE_MAX_PIECES and COMPARE_MAX_PIECES to 16 bytes when P8 vector enabled. The compare relies both move and compare instructions, so both macro are changed. As the vector load/store might be unaligned, the 16-byte move and compare are only enabled when VSX and EFFICIENT_UNALIGNED_VSX are both enabled. This patch enables 16-byte by pieces move. As the vector mode is not enabled for by pieces move, TImode is used for the move. It caused 2 regression cases. The root cause is that now 16-byte length array can be constructed by one load instruction and not be put into LC0 so that SRA optimization will not be taken. Compared to previous version, the main change is to modify the guard of expand pattern and compiling options of the test case. Also the fix for two regression cases caused by 16-byte move enablement is moved to this patch. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Enable vector mode for by pieces equality compare This patch adds a new expand pattern - cbranchv16qi4 to enable vector mode by pieces equality compare on rs6000. The macro MOVE_MAX_PIECES (COMPARE_MAX_PIECES) is set to 16 bytes when VSX and EFFICIENT_UNALIGNED_VSX is enabled, otherwise keeps unchanged. The macro STORE_MAX_PIECES is set to the same value as MOVE_MAX_PIECES by default, so now it's explicitly defined and keeps unchanged. gcc/ PR target/111449 * config/rs6000/altivec.md (cbranchv16qi4): New expand pattern. * config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn sequence for V16QImode equality compare. * config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define. (STORE_MAX_PIECES): Define. gcc/testsuite/ PR target/111449 * gcc.target/powerpc/pr111449-1.c: New. * gcc.dg/tree-ssa/sra-17.c: Add additional options for 32-bit powerpc. * gcc.dg/tree-ssa/sra-18.c: Likewise. patch.diff diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index e8a596fb7e9..a1423c76451 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -2605,6 +2605,48 @@ (define_insn "altivec_vupklpx" } [(set_attr "type" "vecperm")]) +/* The cbranch_optabs doesn't allow FAIL, so old cpus which are + inefficient on unaligned vsx are disabled as the cost is high + for unaligned load/store. */ +(define_expand "cbranchv16qi4" + [(use (match_operator 0 "equality_operator" + [(match_operand:V16QI 1 "reg_or_mem_operand") +(match_operand:V16QI 2 "reg_or_mem_operand")])) + (use (match_operand 3))] + "VECTOR_MEM_VSX_P (V16QImode) + && TARGET_EFFICIENT_UNALIGNED_VSX" +{ + /* Use direct move for P8 LE to skip double-word swap, as the byte + order doesn't matter for equality compare. If any operands are + altivec indexed or indirect operands, the load can be implemented + directly by altivec aligned load instruction and swap is no + need. 
*/ + if (!TARGET_P9_VECTOR + && !BYTES_BIG_ENDIAN + && MEM_P (operands[1]) + && !altivec_indexed_or_indirect_operand (operands[1], V16QImode) + && MEM_P (operands[2]) + && !altivec_indexed_or_indirect_operand (operands[2], V16QImode)) +{ + rtx reg_op1 = gen_reg_rtx (V16QImode); + rtx reg_op2 = gen_reg_rtx (V16QImode); + rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode); + rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode); + operands[1] = reg_op1; + operands[2] = reg_op2; +} + else +{ + operands[1] = force_reg (V16QImode, operands[1]); + operands[2] = force_reg (V16QImode, operands[2]); +} + + rtx_code code = GET_CODE (operands[0]); + operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]); + rs6000_emit_cbranch (V16QImode, operands); + DONE; +}) + ;; Compare vectors producing a vector result and a predicate, setting CR6 to ;; indicate a combined status (define_insn "altivec_vcmpequ<VI_char>_p" diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index cc24dd5301e..10279052636 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -15472,6 +15472,18 @@ rs6000_generate_compare (rtx cmp, machine_mode mode) else emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b)); } + else if (mode == V16QImode) + { + gcc_assert (code == EQ || code == NE); + + rtx result_vector = gen_reg_rtx (V16QImode); + rtx cc_bit = gen_reg_rtx (SImode); + emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1)); + emit_insn (gen_cr6_test_for_lt (cc_bit)); + emit_insn (gen_rtx_SET (compare_result, + gen_rtx_COMPARE (comp_mode, cc_bit, +
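The new test gcc.target/powerpc/pr111449-1.c is referenced above but not quoted; a minimal sketch of the kind of source it exercises (the actual test content may differ):

/* An equality-only 16-byte memcmp should now go through the new
   cbranchv16qi4 expander as a single vector compare instead of two
   8-byte compare/branch pairs.  */
int compare (const char *s1, const char *s2)
{
  return __builtin_memcmp (s1, s2, 16) == 0;
}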
[PATCH-3v3, rs6000] Fix regression cases caused by 16-byte by pieces move [PR111449]
Hi, Originally, a 16-byte memory-to-memory move was expanded via a pattern. expand_block_move does an optimization on P8 LE to leverage V2DI reversed load/store for memory to memory move. Now it's done by 16-byte by pieces move and the optimization is lost. This patch adds an insn_and_split pattern to restore the optimization. Compared to the previous version, the main change is to move the fix for the two regression cases to the former patch and change the condition of the pattern. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Fix regression cases caused by 16-byte by pieces move The previous patch enables 16-byte by pieces move. Originally, a 16-byte move was implemented via a pattern. expand_block_move does an optimization on P8 LE to leverage V2DI reversed load/store for memory to memory move. Now a 16-byte move is implemented via by pieces move and finally split into two DImode load/stores. This patch creates an insn_and_split pattern to restore the optimization. gcc/ PR target/111449 * config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New. gcc/testsuite/ PR target/111449 * gcc.target/powerpc/pr111449-2.c: New. patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index f3b40229094..3f71e96dc6b 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -414,6 +414,29 @@ (define_mode_attr VM3_char [(V2DI "d") ;; VSX moves +;; TImode memory to memory move optimization on LE with p8vector +(define_insn_and_split "*vsx_le_mem_to_mem_mov_ti" + [(set (match_operand:TI 0 "indexed_or_indirect_operand" "=Z") + (match_operand:TI 1 "indexed_or_indirect_operand" "Z"))] + "!BYTES_BIG_ENDIAN + && TARGET_VSX + && !TARGET_P9_VECTOR + && !MEM_VOLATILE_P (operands[0]) + && !MEM_VOLATILE_P (operands[1]) + && can_create_pseudo_p ()" + "#" + "&& 1" + [(const_int 0)] +{ + rtx tmp = gen_reg_rtx (V2DImode); + rtx src = adjust_address (operands[1], V2DImode, 0); + emit_insn (gen_vsx_ld_elemrev_v2di (tmp, src)); + rtx dest = adjust_address (operands[0], V2DImode, 0); + emit_insn (gen_vsx_st_elemrev_v2di (dest, tmp)); + DONE; +} + [(set_attr "length" "16")]) + ;; The patterns for LE permuted loads and stores come before the general ;; VSX moves so they match first. (define_insn_and_split "*vsx_le_perm_load_<mode>" diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-2.c b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c new file mode 100644 index 000..7003bdc0208 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target { has_arch_pwr8 } } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-mvsx -O2" } */ + +/* Ensure 16-byte by pieces move is enabled. */ + +void move1 (void *s1, void *s2) +{ + __builtin_memcpy (s1, s2, 16); +} + +void move2 (void *s1) +{ + __builtin_memcpy (s1, "0123456789012345", 16); +} + +/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mp?lxv\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxv\M} 2 } } */
[PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]
Hi, This patch modifies expand_builtin_return and makes it call expand_misaligned_mem_ref to load unaligned memory. The memory reference pointed to by the void* pointer might be unaligned, so expanding it with the unaligned move optabs is safe. The new test case illustrates the problem. rs6000 doesn't have an unaligned vector load instruction when VSX is disabled. When expanding __builtin_return, it shouldn't load the unaligned memory into a vector register with a vector load instruction directly. It should store the value to an on-stack variable via extract_bit_field and then load it into the return register from the stack with an aligned load instruction. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog expand: Call misaligned memory reference in expand_builtin_return expand_builtin_return loads memory to return registers. The memory might be unaligned compared to the mode of the registers. So it should be expanded by the unaligned move optabs if the memory reference is unaligned. gcc/ PR target/112417 * builtins.cc (expand_builtin_return): Call expand_misaligned_mem_ref for loading an unaligned memory reference. * builtins.h (expand_misaligned_mem_ref): Declare. * expr.cc (expand_misaligned_mem_ref): No longer static. gcc/testsuite/ PR target/112417 * gcc.target/powerpc/pr112417.c: New. patch.diff diff --git a/gcc/builtins.cc b/gcc/builtins.cc index cb90bd03b3e..b879eb88b7c 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -1816,7 +1816,12 @@ expand_builtin_return (rtx result) if (size % align != 0) size = CEIL (size, align) * align; reg = gen_rtx_REG (mode, INCOMING_REGNO (regno)); - emit_move_insn (reg, adjust_address (result, mode, size)); + rtx tmp = adjust_address (result, mode, size); + unsigned int align = MEM_ALIGN (tmp); + if (align < GET_MODE_ALIGNMENT (mode)) + tmp = expand_misaligned_mem_ref (tmp, mode, 1, align, + NULL, NULL); + emit_move_insn (reg, tmp); push_to_sequence (call_fusage); emit_use (reg); diff --git a/gcc/builtins.h b/gcc/builtins.h index 88a26d70cd5..a3d7954ee6e 100644 --- a/gcc/builtins.h +++ b/gcc/builtins.h @@ -157,5 +157,7 @@ extern internal_fn replacement_internal_fn (gcall *); extern bool builtin_with_linkage_p (tree); extern int type_to_class (tree); +extern rtx expand_misaligned_mem_ref (rtx, machine_mode, int, unsigned int, + rtx, rtx *); #endif /* GCC_BUILTINS_H */ diff --git a/gcc/expr.cc b/gcc/expr.cc index ed4dbb13d89..b0adb35a095 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -9156,7 +9156,7 @@ expand_cond_expr_using_cmove (tree treeop0 ATTRIBUTE_UNUSED, If the result can be stored at TARGET, and ALT_RTL is non-NULL, then *ALT_RTL is set to TARGET (before legitimziation). */ -static rtx +rtx expand_misaligned_mem_ref (rtx temp, machine_mode mode, int unsignedp, unsigned int align, rtx target, rtx *alt_rtl) { diff --git a/gcc/testsuite/gcc.target/powerpc/pr112417.c b/gcc/testsuite/gcc.target/powerpc/pr112417.c new file mode 100644 index 000..ef82fc82033 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr112417.c @@ -0,0 +1,12 @@ +/* { dg-do compile { target { has_arch_pwr7 } } } */ +/* { dg-options "-mno-vsx -maltivec -O2" } */ + +void * foo (void * p) +{ + if (p) +__builtin_return (p); +} + +/* Ensure that unaligned load is generated via stack load/store. */ +/* { dg-final { scan-assembler {\mstw\M} { target { ! has_arch_ppc64 } } } } */ +/* { dg-final { scan-assembler {\mstd\M} { target has_arch_ppc64 } } } */
Re: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]
Hi Richard, Thanks so much for your comments. On 2023/11/9 19:41, Richard Biener wrote: > I'm not sure if the testcase is valid though? > > @defbuiltin{{void} __builtin_return (void *@var{result})} > This built-in function returns the value described by @var{result} from > the containing function. You should specify, for @var{result}, a value > returned by @code{__builtin_apply}. > @enddefbuiltin > > I don't see __builtin_apply being used here? The prototype of the test case is from "__objc_block_forward" in libobjc/sendmsg.c. void *args, *res; args = __builtin_apply_args (); res = __objc_forward (rcv, op, args); if (res) __builtin_return (res); else ... __builtin_apply_args puts the return values on the stack with the required alignment. But the forward function can do anything and return a void* pointer. IMHO the alignment might be broken. So I just simplified it to use a void* pointer as the input argument of "__builtin_return" and skip "__builtin_apply_args". Thanks Gui Haochen
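For reference, the documented contract ties the three builtins together roughly as below. This is a minimal sketch; target_fn and the 64-byte argument-block size are placeholders, not taken from libobjc:

extern void target_fn ();

void *
forward_all (void)
{
  /* Save the incoming register/stack arguments in an opaque block.  */
  void *args = __builtin_apply_args ();
  /* Re-issue the call with the same arguments; the result points to a
     block holding target_fn's return value.  */
  void *res = __builtin_apply (&target_fn, args, 64);
  /* Return that value from the containing function; does not return.  */
  __builtin_return (res);
}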
[PATCH-3v4, rs6000] Fix regression cases caused by 16-byte by pieces move [PR111449]
Hi, Originally, a 16-byte memory-to-memory move was expanded via a pattern. expand_block_move does an optimization on P8 LE to leverage V2DI reversed load/store for memory to memory move. Now it's done by 16-byte by pieces move and the optimization is lost. This patch adds an insn_and_split pattern to restore the optimization. Compared to the previous version, the main change is to remove the volatile memory operands check from the insn condition as it's not needed. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Fix regression cases caused by 16-byte by pieces move The previous patch enables 16-byte by pieces move. Originally, a 16-byte move was implemented via a pattern. expand_block_move does an optimization on P8 LE to leverage V2DI reversed load/store for memory to memory move. Now a 16-byte move is implemented via by pieces move and finally split into two DI load/stores. This patch creates an insn_and_split pattern to restore the optimization. gcc/ PR target/111449 * config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New. gcc/testsuite/ PR target/111449 * gcc.target/powerpc/pr111449-2.c: New. patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index f3b40229094..26fa32829af 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -414,6 +414,27 @@ (define_mode_attr VM3_char [(V2DI "d") ;; VSX moves +;; TImode memory to memory move optimization on LE with p8vector +(define_insn_and_split "*vsx_le_mem_to_mem_mov_ti" + [(set (match_operand:TI 0 "indexed_or_indirect_operand" "=Z") + (match_operand:TI 1 "indexed_or_indirect_operand" "Z"))] + "!BYTES_BIG_ENDIAN + && TARGET_VSX + && !TARGET_P9_VECTOR + && can_create_pseudo_p ()" + "#" + "&& 1" + [(const_int 0)] +{ + rtx tmp = gen_reg_rtx (V2DImode); + rtx src = adjust_address (operands[1], V2DImode, 0); + emit_insn (gen_vsx_ld_elemrev_v2di (tmp, src)); + rtx dest = adjust_address (operands[0], V2DImode, 0); + emit_insn (gen_vsx_st_elemrev_v2di (dest, tmp)); + DONE; +} + [(set_attr "length" "16")]) + ;; The patterns for LE permuted loads and stores come before the general ;; VSX moves so they match first. (define_insn_and_split "*vsx_le_perm_load_<mode>" diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-2.c b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c new file mode 100644 index 000..7003bdc0208 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target { has_arch_pwr8 } } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-mvsx -O2" } */ + +/* Ensure 16-byte by pieces move is enabled. */ + +void move1 (void *s1, void *s2) +{ + __builtin_memcpy (s1, s2, 16); +} + +void move2 (void *s1) +{ + __builtin_memcpy (s1, "0123456789012345", 16); +} + +/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mp?lxv\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxv\M} 2 } } */
Re: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]
Hi Richard, On 2023/11/10 17:06, Richard Biener wrote: > On Fri, Nov 10, 2023 at 8:52 AM HAO CHEN GUI wrote: >> >> Hi Richard, >> Thanks so much for your comments. >> >> On 2023/11/9 19:41, Richard Biener wrote: >>> I'm not sure if the testcase is valid though? >>> >>> @defbuiltin{{void} __builtin_return (void *@var{result})} >>> This built-in function returns the value described by @var{result} from >>> the containing function. You should specify, for @var{result}, a value >>> returned by @code{__builtin_apply}. >>> @enddefbuiltin >>> >>> I don't see __builtin_apply being used here? >> >> The prototype of the test case is from "__objc_block_forward" in >> libobjc/sendmsg.c. >> >> void *args, *res; >> >> args = __builtin_apply_args (); >> res = __objc_forward (rcv, op, args); >> if (res) >> __builtin_return (res); >> else >> ... >> >> The __builtin_apply_args puts the return values on stack by the alignment. >> But the forward function can do anything and return a void* pointer. >> IMHO the alignment might be broken. So I just simplified it to use a >> void* pointer as the input argument of "__builtin_return" and skip >> "__builtin_apply_args". > > But doesn't __objc_forward then break the contract between > __builtin_apply_args and __builtin_return? > > That said, __builtin_return is a very special function, it's not supposed > to deal with what you are fixing. At least I think so. > > IMHO the bug is in __objc_block_forward. If so, can we document that the memory objects pointed to by the input argument of __builtin_return have to be aligned? Then we can force the alignment in __builtin_return. The user's function can do anything if gcc doesn't state that. Thanks Gui Haochen > > Richard. > >> >> Thanks >> Gui Haochen
Re: Fwd: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]
Sorry, forgot to cc gcc-patches. On 2023/11/13 16:05, HAO CHEN GUI wrote: > Andrew, > Could you kindly tell us what the functionality of __objc_forward is? > Does it change the memory content pointed to by args? Thanks a lot. > > Thanks > Gui Haochen > > > libobjc/sendmsg.c. > >void *args, *res; > >args = __builtin_apply_args (); >res = __objc_forward (rcv, op, args); >if (res) > __builtin_return (res); >else > ... > > Forwarded message > Subject: Re: [PATCH, expand] Call misaligned memory reference in > expand_builtin_return [PR112417] > Date: Fri, 10 Nov 2023 14:39:02 +0100 > From: Richard Biener > To: HAO CHEN GUI > Cc: gcc-patches , Kewen.Lin > > On Fri, Nov 10, 2023 at 11:10 AM HAO CHEN GUI wrote: >> >> Hi Richard, >> >> On 2023/11/10 17:06, Richard Biener wrote: >>> On Fri, Nov 10, 2023 at 8:52 AM HAO CHEN GUI wrote: >>>> >>>> Hi Richard, >>>> Thanks so much for your comments. >>>> >>>> On 2023/11/9 19:41, Richard Biener wrote: >>>>> I'm not sure if the testcase is valid though? >>>>> >>>>> @defbuiltin{{void} __builtin_return (void *@var{result})} >>>>> This built-in function returns the value described by @var{result} from >>>>> the containing function. You should specify, for @var{result}, a value >>>>> returned by @code{__builtin_apply}. >>>>> @enddefbuiltin >>>>> >>>>> I don't see __builtin_apply being used here? >>>> >>>> The prototype of the test case is from "__objc_block_forward" in >>>> libobjc/sendmsg.c. >>>> >>>> void *args, *res; >>>> >>>> args = __builtin_apply_args (); >>>> res = __objc_forward (rcv, op, args); >>>> if (res) >>>> __builtin_return (res); >>>> else >>>> ... >>>> >>>> The __builtin_apply_args puts the return values on stack by the alignment. >>>> But the forward function can do anything and return a void* pointer. >>>> IMHO the alignment might be broken. So I just simplified it to use a >>>> void* pointer as the input argument of "__builtin_return" and skip >>>> "__builtin_apply_args". >>> >>> But doesn't __objc_forward then break the contract between >>> __builtin_apply_args and __builtin_return? >>> >>> That said, __builtin_return is a very special function, it's not supposed >>> to deal with what you are fixing. At least I think so. >>> >>> IMHO the bug is in __objc_block_forward. >> >> If so, can we document that the memory objects pointed to by the input argument of >> __builtin_return have to be aligned? Then we can force the alignment in >> __builtin_return. The user's function can do anything if gcc doesn't state >> that. > > I don't think they have to be aligned - they have to adhere to the ABI > which __builtin_apply_args ensures. But others might know more details > here. > >> Thanks >> Gui Haochen
[PATCH] Clean up by_pieces_ninsns
Hi, This patch cleans up by_pieces_ninsns and does the following things. 1. Do the length and alignment adjustment for by pieces compare when overlap operation is enabled. 2. Remove unnecessary mov_optab checks. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog Clean up by_pieces_ninsns The by pieces compare can be implemented by overlapped operations. So it should be taken into consideration when doing the adjustment for overlap operations. The mode returned from widest_fixed_size_mode_for_size is already checked with mov_optab in by_pieces_mode_supported_p, which is called by widest_fixed_size_mode_for_size. So there is no need to check mov_optab again in by_pieces_ninsns. The patch fixes these issues. gcc/ * expr.cc (by_pieces_ninsns): Include by pieces compare when doing the adjustment for overlap operations. Remove unnecessary mov_optab check. patch.diff diff --git a/gcc/expr.cc b/gcc/expr.cc index 3e2a678710d..7cb2c935177 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -1090,18 +1090,15 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align, unsigned HOST_WIDE_INT n_insns = 0; fixed_size_mode mode; - if (targetm.overlap_op_by_pieces_p () && op != COMPARE_BY_PIECES) + if (targetm.overlap_op_by_pieces_p ()) { /* NB: Round up L and ALIGN to the widest integer mode for MAX_SIZE. */ mode = widest_fixed_size_mode_for_size (max_size, op); - if (optab_handler (mov_optab, mode) != CODE_FOR_nothing) - { - unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode)); - if (up > l) - l = up; - align = GET_MODE_ALIGNMENT (mode); - } + unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode)); + if (up > l) + l = up; + align = GET_MODE_ALIGNMENT (mode); } align = alignment_for_piecewise_move (MOVE_MAX_PIECES, align); @@ -1109,12 +1106,10 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align, while (max_size > 1 && l > 0) { mode = widest_fixed_size_mode_for_size (max_size, op); - enum insn_code icode; unsigned int modesize = GET_MODE_SIZE (mode); - icode = optab_handler (mov_optab, mode); - if (icode != CODE_FOR_nothing && align >= GET_MODE_ALIGNMENT (mode)) + if (align >= GET_MODE_ALIGNMENT (mode)) { unsigned HOST_WIDE_INT n_pieces = l / modesize; l %= modesize;
[PATCHv2] Clean up by_pieces_ninsns
Hi, This patch cleans up by_pieces_ninsns and does the following things. 1. Do the length and alignment adjustment for by pieces compare when overlap operation is enabled. 2. Replace unnecessary mov_optab checks with gcc assertions. Compared to the last version, the main change is to replace unnecessary mov_optab checks with gcc assertions and fix the indentation. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog Clean up by_pieces_ninsns The by pieces compare can be implemented by overlapped operations. So it should be taken into consideration when doing the adjustment for overlap operations. The mode returned from widest_fixed_size_mode_for_size is already checked with mov_optab in by_pieces_mode_supported_p, which is called by widest_fixed_size_mode_for_size. So there is no need to check mov_optab again in by_pieces_ninsns. The patch fixes these issues. gcc/ * expr.cc (by_pieces_ninsns): Include by pieces compare when doing the adjustment for overlap operations. Replace mov_optab checks with gcc assertions. patch.diff diff --git a/gcc/expr.cc b/gcc/expr.cc index 556bcf7ef59..ffd18fe43cc 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -1090,18 +1090,16 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align, unsigned HOST_WIDE_INT n_insns = 0; fixed_size_mode mode; - if (targetm.overlap_op_by_pieces_p () && op != COMPARE_BY_PIECES) + if (targetm.overlap_op_by_pieces_p ()) { /* NB: Round up L and ALIGN to the widest integer mode for MAX_SIZE. */ mode = widest_fixed_size_mode_for_size (max_size, op); - if (optab_handler (mov_optab, mode) != CODE_FOR_nothing) - { - unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode)); - if (up > l) - l = up; - align = GET_MODE_ALIGNMENT (mode); - } + gcc_assert (optab_handler (mov_optab, mode) != CODE_FOR_nothing); + unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode)); + if (up > l) + l = up; + align = GET_MODE_ALIGNMENT (mode); } align = alignment_for_piecewise_move (MOVE_MAX_PIECES, align); @@ -1109,12 +1107,11 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align, while (max_size > 1 && l > 0) { mode = widest_fixed_size_mode_for_size (max_size, op); - enum insn_code icode; + gcc_assert (optab_handler (mov_optab, mode) != CODE_FOR_nothing); unsigned int modesize = GET_MODE_SIZE (mode); - icode = optab_handler (mov_optab, mode); - if (icode != CODE_FOR_nothing && align >= GET_MODE_ALIGNMENT (mode)) + if (align >= GET_MODE_ALIGNMENT (mode)) { unsigned HOST_WIDE_INT n_pieces = l / modesize; l %= modesize;
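To make the adjustment concrete, a small worked example (the numbers are illustrative, not from the patch):

  l = 13, widest supported mode = 8 bytes
  ROUND_UP (13, 8) = 16  =>  l becomes 16, align is raised to the
                             8-byte mode's alignment
  counted insns: 16 / 8 = 2 overlapping 8-byte pieces
  without the adjustment: 8 + 4 + 1 = 3 pieces

Before this patch COMPARE_BY_PIECES was excluded from this path, so the insn count for an overlap-capable target could come out one piece too high for compares.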
[PATCH] Expand: Pass down equality only flag to cmpmem expand
Hi, This patch passes down the equality-only flag from emit_block_cmp_hints to the cmpmem optab so that the target specific expand can generate optimized insns for an equality-only compare. Targets (e.g. rs6000) can generate a more efficient insn sequence if the block compare is equality only. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog Expand: Pass down equality only flag to cmpmem expand Targets (e.g. rs6000) can generate a more efficient insn sequence if the block compare is equality only. This patch passes down the equality-only flag from emit_block_cmp_hints to the cmpmem optab so that the target specific expand can generate optimized insns for an equality-only compare. gcc/ * expr.cc (expand_cmpstrn_or_cmpmem): Rename to... (expand_cmpstrn): ...this. (expand_cmpmem): New function. Pass down equality only flag to cmpmem expand. (emit_block_cmp_via_cmpmem): Add an argument for equality only flag and call expand_cmpmem instead of expand_cmpstrn_or_cmpmem. (emit_block_cmp_hints): Call emit_block_cmp_via_cmpmem with equality only flag. * expr.h (expand_cmpstrn, expand_cmpmem): Declare. * builtins.cc (expand_builtin_strcmp, expand_builtin_strncmp): Call expand_cmpstrn instead of expand_cmpstrn_or_cmpmem. * config/i386/i386.md (cmpmemsi): Add the sixth operand for equality only flag. * config/rs6000/rs6000.md (cmpmemsi): Likewise. * config/s390/s390.md (cmpmemsi): Likewise. * doc/md.texi (cmpmem): Modify the document and add an operand for equality only flag. patch.diff diff --git a/gcc/builtins.cc b/gcc/builtins.cc index 5ece0d23eb9..c2dbc25433d 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -4819,7 +4819,7 @@ expand_builtin_strcmp (tree exp, ATTRIBUTE_UNUSED rtx target) if (len && !TREE_SIDE_EFFECTS (len)) { arg3_rtx = expand_normal (len); - result = expand_cmpstrn_or_cmpmem + result = expand_cmpstrn (cmpstrn_icode, target, arg1_rtx, arg2_rtx, TREE_TYPE (len), arg3_rtx, MIN (arg1_align, arg2_align)); } @@ -4929,9 +4929,9 @@ expand_builtin_strncmp (tree exp, ATTRIBUTE_UNUSED rtx target, rtx arg1_rtx = get_memory_rtx (arg1, len); rtx arg2_rtx = get_memory_rtx (arg2, len); rtx arg3_rtx = expand_normal (len); - result = expand_cmpstrn_or_cmpmem (cmpstrn_icode, target, arg1_rtx, -arg2_rtx, TREE_TYPE (len), arg3_rtx, -MIN (arg1_align, arg2_align)); + result = expand_cmpstrn (cmpstrn_icode, target, arg1_rtx, arg2_rtx, + TREE_TYPE (len), arg3_rtx, + MIN (arg1_align, arg2_align)); tree fndecl = get_callee_fndecl (exp); if (result) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 1b5a794b9e5..775cba5d93d 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -23195,7 +23195,8 @@ (define_expand "cmpmemsi" (compare:SI (match_operand:BLK 1 "memory_operand" "") (match_operand:BLK 2 "memory_operand" "") ) ) (use (match_operand 3 "general_operand")) - (use (match_operand 4 "immediate_operand"))] + (use (match_operand 4 "immediate_operand")) + (use (match_operand 5 ""))] "" { if (ix86_expand_cmpstrn_or_cmpmem (operands[0], operands[1], diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 2a1b5ecfaee..e66330f876e 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -10097,7 +10097,8 @@ (define_expand "cmpmemsi" (compare:SI (match_operand:BLK 1) (match_operand:BLK 2))) (use (match_operand:SI 3)) - (use (match_operand:SI 4))])] + (use (match_operand:SI 4)) + (use (match_operand:SI 5))])] "TARGET_POPCNTD" { if (expand_block_compare (operands)) diff
--git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 4bdb679daf2..506e79fb035 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -3790,7 +3790,8 @@ (define_expand "cmpmemsi" (compare:SI (match_operand:BLK 1 "memory_operand" "") (match_operand:BLK 2 "memory_operand" "") ) ) (use (match_operand:SI 3 "general_operand" "")) - (use (match_operand:SI 4 "" ""))] + (use (match_operand:SI 4 "" "")) + (use (match_operand:SI 5 "" ""))] "" { if (s390_expand_cmpmem (operands[0], operands[1], diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index e01cdcbe22c..06955cd7e78 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -6992,14 +6992,19 @@ result of the comparison. @cindex @code{cmpmem@var{m}} instruction pattern @item @samp{cmpmem@var{m}} -Block compare instruction, with five operands like the operands -of @samp{cmpstr@var{m}}. The two memory blocks
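As a concrete illustration of why the flag helps (a sketch, not taken from the patch): a caller like

int eq (const char *a, const char *b)
{
  return __builtin_memcmp (a, b, 32) == 0;
}

only needs a zero/non-zero answer, so an equality-only cmpmem expansion can compare the chunks in any order (for example with wide vector compares) and skip the byte-order-sensitive result computation that is required when the caller uses the sign of memcmp's result.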
[PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]
Hi, This patch enables vector compare for 16-byte memory equality compare. The 16-byte memory equality compare can be efficiently implemented by the instruction "vcmpequb." It saves one branch and one compare compared with the two 8-byte compare sequence. The 16-byte vector compare is not enabled on 32-bit subtargets as TImode hasn't been well supported on 32-bit subtargets. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Thanks Gui Haochen ChangeLog rs6000: Enable vector compare for 16-byte memory equality compare gcc/ PR target/111449 * config/rs6000/altivec.md (cbranchti4): New expand pattern. * config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn sequence for TImode vector equality compare. * config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define. (COMPARE_MAX_PIECES): Define. gcc/testsuite/ PR target/111449 * gcc.target/powerpc/pr111449.c: New. patch.diff diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index e8a596fb7e9..99264235cbe 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -2605,6 +2605,24 @@ (define_insn "altivec_vupklpx" } [(set_attr "type" "vecperm")]) +(define_expand "cbranchti4" + [(use (match_operator 0 "equality_operator" + [(match_operand:TI 1 "memory_operand") +(match_operand:TI 2 "memory_operand")])) + (use (match_operand 3))] + "VECTOR_UNIT_ALTIVEC_P (V16QImode)" +{ + rtx op1 = simplify_subreg (V16QImode, operands[1], TImode, 0); + rtx op2 = simplify_subreg (V16QImode, operands[2], TImode, 0); + operands[1] = force_reg (V16QImode, op1); + operands[2] = force_reg (V16QImode, op2); + rtx_code code = GET_CODE (operands[0]); + operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], + operands[2]); + rs6000_emit_cbranch (TImode, operands); + DONE; +}) + ;; Compare vectors producing a vector result and a predicate, setting CR6 to ;; indicate a combined status (define_insn "altivec_vcmpequ<VI_char>_p" diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index efe9adce1f8..c6b935a64e7 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -15264,6 +15264,15 @@ rs6000_generate_compare (rtx cmp, machine_mode mode) else emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b)); } + else if (mode == TImode) + { + gcc_assert (code == EQ || code == NE); + + rtx result_vector = gen_reg_rtx (V16QImode); + compare_result = gen_rtx_REG (CCmode, CR6_REGNO); + emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1)); + code = (code == NE) ? GE : LT; + } else emit_insn (gen_rtx_SET (compare_result, gen_rtx_COMPARE (comp_mode, op0, op1))); diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 3503614efbd..dc33bca0802 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -1730,6 +1730,8 @@ typedef struct rs6000_args in one reasonably fast instruction. */ #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8) #define MAX_MOVE_MAX 8 +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16) +#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16) /* Nonzero if access to memory by bytes is no faster than for words.
Also nonzero if doing byte operations (specifically shifts) in registers diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449.c b/gcc/testsuite/gcc.target/powerpc/pr111449.c new file mode 100644 index 000..ab9583f47bb --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr111449.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-maltivec -O2" } */ +/* { dg-require-effective-target has_arch_ppc64 } */ + +/* Ensure vector comparison is used for 16-byte memory equality compare. */ + +int compare (const char* s1, const char* s2) +{ + return __builtin_memcmp (s1, s2, 16) == 0; +} + +/* { dg-final { scan-assembler-times {\mvcmpequb\M} 1 } } */ +/* { dg-final { scan-assembler-not {\mcmpd\M} } } */
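To see where the saved compare and branch come from, a C-level sketch of the scalar sequence this replaces (illustrative only, assuming suitably aligned inputs):

/* Two 8-byte compares: the first result must be tested and branched
   on before the second pair can decide the answer.  */
int eq16_scalar (const unsigned long long *a, const unsigned long long *b)
{
  if (a[0] != b[0])
    return 0;
  return a[1] == b[1];
}

With vcmpequb. all 16 bytes are compared at once and CR6 already encodes whether every byte matched, so the intermediate compare and branch disappear.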
Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]
Hi Kewen, On 2023/9/18 15:34, Kewen.Lin wrote: > Thanks for checking! So for P7, this patch looks neutral, but for P8 and > later, it may cause some few differences in code gen. I'm curious that how > many total object files and different object files were checked and found > on P8? On P8 with -O2, the following object files are different. 507.cactuBSSN_r datestamp.o 511.povray_r colutils.o 521.wrf_r module_cu_kfeta.fppized.o 526.blender_r particle_edit.o 526.blender_r glutil.o 526.blender_r displist.o 526.blender_r CCGSubSurf.o On P8 with -O3, the following object files are different. 502.gcc_r ifcvt.o 502.gcc_r rtlanal.o 548.exchange2_r exchange2.fppized.o 507.cactuBSSN_r datestamp.o 511.povray_r colutils.o 521.wrf_r module_bc.fppized.o 521.wrf_r module_cu_kfeta.fppized.o 526.blender_r particle_edit.o 526.blender_r displist.o 526.blender_r CCGSubSurf.o 526.blender_r sketch.o > https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612821.html > I also wonder if it's easy to reduce some of them further as small test cases. > > Since xxlor is better than fmr at least on Power10, could you also evaluate > the affected bmks on P10 (even P8/P9) to ensure no performance degradation? There is no performance regression on P10/P9/P8. The detailed data is listed in the internal issue. Thanks Gui Haochen
[PATCH-2v3, rs6000] Implement 32bit inline lrint [PR88558]
Hi, This patch implements 32bit inline lrint by "fctiw". It depends on patch 1, which enables SImode moves from FP registers on P7. Compared to the last version, the main change is to add some test cases. https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629187.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Thanks Gui Haochen ChangeLog rs6000: support 32bit inline lrint gcc/ PR target/88558 * config/rs6000/rs6000.md (lrint<mode>di2): Remove TARGET_FPRND from insn condition. (lrint<mode>si2): New insn pattern for 32bit lrint. gcc/testsuite/ PR target/88558 * gcc.target/powerpc/pr88558.h: New. * gcc.target/powerpc/pr88558-p7.c: New. * gcc.target/powerpc/pr88558-p8.c: New. patch.diff diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index ac5d29a2cf8..a41898e0e08 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -6655,10 +6655,18 @@ (define_insn "lrint<mode>di2" [(set (match_operand:DI 0 "gpc_reg_operand" "=d") (unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "")] UNSPEC_FCTID))] - "TARGET_HARD_FLOAT && TARGET_FPRND" + "TARGET_HARD_FLOAT" "fctid %0,%1" [(set_attr "type" "fp")]) +(define_insn "lrint<mode>si2" + [(set (match_operand:SI 0 "gpc_reg_operand" "=d") + (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "")] + UNSPEC_FCTIW))] + "TARGET_HARD_FLOAT && TARGET_POPCNTD" + "fctiw %0,%1" + [(set_attr "type" "fp")]) + (define_insn "btrunc<mode>2" [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa") (unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")] diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c new file mode 100644 index 000..3932656c5fd --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-math-errno -mdejagnu-cpu=power7" } */ + +/* -fno-math-errno is required to make {i,l,ll}rint{,f} inlined */ + +#include "pr88558.h" + +/* { dg-final { scan-assembler-times {\mfctid\M} 4 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {\mfctid\M} 2 { target ilp32 } } } */ +/* { dg-final { scan-assembler-times {\mfctiw\M} 2 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {\mfctiw\M} 4 { target ilp32 } } } */ +/* { dg-final { scan-assembler-times {\mstfiwx\M} 2 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {\mstfiwx\M} 4 { target ilp32 } } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p8.c b/gcc/testsuite/gcc.target/powerpc/pr88558-p8.c new file mode 100644 index 000..1afc8fd4f0d --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr88558-p8.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-O2 -fno-math-errno -mdejagnu-cpu=power8" } */ + +/* -fno-math-errno is required to make {i,l,ll}rint{,f} inlined */ + +#include "pr88558.h" + +/* { dg-final { scan-assembler-times {\mfctid\M} 4 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {\mfctid\M} 2 { target ilp32 } } } */ +/* { dg-final { scan-assembler-times {\mfctiw\M} 2 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {\mfctiw\M} 4 { target ilp32 } } } */ +/* { dg-final { scan-assembler-times {\mmfvsrwz\M} 2 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {\mmfvsrwz\M} 4 { target ilp32 } } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558.h b/gcc/testsuite/gcc.target/powerpc/pr88558.h new file mode 100644 index 000..9c604faadd1 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr88558.h
@@ -0,0 +1,29 @@ +long int test1 (double a) +{ + return __builtin_lrint (a); +} + +long long test2 (double a) +{ + return __builtin_llrint (a); +} + +int test3 (double a) +{ + return __builtin_irint (a); +} + +long int test4 (float a) +{ + return __builtin_lrintf (a); +} + +long long test5 (float a) +{ + return __builtin_llrintf (a); +} + +int test6 (float a) +{ + return __builtin_irintf (a); +}
Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]
Kewen and Richard, Thanks for your comments. Please let me clarify it. On 2023/9/27 19:10, Richard Sandiford wrote: > Yeah, I agree there doesn't seem to be a good reason to exclude vectors. > Sorry to dive straight into details, but maybe we should have something > called bitwise_mode_for_size that tries to use integer modes where possible, > but falls back to vector modes otherwise. That mode could then be used > for copying, storing, bitwise ops, and equality comparisons (if there > is appropriate optabs support). Vector mode is not supported for compare_by_pieces and move_by_pieces. But it is supported for set_by_pieces and clear_by_pieces. The helper function widest_fixed_size_mode_for_size returns a vector mode when qi_vector is set to true. static fixed_size_mode widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector) I tried to enable qi_vector for compare_by_pieces. It can pick up a vector mode (e.g. V16QImode) and works in some cases. But it fails on a constant string case. int compare (const char* s1) { return __builtin_memcmp_eq (s1, "__GCC_HAVE_DWARF2_CFI_ASM", 16); } As the second op is a constant string, it calls builtin_memcpy_read_str to build the string. Unfortunately, the inner function doesn't support vector mode. /* The by-pieces infrastructure does not try to pick a vector mode for memcpy expansion. */ return c_readstr (rep + offset, as_a <scalar_int_mode> (mode), /*nul_terminated=*/false); It seems the by-pieces infrastructure itself supports vector modes, but the low-level functions do not. I think there are two ways to enable vector mode for compare_by_pieces. One is to modify the by-pieces infrastructure. The other is to enable it via the cmpmem expand. The expand is target specific and thus flexible. What's your opinion? Thanks Gui Haochen
[PATCH-1v4, rs6000] Implement optab_isinf for SFDF and IEEE128
Hi, This patch implements optab_isinf for SFDF and IEEE128 by the test data class instructions. Compared with the previous version, the main change is to define and use the constant mask for the test data class insns. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652593.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isinf for SFDF and IEEE128 gcc/ PR target/97786 * config/rs6000/rs6000.md (ISNAN, ISINF, ISZERO, ISDENORMAL): Define. * config/rs6000/vsx.md (isinf<mode>2 for SFDF): New expand. (isinf<mode>2 for IEEE128): New expand. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-1.c: New test. * gcc.target/powerpc/pr97786-2.c: New test. patch.diff diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index ac5651d7420..e84e6b08f03 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -53,6 +53,17 @@ (define_constants (FRAME_POINTER_REGNUM 110) ]) +;; +;; Test data class mask +;; + +(define_constants + [(ISNAN 0x40) + (ISINF 0x30) + (ISZERO 0xC) + (ISDENORMAL 0x3) + ]) + ;; ;; UNSPEC usage ;; diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index f135fa079bd..67615bae8c0 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5313,6 +5313,24 @@ (define_expand "xststdc<sd>p" operands[4] = CONST0_RTX (SImode); }) +(define_expand "isinf<mode>2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:SFDF 1 "vsx_register_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + emit_insn (gen_xststdc<sd>p (operands[0], operands[1], GEN_INT (ISINF))); + DONE; +}) + +(define_expand "isinf<mode>2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:IEEE128 1 "vsx_register_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + emit_insn (gen_xststdcqp_<mode> (operands[0], operands[1], GEN_INT (ISINF))); + DONE; +}) + ;; The VSX Scalar Test Negative Quad-Precision (define_expand "xststdcnegqp_<mode>" [(set (match_dup 2) diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c new file mode 100644 index 000..c1c4f64ee8b --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ + +int test1 (double x) +{ + return __builtin_isinf (x); +} + +int test2 (float x) +{ + return __builtin_isinf (x); +} + +int test3 (float x) +{ + return __builtin_isinff (x); +} + +/* { dg-final { scan-assembler-not {\mfcmp} } } */ +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c new file mode 100644 index 000..ed305e8572e --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ppc_float128_hw } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */ + +int test1 (long double x) +{ + return __builtin_isinf (x); +} + +int test2 (long double x) +{ + return __builtin_isinfl (x); +} + +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ +/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */
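For readers unfamiliar with the test data class instructions, the new mask constants line up with the DCMX bits of xststdcsp/xststdcdp/xststdcqp as I read the Power ISA (worth double-checking against the ISA book):

  0x40  NaN                                (ISNAN      = 0x40)
  0x20  +Infinity    0x10  -Infinity       (ISINF      = 0x30)
  0x08  +Zero        0x04  -Zero           (ISZERO     = 0x0C)
  0x02  +Denormal    0x01  -Denormal       (ISDENORMAL = 0x03)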
[PATCH-3v4, rs6000] Implement optab_isnormal for SFDF and IEEE128
Hi, This patch implements optab_isnormal for SFDF and IEEE128 by the test data class instructions. Compared with the previous version, the main change is to use the constant mask for the test data class insns. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652595.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isnormal for SFDF and IEEE128 gcc/ PR target/97786 * config/rs6000/vsx.md (isnormal<mode>2 for SFDF): New expand. (isnormal<mode>2 for IEEE128): New expand. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-7.c: New test. * gcc.target/powerpc/pr97786-8.c: New test. patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 11d02e60170..b48986ac9eb 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5355,6 +5355,30 @@ (define_expand "isfinite<mode>2" DONE; }) +(define_expand "isnormal<mode>2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:SFDF 1 "vsx_register_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + rtx tmp = gen_reg_rtx (SImode); + int mask = ISINF | ISNAN | ISZERO | ISDENORMAL; + emit_insn (gen_xststdc<sd>p (tmp, operands[1], GEN_INT (mask))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + +(define_expand "isnormal<mode>2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:IEEE128 1 "vsx_register_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + rtx tmp = gen_reg_rtx (SImode); + int mask = ISINF | ISNAN | ISZERO | ISDENORMAL; + emit_insn (gen_xststdcqp_<mode> (tmp, operands[1], GEN_INT (mask))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + ;; The VSX Scalar Test Negative Quad-Precision (define_expand "xststdcnegqp_<mode>" [(set (match_dup 2) diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-7.c b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c new file mode 100644 index 000..2df472e35d4 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ + +int test1 (double x) +{ + return __builtin_isnormal (x); +} + +int test2 (float x) +{ + return __builtin_isnormal (x); +} + +/* { dg-final { scan-assembler-not {\mfcmp} } } */ +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-8.c b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c new file mode 100644 index 000..00478dbf3ef --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ppc_float128_hw } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */ + +int test1 (long double x) +{ + return __builtin_isnormal (x); +} + +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ +/* { dg-final { scan-assembler {\mxststdcqp\M} } } */
[PATCH-2v4, rs6000] Implement optab_isfinite for SFDF and IEEE128
Hi, This patch implements optab_isfinite for SFDF and IEEE128 by the test data class instructions. Compared with the previous version, the main change is to use the constant mask for the test data class insns. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652594.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isfinite for SFDF and IEEE128 gcc/ PR target/97786 * config/rs6000/vsx.md (isfinite<mode>2 for SFDF): New expand. (isfinite<mode>2 for IEEE128): New expand. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-4.c: New test. * gcc.target/powerpc/pr97786-5.c: New test. patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 67615bae8c0..11d02e60170 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5331,6 +5331,30 @@ (define_expand "isinf<mode>2" DONE; }) +(define_expand "isfinite<mode>2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:SFDF 1 "vsx_register_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + rtx tmp = gen_reg_rtx (SImode); + int mask = ISINF | ISNAN; + emit_insn (gen_xststdc<sd>p (tmp, operands[1], GEN_INT (mask))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + +(define_expand "isfinite<mode>2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:IEEE128 1 "vsx_register_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + rtx tmp = gen_reg_rtx (SImode); + int mask = ISINF | ISNAN; + emit_insn (gen_xststdcqp_<mode> (tmp, operands[1], GEN_INT (mask))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + ;; The VSX Scalar Test Negative Quad-Precision (define_expand "xststdcnegqp_<mode>" [(set (match_dup 2) diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-4.c b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c new file mode 100644 index 000..01faa962bd5 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ + +int test1 (double x) +{ + return __builtin_isfinite (x); +} + +int test2 (float x) +{ + return __builtin_isfinite (x); +} + +/* { dg-final { scan-assembler-not {\mfcmp} } } */ +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-5.c b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c new file mode 100644 index 000..0e106b9f23a --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ppc_float128_hw } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */ + +int test1 (long double x) +{ + return __builtin_isfinite (x); +} + +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ +/* { dg-final { scan-assembler {\mxststdcqp\M} } } */
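As a quick worked check of the mask (using the DCMX bit assignments noted after the isinf patch above): ISINF | ISNAN = 0x30 | 0x40 = 0x70, so a single test data class instruction with DCMX 0x70 answers "infinite or NaN", and the following xor with 1 inverts that into the isfinite result.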
Ping^3 [PATCH-1v3] Value Range: Add range op for builtin isinf
Hi, Gently ping it. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html Thanks Gui Haochen On 2024/6/24 9:40, HAO CHEN GUI wrote: > Hi, > Gently ping it. > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html > > Thanks > Gui Haochen > > On 2024/6/20 14:56, HAO CHEN GUI wrote: >> Hi, >> Gently ping it. >> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html >> >> Thanks >> Gui Haochen >> >> On 2024/5/30 10:46, HAO CHEN GUI wrote: >>> Hi, >>> The builtin isinf is not folded at the front end if the corresponding optab >>> exists. This causes the range evaluation to fail on targets which have >>> optab_isinf. For instance, range-sincos.c will fail on the targets which >>> have optab_isinf as it calls builtin_isinf. >>> >>> This patch fixes the problem by adding a range op for builtin isinf. >>> >>> Compared with the previous version, the main change is to set the range to >>> 1 if it's an infinite number, otherwise to 0. >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652219.html >>> >>> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no >>> regressions. Is it OK for the trunk? >>> >>> Thanks >>> Gui Haochen >>> >>> >>> ChangeLog >>> Value Range: Add range op for builtin isinf >>> >>> The builtin isinf is not folded at the front end if the corresponding optab >>> exists. So the range op for isinf is needed for value range analysis. >>> This patch adds range op for builtin isinf. >>> >>> gcc/ >>> * gimple-range-op.cc (class cfn_isinf): New. >>> (op_cfn_isinf): New variable. >>> (gimple_range_op_handler::maybe_builtin_call): Handle >>> CASE_FLT_FN (BUILT_IN_ISINF). >>> >>> gcc/testsuite/ >>> * gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c: New test. >>> >>> patch.diff >>> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc >>> index 55dfbb23ce2..4e60a42eaac 100644 >>> --- a/gcc/gimple-range-op.cc >>> +++ b/gcc/gimple-range-op.cc >>> @@ -1175,6 +1175,63 @@ private: >>>bool m_is_pos; >>> } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true); >>> >>> +// Implement range operator for CFN_BUILT_IN_ISINF >>> +class cfn_isinf : public range_operator >>> +{ >>> +public: >>> + using range_operator::fold_range; >>> + using range_operator::op1_range; >>> + virtual bool fold_range (irange &r, tree type, const frange &op1, >>> + const irange &, relation_trio) const override >>> + { >>> +if (op1.undefined_p ()) >>> + return false; >>> + >>> +if (op1.known_isinf ()) >>> + { >>> + wide_int one = wi::one (TYPE_PRECISION (type)); >>> + r.set (type, one, one); >>> + return true; >>> + } >>> + >>> +if (op1.known_isnan () >>> + || (!real_isinf (&op1.lower_bound ()) >>> + && !real_isinf (&op1.upper_bound ()))) >>> + { >>> + r.set_zero (type); >>> + return true; >>> + } >>> + >>> +r.set_varying (type); >>> +return true; >>> + } >>> + virtual bool op1_range (frange &r, tree type, const irange &lhs, >>> + const frange &, relation_trio) const override >>> + { >>> +if (lhs.undefined_p ()) >>> + return false; >>> + >>> +if (lhs.zero_p ()) >>> + { >>> + nan_state nan (true); >>> + r.set (type, real_min_representable (type), >>> + real_max_representable (type), nan); >>> + return true; >>> + } >>> + >>> +if (!range_includes_zero_p (lhs)) >>> + { >>> + // The range is [-INF,-INF][+INF,+INF], but it can't be represented.
>>> + // Set range to [-INF,+INF] >>> + r.set_varying (type); >>> + r.clear_nan (); >>> + return true; >>> + } >>> + >>> +r.set_varying (type); >>> +return true; >>> + } >>> +} op_cfn_isinf; >>> >>> // Implement range operator for CFN_BUILT_IN_ >>> class cfn_parity : public range_operator >>> @@ -1268,6 +1325,11 @@ gimple_range_op_handler::maybe_builtin_call () >>>m_operator = &op_cfn_signbit;
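The new test gcc.dg/tree-ssa/range-isinf.c is cut off from the quote above; a sketch in the same style as the range-isfinite.c test quoted at the end of this thread (the actual test may differ):

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-evrp" } */

#include <math.h>
void link_error ();

void test1 (double x)
{
  if (x > __DBL_MAX__ && !__builtin_isinf (x))
    link_error ();
}

void test2 (float x)
{
  if (x > __FLT_MAX__ && !__builtin_isinf (x))
    link_error ();
}

/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */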
Re: [PATCH] RISC-V: use fclass insns to implement isfinite and isnormal builtins
The problem should be fixed once my value range patches are accepted. [PATCH-1v3] Value Range: Add range op for builtin isinf https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html [PATCH-2v4] Value Range: Add range op for builtin isfinite https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html [PATCH-3v2] Value Range: Add range op for builtin isnormal https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html On 2024/6/29 9:35, Vineet Gupta wrote: > > > On 6/28/24 17:53, Vineet Gupta wrote: >> Currently isfinite and isnormal use float compare instructions with fp >> flags save/restored around them. Our perf team complained this could be >> costly in uarch. RV Base ISA already has FCLASS.{d,s,h} instruction to >> do FP compares w/o disturbing FP exception flags. >> >> Coincidentally, upstream just a few days back got support for the >> corresponding optabs. All that is needed is to wire these up in the >> backend. >> >> I was also hoping to get __builtin_inf() done but unfortunately it >> requires a little more rtl foo/bar to implement a tri-modal return. >> >> Currently going thru CI testing. > > My local testing spotted one additional failure. > > FAIL: g++.dg/opt/pr107569.C -std=gnu++20 scan-tree-dump-times vrp1 > "return 1;" 2 > > The reason being > > bool > bar (double x) > { > [[assume (std::isfinite (x))]]; > return std::isfinite (x); > } > > generating the new seq > > .LFB4: > fclass.d a0,fa0 > andi a0,a0,126 > snez a0,a0 > ret > > vs. > > li a0,1 > ret > > I have a hunch this requires the pending value range patch from Hao Chen > GUI. > > Thx, > -Vineet > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html
Ping^3 [PATCH-3v2] Value Range: Add range op for builtin isnormal
Hi, Gently ping it. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html Thanks Gui Haochen On 2024/6/24 9:41, HAO CHEN GUI wrote: > Hi, > Gently ping it. > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html > > Thanks > Gui Haochen > > On 2024/6/20 14:58, HAO CHEN GUI wrote: >> Hi, >> Gently ping it. >> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html >> >> Thanks >> Gui Haochen >> >> On 2024/5/30 10:46, HAO CHEN GUI wrote: >>> Hi, >>> This patch adds the range op for builtin isnormal. It also adds two >>> helper functions in frange to detect the range of normal floating-point >>> numbers and the range of subnormal or zero. >>> >>> Compared to the previous version, the main change is to set the range to >>> 1 if it's a normal number, otherwise to 0. >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652221.html >>> >>> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no >>> regressions. Is it OK for the trunk? >>> >>> Thanks >>> Gui Haochen >>> >>> ChangeLog >>> Value Range: Add range op for builtin isnormal >>> >>> The former patch adds an optab for builtin isnormal. Thus builtin isnormal >>> might not be folded at the front end. So the range op for isnormal is needed >>> for value range analysis. This patch adds range op for builtin isnormal. >>> >>> gcc/ >>> * gimple-range-op.cc (class cfn_isnormal): New. >>> (op_cfn_isnormal): New variable. >>> (gimple_range_op_handler::maybe_builtin_call): Handle >>> CFN_BUILT_IN_ISNORMAL. >>> * value-range.h (class frange): Declare known_isnormal and >>> known_isdenormal_or_zero. >>> (frange::known_isnormal): Define. >>> (frange::known_isdenormal_or_zero): Define. >>> >>> gcc/testsuite/ >>> * gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c: New test. >>> >>> patch.diff >>> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc >>> index 5ec5c828fa4..6787f532f11 100644 >>> --- a/gcc/gimple-range-op.cc >>> +++ b/gcc/gimple-range-op.cc >>> @@ -1289,6 +1289,61 @@ public: >>>} >>> } op_cfn_isfinite; >>> >>> +//Implement range operator for CFN_BUILT_IN_ISNORMAL >>> +class cfn_isnormal : public range_operator >>> +{ >>> +public: >>> + using range_operator::fold_range; >>> + using range_operator::op1_range; >>> + virtual bool fold_range (irange &r, tree type, const frange &op1, >>> + const irange &, relation_trio) const override >>> + { >>> +if (op1.undefined_p ()) >>> + return false; >>> + >>> +if (op1.known_isnormal ()) >>> + { >>> + wide_int one = wi::one (TYPE_PRECISION (type)); >>> + r.set (type, one, one); >>> + return true; >>> + } >>> + >>> +if (op1.known_isnan () >>> + || op1.known_isinf () >>> + || op1.known_isdenormal_or_zero ()) >>> + { >>> + r.set_zero (type); >>> + return true; >>> + } >>> + >>> +r.set_varying (type); >>> +return true; >>> + } >>> + virtual bool op1_range (frange &r, tree type, const irange &lhs, >>> + const frange &, relation_trio) const override >>> + { >>> +if (lhs.undefined_p ()) >>> + return false; >>> + >>> +if (lhs.zero_p ()) >>> + { >>> + r.set_varying (type); >>> + return true; >>> + } >>> + >>> +if (!range_includes_zero_p (lhs)) >>> + { >>> + nan_state nan (false); >>> + r.set (type, real_min_representable (type), >>> + real_max_representable (type), nan); >>> + return true; >>> + } >>> + >>> +r.set_varying (type); >>> +return true; >>> + } >>> +} op_cfn_isnormal; >>> + >>> // Implement range operator for CFN_BUILT_IN_ >>> class cfn_parity : public range_operator >>> { >>> @@ -1391,6 +1446,11 @@ gimple_range_op_handler::maybe_builtin_call () >>> m_operator = &op_cfn_isfinite; >>> break; >>> >>> +case
CFN_BUILT_IN_ISNORMAL: >>> + m_op1 = gimple_call_arg (call, 0); >>> + m_operator = &op_cfn_isnormal; >>> + break;
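The range-isnormal.c test named in the ChangeLog is cut off above. A minimal sketch of what such a test exercises, modeled on the range-isfinite.c test in the companion patch below (the actual committed test may differ):

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-evrp" } */

void link_error ();

void test1 (double x)
{
  /* On this path x lies strictly between DBL_MIN and DBL_MAX and cannot
     be a NaN, so fold_range can fold isnormal (x) to 1 and the call to
     link_error is removed.  */
  if (x > __DBL_MIN__ && x < __DBL_MAX__ && !__builtin_isnormal (x))
    link_error ();
}

/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */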
Ping^3 [PATCH-2v4] Value Range: Add range op for builtin isfinite
Hi, Gently ping it. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html Thanks Gui Haochen 在 2024/6/24 9:41, HAO CHEN GUI 写道: > Hi, > Gently ping it. > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html > > Thanks > Gui Haochen > > 在 2024/6/20 14:57, HAO CHEN GUI 写道: >> Hi, >> Gently ping it. >> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html >> >> Thanks >> Gui Haochen >> >> 在 2024/5/30 10:46, HAO CHEN GUI 写道: >>> Hi, >>> This patch adds the range op for builtin isfinite. >>> >>> Compared to previous version, the main change is to set the range to >>> 1 if it's finite number otherwise to 0. >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652220.html >>> >>> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no >>> regressions. Is it OK for the trunk? >>> >>> Thanks >>> Gui Haochen >>> >>> ChangeLog >>> Value Range: Add range op for builtin isfinite >>> >>> The former patch adds optab for builtin isfinite. Thus builtin isfinite >>> might not be folded at front end. So the range op for isfinite is needed >>> for value range analysis. This patch adds range op for builtin isfinite. >>> >>> gcc/ >>> * gimple-range-op.cc (class cfn_isfinite): New. >>> (op_cfn_finite): New variables. >>> (gimple_range_op_handler::maybe_builtin_call): Handle >>> CFN_BUILT_IN_ISFINITE. >>> >>> gcc/testsuite/ >>> * gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c: New test. >>> >>> patch.diff >>> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc >>> index 4e60a42eaac..5ec5c828fa4 100644 >>> --- a/gcc/gimple-range-op.cc >>> +++ b/gcc/gimple-range-op.cc >>> @@ -1233,6 +1233,62 @@ public: >>>} >>> } op_cfn_isinf; >>> >>> +//Implement range operator for CFN_BUILT_IN_ISFINITE >>> +class cfn_isfinite : public range_operator >>> +{ >>> +public: >>> + using range_operator::fold_range; >>> + using range_operator::op1_range; >>> + virtual bool fold_range (irange &r, tree type, const frange &op1, >>> + const irange &, relation_trio) const override >>> + { >>> +if (op1.undefined_p ()) >>> + return false; >>> + >>> +if (op1.known_isfinite ()) >>> + { >>> + wide_int one = wi::one (TYPE_PRECISION (type)); >>> + r.set (type, one, one); >>> + return true; >>> + } >>> + >>> +if (op1.known_isnan () >>> + || op1.known_isinf ()) >>> + { >>> + r.set_zero (type); >>> + return true; >>> + } >>> + >>> +r.set_varying (type); >>> +return true; >>> + } >>> + virtual bool op1_range (frange &r, tree type, const irange &lhs, >>> + const frange &, relation_trio) const override >>> + { >>> +if (lhs.undefined_p ()) >>> + return false; >>> + >>> +if (lhs.zero_p ()) >>> + { >>> + // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented. 
>>> + // Set range to varying >>> + r.set_varying (type); >>> + return true; >>> + } >>> + >>> +if (!range_includes_zero_p (lhs)) >>> + { >>> + nan_state nan (false); >>> + r.set (type, real_min_representable (type), >>> + real_max_representable (type), nan); >>> + return true; >>> + } >>> + >>> +r.set_varying (type); >>> +return true; >>> + } >>> +} op_cfn_isfinite; >>> + >>> // Implement range operator for CFN_BUILT_IN_ >>> class cfn_parity : public range_operator >>> { >>> @@ -1330,6 +1386,11 @@ gimple_range_op_handler::maybe_builtin_call () >>>m_operator = &op_cfn_isinf; >>>break; >>> >>> +case CFN_BUILT_IN_ISFINITE: >>> + m_op1 = gimple_call_arg (call, 0); >>> + m_operator = &op_cfn_isfinite; >>> + break; >>> + >>> CASE_CFN_COPYSIGN_ALL: >>>m_op1 = gimple_call_arg (call, 0); >>>m_op2 = gimple_call_arg (call, 1); >>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c >>> b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c >>> new file mode 100644 >>> index 000..f5dce0a0486 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c >>> @@ -0,0 +1,31 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O2 -fdump-tree-evrp" } */ >>> + >>> +#include >>> +void link_error(); >>> + >>> +void test1 (double x) >>> +{ >>> + if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x)) >>> +link_error (); >>> +} >>> + >>> +void test2 (float x) >>> +{ >>> + if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x)) >>> +link_error (); >>> +} >>> + >>> +void test3 (double x) >>> +{ >>> + if (__builtin_isfinite (x) && __builtin_isinf (x)) >>> +link_error (); >>> +} >>> + >>> +void test4 (float x) >>> +{ >>> + if (__builtin_isfinite (x) && __builtin_isinf (x)) >>> +link_error (); >>> +} >>> + >>> +/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
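As a side note on the op1_range direction of the operator above: once isfinite (x) is known to be nonzero, x's range becomes [-DBL_MAX, DBL_MAX] with the NaN state cleared, which is what lets a later impossible comparison fold. A standalone sketch, not part of the patch:

void link_error ();

void f (double x)
{
  if (__builtin_isfinite (x))
    {
      /* Here op1_range has narrowed x to [-DBL_MAX, DBL_MAX] without
         NaN, so this comparison is provably false.  */
      if (x > __DBL_MAX__)
        link_error ();
    }
}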
Ping^2 [PATCHv2, rs6000] Optimize vector construction with two vector doubleword loads [PR103568]
Hi, Gently ping it. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653180.html Thanks Gui Haochen 在 2024/6/20 15:01, HAO CHEN GUI 写道: > Hi, > Gently ping it. > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653180.html > > Thanks > Gui Haochen > > 在 2024/5/31 11:25, HAO CHEN GUI 写道: >> Hi, >> This patch optimizes vector construction with two vector doubleword loads. >> It generates an optimal insn sequence as "xxlor" has lower latency than >> "mtvsrdd" on Power10. >> >> Compared with previous version, the main change is to use "isa" attribute >> to guard "lxsd" and "lxsdx". >> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653103.html >> >> Bootstrapped and tested on powerpc64-linux BE and LE with no >> regressions. OK for the trunk? >> >> Thanks >> Gui Haochen >> >> ChangeLog >> rs6000: Optimize vector construction with two vector doubleword loads >> >> When constructing a vector by two doublewords from memory, originally it >> does >> ld 10,0(3) >> ld 9,0(4) >> mtvsrdd 34,9,10 >> >> An optimal sequence on Power10 should be >> lxsd 0,0(4) >> lxvrdx 1,0,3 >> xxlor 34,1,32 >> >> This patch does this optimization by insn combine and split. >> >> gcc/ >> PR target/103568 >> * config/rs6000/vsx.md (vsx_ld_lowpart_zero_): New insn >> pattern. >> (vsx_ld_highpart_zero_): New insn pattern. >> (vsx_concat_mem_): New insn_and_split pattern. >> >> gcc/testsuite/ >> PR target/103568 >> * gcc.target/powerpc/pr103568.c: New test. >> >> patch.diff >> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md >> index f135fa079bd..f9a2a260e89 100644 >> --- a/gcc/config/rs6000/vsx.md >> +++ b/gcc/config/rs6000/vsx.md >> @@ -1395,6 +1395,27 @@ (define_insn "vsx_ld_elemrev_v2di" >>"lxvd2x %x0,%y1" >>[(set_attr "type" "vecload")]) >> >> +(define_insn "vsx_ld_lowpart_zero_" >> + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa") >> +(vec_concat:VSX_D >> + (match_operand: 1 "memory_operand" "wY,Z") >> + (match_operand: 2 "zero_constant" "j,j")))] >> + "" >> + "@ >> + lxsd %0,%1 >> + lxsdx %x0,%y1" >> + [(set_attr "type" "vecload,vecload") >> + (set_attr "isa" "p9v,p7v")]) >> + >> +(define_insn "vsx_ld_highpart_zero_" >> + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") >> +(vec_concat:VSX_D >> + (match_operand: 1 "zero_constant" "j") >> + (match_operand: 2 "memory_operand" "Z")))] >> + "TARGET_POWER10" >> + "lxvrdx %x0,%y2" >> + [(set_attr "type" "vecload")]) >> + >> (define_insn "vsx_ld_elemrev_v1ti" >>[(set (match_operand:V1TI 0 "vsx_register_operand" "=wa") >> (vec_select:V1TI >> @@ -3063,6 +3084,26 @@ (define_insn "vsx_concat_" >> } >>[(set_attr "type" "vecperm,vecmove")]) >> >> +(define_insn_and_split "vsx_concat_mem_" >> + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa") >> +(vec_concat:VSX_D >> + (match_operand: 1 "memory_operand" "wY,Z") >> + (match_operand: 2 "memory_operand" "Z,Z")))] >> + "TARGET_POWER10 && can_create_pseudo_p ()" >> + "#" >> + "&& 1" >> + [(const_int 0)] >> +{ >> + rtx tmp1 = gen_reg_rtx (mode); >> + rtx tmp2 = gen_reg_rtx (mode); >> + emit_insn (gen_vsx_ld_highpart_zero_ (tmp1, CONST0_RTX >> (mode), >> + operands[1])); >> + emit_insn (gen_vsx_ld_lowpart_zero_ (tmp2, operands[2], >> + CONST0_RTX (mode))); >> + emit_insn (gen_ior3 (operands[0], tmp1, tmp2)); >> + DONE; >> +}) >> + >> ;; Combiner patterns to allow creating XXPERMDI's to access either double >> ;; word element in a vector register. 
>> (define_insn "*vsx_concat__1" >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103568.c >> b/gcc/testsuite/gcc.target/powerpc/pr103568.c >> new file mode 100644 >> index 000..b2a06fb2162 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr103568.c >> @@ -0,0 +1,17 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ >> + >> +vector double test (double *a, double *b) >> +{ >> + return (vector double) {*a, *b}; >> +} >> + >> +vector long long test1 (long long *a, long long *b) >> +{ >> + return (vector long long) {*a, *b}; >> +} >> + >> +/* { dg-final { scan-assembler-times {\mlxsd} 2 } } */ >> +/* { dg-final { scan-assembler-times {\mlxvrdx\M} 2 } } */ >> +/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */ >> +
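The shape of the split is easier to see in scalar terms: each of the two new load patterns fills one doubleword of the vector and zeroes the other, so a plain OR (the xxlor whose latency beats mtvsrdd on Power10) merges them. A purely illustrative C model, not from the patch:

typedef unsigned long long u64;

void concat_model (const u64 *a, const u64 *b, u64 out[2])
{
  u64 v1[2] = { *a, 0 };   /* one load: value in one half, other half zeroed */
  u64 v2[2] = { 0, *b };   /* the other load: the opposite halves */
  out[0] = v1[0] | v2[0];  /* the OR merges the halves losslessly */
  out[1] = v1[1] | v2[1];
}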
[PATCH-1v5, rs6000] Implement optab_isinf for SFDF and IEEE128
Hi, This patch implements optab_isinf for SFDF and IEEE128 using the test data class instructions. Compared with the previous version, the main changes are: 1. Define three mode attributes which are used for predicate, constraint and asm print selection; they help merge the sp/dp/qp patterns into one. 2. Remove the original sp/dp and qp patterns and combine them into one. 3. Rename the corresponding icode names in rs6000-builtin.cc and rs6000-builtins.def. https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655779.html The expand "isinf<mode>2" and the following insn pattern for the TF and KF modes should be guarded by "TARGET_FLOAT128_HW". This will be changed in a follow-up patch, as some other "qp" insn patterns also need to be changed. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isinf for SFDF and IEEE128 gcc/ PR target/97786 * config/rs6000/rs6000.md (constant VSX_TEST_DATA_CLASS_NAN, VSX_TEST_DATA_CLASS_POS_INF, VSX_TEST_DATA_CLASS_NEG_INF, VSX_TEST_DATA_CLASS_POS_ZERO, VSX_TEST_DATA_CLASS_NEG_ZERO, VSX_TEST_DATA_CLASS_POS_DENORMAL, VSX_TEST_DATA_CLASS_NEG_DENORMAL): Define. (mode_attr sdq, vsx_altivec, wa_v, x): Define. (mode_iterator IEEE_FP): Define. * config/rs6000/vsx.md (isinf<mode>2): New expand. (expand xststdcqp_<mode>, xststdc<sd>p): Combine into... (expand xststdc_<mode>): ...this. (insn *xststdcqp_<mode>, *xststdc<sd>p): Combine into... (insn *xststdc_<mode>): ...this. * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Rename CODE_FOR_xststdcqp_kf as CODE_FOR_xststdc_kf, CODE_FOR_xststdcqp_tf as CODE_FOR_xststdc_tf. * config/rs6000/rs6000-builtins.def: Rename xststdcdp as xststdc_df, xststdcsp as xststdc_sf, xststdcqp_kf as xststdc_kf. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-1.c: New test. * gcc.target/powerpc/pr97786-2.c: New test.
patch.diff diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc index bb9da68edc7..a62a5d4afa7 100644 --- a/gcc/config/rs6000/rs6000-builtin.cc +++ b/gcc/config/rs6000/rs6000-builtin.cc @@ -3357,8 +3357,8 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */, case CODE_FOR_xsiexpqpf_kf: icode = CODE_FOR_xsiexpqpf_tf; break; - case CODE_FOR_xststdcqp_kf: - icode = CODE_FOR_xststdcqp_tf; + case CODE_FOR_xststdc_kf: + icode = CODE_FOR_xststdc_tf; break; case CODE_FOR_xscmpexpqp_eq_kf: icode = CODE_FOR_xscmpexpqp_eq_tf; diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 3bc7fed6956..8ac4cc200c9 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -2752,11 +2752,11 @@ const signed int \ __builtin_vsx_scalar_test_data_class_dp (double, const int<7>); -VSTDCDP xststdcdp {} +VSTDCDP xststdc_df {} const signed int \ __builtin_vsx_scalar_test_data_class_sp (float, const int<7>); -VSTDCSP xststdcsp {} +VSTDCSP xststdc_sf {} const signed int __builtin_vsx_scalar_test_neg_dp (double); VSTDCNDP xststdcnegdp {} @@ -2925,7 +2925,7 @@ const signed int __builtin_vsx_scalar_test_data_class_qp (_Float128, \ const int<7>); -VSTDCQP xststdcqp_kf {} +VSTDCQP xststdc_kf {} const signed int __builtin_vsx_scalar_test_neg_qp (_Float128); VSTDCNQP xststdcnegqp_kf {} diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index a5d20594789..2d7f227e362 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -53,6 +53,20 @@ (define_constants (FRAME_POINTER_REGNUM 110) ]) +;; +;; Test data class mask +;; + +(define_constants + [(VSX_TEST_DATA_CLASS_NAN0x40) + (VSX_TEST_DATA_CLASS_POS_INF0x20) + (VSX_TEST_DATA_CLASS_NEG_INF0x10) + (VSX_TEST_DATA_CLASS_POS_ZERO 0x8) + (VSX_TEST_DATA_CLASS_NEG_ZERO 0x4) + (VSX_TEST_DATA_CLASS_POS_DENORMAL 0x2) + (VSX_TEST_DATA_CLASS_NEG_DENORMAL 0x1) + ]) + ;; ;; UNSPEC usage ;; @@ -605,6 +619,24 @@ (define_mode_iterator SFDF2 [SF DF]) (define_mode_attr sd [(SF "s") (DF "d") (V4SF "s") (V2DF "d")]) +; A generic s/d/q attribute, for sp/dp/qp for example. +(define_mode_attr sdq [(SF "s") (DF "d") + (TF "q") (KF "q")]) + +; A predicate attribute, for IEEE floating point +(define_mode_attr vsx_altivec [(SF "vsx_register_operand") + (DF "vsx_register_operand") + (TF "altivec_register_operand") + (KF "altivec_register_operand")]) + +; A constraint attribute, for IEEE floating point +(define_mode_attr wa_v [(SF "wa") (DF "wa") +
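For reference, the test-data-class immediates defined above combine into the masks used by the isinf/isfinite/isnormal expanders in this series. Mirrored here in C, with values copied from the define_constants; the short macro names are only for illustration:

enum vsx_test_data_class
{
  VSX_TDC_NAN          = 0x40,
  VSX_TDC_POS_INF      = 0x20,
  VSX_TDC_NEG_INF      = 0x10,
  VSX_TDC_POS_ZERO     = 0x08,
  VSX_TDC_NEG_ZERO     = 0x04,
  VSX_TDC_POS_DENORMAL = 0x02,
  VSX_TDC_NEG_DENORMAL = 0x01
};

/* isinf tests 0x30.  */
#define ISINF_MASK      (VSX_TDC_POS_INF | VSX_TDC_NEG_INF)
/* isfinite tests the complement of 0x70 (NaN or either infinity).  */
#define NOT_FINITE_MASK (ISINF_MASK | VSX_TDC_NAN)
/* isnormal tests the complement of 0x7f (every special class).  */
#define NOT_NORMAL_MASK 0x7f

These compositions match the GEN_INT (0x30), 0x70 and 0x7f immediates passed to the expanders in the patches below.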
Re: [PATCH v2] RISC-V: use fclass insns to implement isfinite and isnormal builtins
Hi, On 2024/7/10 8:04, Vineet Gupta wrote: > So it seems initial versions of the patch didn't specify anything about > output mode. Richi asked for it in review and in v4 Hao added it. > But I don't see anyone asking specifically for SImode. > I guess that can be relaxed. Hao do you have any inputs here ? The reviewer asked me to add the mode info for the operands. I picked SImode for the output to keep it consistent with the int return type of these functions in math.h. You can submit a patch to update the documentation if you want to allow other modes. Thanks Gui Haochen
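The math.h consistency argument refers to the C99 classification macros, which are specified to produce an int. A one-line illustration in standard C, nothing target-specific:

#include <math.h>

/* C99 7.12.3: isnormal, isinf, isfinite, ... yield an int, which is
   why SImode (a 32-bit int on these targets) was chosen for the
   optab's output operand.  */
int classify (double x)
{
  return isnormal (x);
}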
Re: [PATCH] Expand: Pass down equality only flag to cmpmem expand
Hi Jeff, On 2024/7/10 7:35, Jeff Law wrote: > Is this patch still relevant? It was submitted after stage1 closed for > gcc-14. With the trunk open for development, you should probably rebase and > repost if the patch is still relevant/useful. > > Conceptually knowing that we just want to do an equality comparison seems > useful. I think there are other places where we track this information and > utilize it to improve initial code generation. This patch and its follow-up patches are on hold while I work on other issues. I will come back to them after completing the task at hand. Thanks Gui Haochen
[PATCH, rs6000] Add TARGET_FLOAT128_HW guard for quad-precision insns
Hi, This patch adds TARGET_FLOAT128_HW into pattern conditions for quad- precision insns. Also it removes FLOAT128_IEEE_P check from pattern conditions if the mode of pattern is IEEE128 as the mode iterator - IEEE128 already checks with FLOAT128_IEEE_P. For test case float128-cmp2-runnable.c, it should be guarded with ppc_float128_hw as it calls qp insns. The p9vector_hw is covered with ppc_float128_hw, so it's removed. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Add TARGET_FLOAT128_HW guard for quad-precision insns gcc/ * config/rs6000/rs6000.md (*fpmask, floatdidf2, floatti2, floatunsti2, fix_truncti2): Add guard TARGET_FLOAT128_HW. (add3, sub3, mul3, div3, sqrt2, copysign3_hard, copysign3_soft, @neg2_hw, @abs2_hw, *nabs2_hw, fma4_hw, *fms4_hw, *nfma4_hw, *nfms4_hw, extend2_hw, truncdf2_hw, truncsf2_hw, fix_trunc2, *fix_trunc2_mem, float_si2_hw, floatuns_di2_hw, floor2, ceil2, btrunc2, round2, add3_odd, sub3_odd, mul3_odd, div3_odd, sqrt2_odd, fma4_odd, *fms4_odd, *nfma4_odd, *nfms4_odd, truncdf2_odd, *cmp_hw for IEEE128): Remove guard FLOAT128_IEEE_P. * config/rs6000/vsx.md (xsxexpqp__, xsxsigqp__, xsiexpqpf_, xsiexpqp__, xscmpexpqp__, *xscmpexpqp, xststdcnegqp_): Add guard TARGET_FLOAT128_HW. (xststdc_, *xststdc_, xststdc_): Add guard TARGET_FLOAT128_HW for the IEEE128 mode. gcc/testsuite/ * testsuite/gcc.target/powerpc/float128-cmp2-runnable.c: Replace ppc_float128_sw with ppc_float128_hw and remove p9vector_hw. patch.diff diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 3ec5ffa3578..32e5f1c4c56 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -5820,7 +5820,7 @@ (define_insn "*fpmask" (match_operand:IEEE128 3 "altivec_register_operand" "v")]) (match_operand:V2DI 4 "all_ones_constant" "") (match_operand:V2DI 5 "zero_constant" "")))] - "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_POWER10 && TARGET_FLOAT128_HW" "xscmp%V1qp %0,%2,%3" [(set_attr "type" "fpcompare")]) @@ -6928,7 +6928,7 @@ (define_insn "floatdidf2" (define_insn "floatti2" [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v") (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))] - "TARGET_POWER10" + "TARGET_POWER10 && TARGET_FLOAT128_HW" { return "xscvsqqp %0,%1"; } @@ -6937,7 +6937,7 @@ (define_insn "floatti2" (define_insn "floatunsti2" [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v") (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))] - "TARGET_POWER10" + "TARGET_POWER10 && TARGET_FLOAT128_HW" { return "xscvuqqp %0,%1"; } @@ -6946,7 +6946,7 @@ (define_insn "floatunsti2" (define_insn "fix_truncti2" [(set (match_operand:TI 0 "vsx_register_operand" "=v") (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))] - "TARGET_POWER10" + "TARGET_POWER10 && TARGET_FLOAT128_HW" { return "xscvqpsqz %0,%1"; } @@ -6955,7 +6955,7 @@ (define_insn "fix_truncti2" (define_insn "fixuns_truncti2" [(set (match_operand:TI 0 "vsx_register_operand" "=v") (unsigned_fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))] - "TARGET_POWER10" + "TARGET_POWER10 && TARGET_FLOAT128_HW" { return "xscvqpuqz %0,%1"; } @@ -15020,7 +15020,7 @@ (define_insn "add3" (plus:IEEE128 (match_operand:IEEE128 1 "altivec_register_operand" "v") (match_operand:IEEE128 2 "altivec_register_operand" "v")))] - "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_FLOAT128_HW" "xsaddqp %0,%1,%2" [(set_attr "type" "vecfloat") (set_attr 
"size" "128")]) @@ -15030,7 +15030,7 @@ (define_insn "sub3" (minus:IEEE128 (match_operand:IEEE128 1 "altivec_register_operand" "v") (match_operand:IEEE128 2 "altivec_register_operand" "v")))] - "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_FLOAT128_HW" "xssubqp %0,%1,%2" [(set_attr "type" "vecfloat") (set_attr "size" "128")]) @@ -15040,7 +15040,7 @@ (define_insn "mul3" (mult:IEEE128 (match_operand:IEEE128 1 "altivec_register_operand" "v") (match_operand:IEEE128 2 "altivec_register_operand" "v")))] - "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_FLOAT128_HW" "xsmulqp %0,%1,%2" [(set_attr "type" "qmul") (set_attr "size" "128")]) @@ -15050,7 +15050,7 @@ (define_insn "div3" (div:IEEE128 (match_operand:IEEE128 1 "altivec_register_operand" "v") (match_operand:IEEE128 2 "altivec_register_operand" "v")))] - "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_FLOAT128_HW" "xsdivqp %0,%1,%2" [(set_attr "type" "vecdiv") (set_attr "size" "
[PATCH-1v4] Value Range: Add range op for builtin isinf
Hi, The builtin isinf is not folded at front end if the corresponding optab exists. It causes the range evaluation failed on the targets which has optab_isinf. For instance, range-sincos.c will fail on the targets which has optab_isinf as it calls builtin_isinf. This patch fixed the problem by adding range op for builtin isinf. It also fixed the issue in PR114678. Compared with previous version, the main change is to remove xfail for s390 in range-sincos.c and vrp-float-abs-1.c. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog Value Range: Add range op for builtin isinf The builtin isinf is not folded at front end if the corresponding optab exists. So the range op for isinf is needed for value range analysis. This patch adds range op for builtin isinf. gcc/ PR target/114678 * gimple-range-op.cc (class cfn_isinf): New. (op_cfn_isinf): New variables. (gimple_range_op_handler::maybe_builtin_call): Handle CASE_FLT_FN (BUILT_IN_ISINF). gcc/testsuite/ PR target/114678 * gcc.dg/tree-ssa/range-isinf.c: New test. * gcc.dg/tree-ssa/range-sincos.c: Remove xfail for s390. * gcc.dg/tree-ssa/vrp-float-abs-1.c: Likewise. patch.diff diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc index a80b93cf063..24559951dd6 100644 --- a/gcc/gimple-range-op.cc +++ b/gcc/gimple-range-op.cc @@ -1153,6 +1153,63 @@ private: bool m_is_pos; } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true); +// Implement range operator for CFN_BUILT_IN_ISINF +class cfn_isinf : public range_operator +{ +public: + using range_operator::fold_range; + using range_operator::op1_range; + virtual bool fold_range (irange &r, tree type, const frange &op1, + const irange &, relation_trio) const override + { +if (op1.undefined_p ()) + return false; + +if (op1.known_isinf ()) + { + wide_int one = wi::one (TYPE_PRECISION (type)); + r.set (type, one, one); + return true; + } + +if (op1.known_isnan () + || (!real_isinf (&op1.lower_bound ()) + && !real_isinf (&op1.upper_bound ( + { + r.set_zero (type); + return true; + } + +r.set_varying (type); +return true; + } + virtual bool op1_range (frange &r, tree type, const irange &lhs, + const frange &, relation_trio) const override + { +if (lhs.undefined_p ()) + return false; + +if (lhs.zero_p ()) + { + nan_state nan (true); + r.set (type, real_min_representable (type), + real_max_representable (type), nan); + return true; + } + +if (!range_includes_zero_p (lhs)) + { + // The range is [-INF,-INF][+INF,+INF], but it can't be represented. 
+ // Set range to [-INF,+INF] + r.set_varying (type); + r.clear_nan (); + return true; + } + +r.set_varying (type); +return true; + } +} op_cfn_isinf; // Implement range operator for CFN_BUILT_IN_ class cfn_parity : public range_operator @@ -1246,6 +1303,11 @@ gimple_range_op_handler::maybe_builtin_call () m_operator = &op_cfn_signbit; break; +CASE_FLT_FN (BUILT_IN_ISINF): + m_op1 = gimple_call_arg (call, 0); + m_operator = &op_cfn_isinf; + break; + CASE_CFN_COPYSIGN_ALL: m_op1 = gimple_call_arg (call, 0); m_op2 = gimple_call_arg (call, 1); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c new file mode 100644 index 000..468f1bcf5c7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c @@ -0,0 +1,44 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp" } */ + +#include +void link_error(); + +void +test1 (double x) +{ + if (x > __DBL_MAX__ && !__builtin_isinf (x)) +link_error (); + if (x < -__DBL_MAX__ && !__builtin_isinf (x)) +link_error (); +} + +void +test2 (float x) +{ + if (x > __FLT_MAX__ && !__builtin_isinf (x)) +link_error (); + if (x < -__FLT_MAX__ && !__builtin_isinf (x)) +link_error (); +} + +void +test3 (double x) +{ + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__) +link_error (); + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__) +link_error (); +} + +void +test4 (float x) +{ + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__) +link_error (); + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__) +link_error (); +} + +/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */ + diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c b/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c index 35b38c3c914..337f9cda02f 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/range
Re: Ping^3 [PATCH-1v3] Value Range: Add range op for builtin isinf
Hi Ruoyao, Thanks for the information. I updated my patch and sent it for review. Thanks Gui Haochen On 2024/7/10 22:01, Xi Ruoyao wrote: > On Wed, 2024-07-10 at 21:54 +0800, Xi Ruoyao wrote: >> On Mon, 2024-07-01 at 09:11 +0800, HAO CHEN GUI wrote: >>> Hi, >>> Gently ping it. >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html >> >> I guess you can add PR114678 into the subject and the ChangeLog, and >> also mention the patch in the bugzilla. > > And, remove xfail in vrp-float-abs-1.c and range-sincos.c (if this patch > works as intended they should no longer fail). >
Re: [PATCH, expand] Add const0 move checking for CLEAR_BY_PIECES optabs
Hi Jeff, On 2024/7/11 6:25, Jeff Law wrote: > OK. But given this patch is several months old, can you re-bootstrap & test > before committing to the trunk. Thanks. I will rebase the patch and test it again. Thanks Gui Haochen
Re: [PATCH-1v4] Value Range: Add range op for builtin isinf
Hi Jeff, Thanks for your comments. 在 2024/7/12 6:13, Jeff Law 写道: > > > On 7/11/24 1:32 AM, HAO CHEN GUI wrote: >> Hi, >> The builtin isinf is not folded at front end if the corresponding optab >> exists. It causes the range evaluation failed on the targets which has >> optab_isinf. For instance, range-sincos.c will fail on the targets which >> has optab_isinf as it calls builtin_isinf. >> >> This patch fixed the problem by adding range op for builtin isinf. It >> also fixed the issue in PR114678. >> >> Compared with previous version, the main change is to remove xfail for >> s390 in range-sincos.c and vrp-float-abs-1.c. >> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html >> >> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no >> regressions. Is it OK for the trunk? >> >> Thanks >> Gui Haochen >> >> >> ChangeLog >> Value Range: Add range op for builtin isinf >> >> The builtin isinf is not folded at front end if the corresponding optab >> exists. So the range op for isinf is needed for value range analysis. >> This patch adds range op for builtin isinf. >> >> gcc/ >> PR target/114678 >> * gimple-range-op.cc (class cfn_isinf): New. >> (op_cfn_isinf): New variables. >> (gimple_range_op_handler::maybe_builtin_call): Handle >> CASE_FLT_FN (BUILT_IN_ISINF). >> >> gcc/testsuite/ >> PR target/114678 >> * gcc.dg/tree-ssa/range-isinf.c: New test. >> * gcc.dg/tree-ssa/range-sincos.c: Remove xfail for s390. >> * gcc.dg/tree-ssa/vrp-float-abs-1.c: Likewise. >> >> patch.diff >> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc >> index a80b93cf063..24559951dd6 100644 >> --- a/gcc/gimple-range-op.cc >> +++ b/gcc/gimple-range-op.cc >> @@ -1153,6 +1153,63 @@ private: >> bool m_is_pos; >> } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true); >> >> +// Implement range operator for CFN_BUILT_IN_ISINF >> +class cfn_isinf : public range_operator >> +{ >> +public: >> + using range_operator::fold_range; >> + using range_operator::op1_range; >> + virtual bool fold_range (irange &r, tree type, const frange &op1, >> + const irange &, relation_trio) const override >> + { >> + if (op1.undefined_p ()) >> + return false; >> + >> + if (op1.known_isinf ()) >> + { >> + wide_int one = wi::one (TYPE_PRECISION (type)); >> + r.set (type, one, one); >> + return true; >> + } >> + >> + if (op1.known_isnan () >> + || (!real_isinf (&op1.lower_bound ()) >> + && !real_isinf (&op1.upper_bound ( >> + { >> + r.set_zero (type); >> + return true; >> + } > So why the test for real_isinf on the upper/lower bound? If op1 is known to > be a NaN, then why test the bounds at all? If a bounds test is needed, why > only test the upper bound? > IMHO, logical is if the op1 is a NAN, it's not an infinite number. If the upper and lower bound both are finite numbers, the op1 is not an infinite number. Under both situations, the result should be set to 0 which means op1 isn't an infinite number. > >> + virtual bool op1_range (frange &r, tree type, const irange &lhs, >> + const frange &, relation_trio) const override >> + { >> + if (lhs.undefined_p ()) >> + return false; >> + >> + if (lhs.zero_p ()) >> + { >> + nan_state nan (true); >> + r.set (type, real_min_representable (type), >> + real_max_representable (type), nan); >> + return true; >> + } > If the result of a builtin_isinf is zero, that doesn't mean the input has a > nan state. It means we know it's not infinity. The input argument could be > anything but an Inf. 
If the result of builtin_isinf is zero, it means the input might be a NaN or a finite number, so the range should be [min_rep, max_rep] U NaN. Looking forward to your further comments. Thanks Gui Haochen > > Jeff
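The point under discussion, restated as a standalone example (illustrative only, not from either patch):

extern void use (double);

void g (double x)
{
  if (!__builtin_isinf (x))
    /* x may be a NaN or any finite value here, but never +/-Inf, so
       the recorded range is [-DBL_MAX, DBL_MAX] plus the NaN state.  */
    use (x);
}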
[PATCHv2, rs6000] Add TARGET_FLOAT128_HW guard for quad-precision insns
Hi, This patch adds TARGET_FLOAT128_HW into pattern conditions for quad- precision insns. Some qp patterns are guarded by TARGET_P9_VECTOR originally, so replace it with "TARGET_FLOAT128_HW". For test case float128-cmp2-runnable.c, it should be guarded with ppc_float128_hw as it calls qp insns. The p9vector_hw is covered with ppc_float128_hw, so it's removed. Compared to previous version, the main change it to split redundant FLOAT128_IEEE_P removal to another patch. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Add TARGET_FLOAT128_HW guard for quad-precision insns gcc/ * config/rs6000/rs6000.md (floatti2, floatunsti2, fix_truncti2): Add guard TARGET_FLOAT128_HW. * config/rs6000/vsx.md (xsxexpqp__, xsxsigqp__, xsiexpqpf_, xsiexpqp__, xscmpexpqp__, *xscmpexpqp, xststdcnegqp_): Replace guard TARGET_P9_VECTOR with TARGET_FLOAT128_HW. (xststdc_, *xststdc_, isinf2): Add guard TARGET_FLOAT128_HW for the IEEE128 modes. gcc/testsuite/ * testsuite/gcc.target/powerpc/float128-cmp2-runnable.c: Replace ppc_float128_sw with ppc_float128_hw and remove p9vector_hw. patch.diff diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index deffc4b601c..c0f6599c08b 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -6928,7 +6928,7 @@ (define_insn "floatdidf2" (define_insn "floatti2" [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v") (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))] - "TARGET_POWER10" + "TARGET_POWER10 && TARGET_FLOAT128_HW" { return "xscvsqqp %0,%1"; } @@ -6937,7 +6937,7 @@ (define_insn "floatti2" (define_insn "floatunsti2" [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v") (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))] - "TARGET_POWER10" + "TARGET_POWER10 && TARGET_FLOAT128_HW" { return "xscvuqqp %0,%1"; } @@ -6946,7 +6946,7 @@ (define_insn "floatunsti2" (define_insn "fix_truncti2" [(set (match_operand:TI 0 "vsx_register_operand" "=v") (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))] - "TARGET_POWER10" + "TARGET_POWER10 && TARGET_FLOAT128_HW" { return "xscvqpsqz %0,%1"; } diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 1272f8b2080..7dd08895bec 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5157,7 +5157,7 @@ (define_insn "xsxexpqp__" (unspec:V2DI_DI [(match_operand:IEEE128 1 "altivec_register_operand" "v")] UNSPEC_VSX_SXEXPDP))] - "TARGET_P9_VECTOR" + "TARGET_FLOAT128_HW" "xsxexpqp %0,%1" [(set_attr "type" "vecmove")]) @@ -5176,7 +5176,7 @@ (define_insn "xsxsigqp__" (unspec:VEC_TI [(match_operand:IEEE128 1 "altivec_register_operand" "v")] UNSPEC_VSX_SXSIG))] - "TARGET_P9_VECTOR" + "TARGET_FLOAT128_HW" "xsxsigqp %0,%1" [(set_attr "type" "vecmove")]) @@ -5196,7 +5196,7 @@ (define_insn "xsiexpqpf_" [(match_operand:IEEE128 1 "altivec_register_operand" "v") (match_operand:DI 2 "altivec_register_operand" "v")] UNSPEC_VSX_SIEXPQP))] - "TARGET_P9_VECTOR" + "TARGET_FLOAT128_HW" "xsiexpqp %0,%1,%2" [(set_attr "type" "vecmove")]) @@ -5208,7 +5208,7 @@ (define_insn "xsiexpqp__" (match_operand:V2DI_DI 2 "altivec_register_operand" "v")] UNSPEC_VSX_SIEXPQP))] - "TARGET_P9_VECTOR" + "TARGET_FLOAT128_HW" "xsiexpqp %0,%1,%2" [(set_attr "type" "vecmove")]) @@ -5278,7 +5278,7 @@ (define_expand "xscmpexpqp__" (set (match_operand:SI 0 "register_operand" "=r") (CMP_TEST:SI (match_dup 3) (const_int 0)))] - "TARGET_P9_VECTOR" + "TARGET_FLOAT128_HW" { if ( == UNORDERED 
&& !HONOR_NANS (mode)) { @@ -5296,7 +5296,7 @@ (define_insn "*xscmpexpqp" (match_operand:IEEE128 2 "altivec_register_operand" "v")] UNSPEC_VSX_SCMPEXPQP) (match_operand:SI 3 "zero_constant" "j")))] - "TARGET_P9_VECTOR" + "TARGET_FLOAT128_HW" "xscmpexpqp %0,%1,%2" [(set_attr "type" "fpcompare")]) @@ -5315,7 +5315,8 @@ (define_expand "xststdc_" (set (match_operand:SI 0 "register_operand" "=r") (eq:SI (match_dup 3) (const_int 0)))] - "TARGET_P9_VECTOR" + "TARGET_P9_VECTOR + && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)" { operands[3] = gen_reg_rtx (CCFPmode); operands[4] = CONST0_RTX (SImode); @@ -5324,7 +5325,8 @@ (define_expand "xststdc_" (define_expand "isinf2" [(use (match_operand:SI 0 "gpc_reg_operand")) (use (match_operand:IEEE_FP 1 ""))] - "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" + "TARGET_P9_VECTOR + && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)" { int mask = VSX_TEST_DATA_CLASS_POS_INF | VSX_TEST_DATA_
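For contrast, a sketch (not from the patch) of why the extra guard matters: without float128 hardware the conversion below must go through a libgcc soft-float routine instead of the Power10 xscvsqqp instruction that the guarded floatti<mode>2 pattern emits. The routine name __floattikf is my assumption about the libgcc symbol involved:

/* Without hardware IEEE128 support this must become a libcall
   (presumably __floattikf) rather than matching floatti<mode>2.  */
_Float128 conv (__int128 x)
{
  return (_Float128) x;
}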
[PATCH, rs6000] Remove redundant guard for float128 mode patterns
Hi, This patch removes FLOAT128_IEEE_P guard when the mode of pattern is IEEE128 and FLOAT128_IBM_P when the mode of pattern is IBM128. The mode iterators already do the checking. So they're redundant. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Remove redundant guard for float128 mode patterns gcc/ * config/rs6000/rs6000.md (movcc, *movcc_p10, *movcc_invert_p10, *fpmask, *xxsel, @ieee_128bit_vsx_abs2, *ieee_128bit_vsx_nabs2, add3, sub3, mul3, div3, sqrt2, copysign3, copysign3_hard, copysign3_soft, @neg2_hw, @abs2_hw, *nabs2_hw, fma4_hw, *fms4_hw, *nfma4_hw, *nfms4_hw, extend2_hw, truncdf2_hw, truncsf2_hw, fix_2_hw, fix_trunc2, *fix_trunc2_mem, float_di2_hw, float_si2_hw, float2, floatuns_di2_hw, floatuns_si2_hw, floatuns2, floor2, ceil2, btrunc2, round2, add3_odd, sub3_odd, mul3_odd, div3_odd, sqrt2_odd, fma4_odd, *fms4_odd, *nfma4_odd, *nfms4_odd, truncdf2_odd, *cmp_hw for IEEE128): Remove guard FLOAT128_IEEE_P. (@extenddf2_fprs, @extenddf2_vsx, truncdf2_internal1, truncdf2_internal2, fix_trunc_helper, neg2, *cmp_internal1, *cmp_internal2 for IBM128): Remove guard FLOAT128_IBM_P. patch.diff diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index c0f6599c08b..f22b7ed6256 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -5736,7 +5736,7 @@ (define_expand "movcc" (if_then_else:IEEE128 (match_operand 1 "comparison_operator") (match_operand:IEEE128 2 "gpc_reg_operand") (match_operand:IEEE128 3 "gpc_reg_operand")))] - "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_POWER10 && TARGET_FLOAT128_HW" { if (rs6000_emit_cmove (operands[0], operands[1], operands[2], operands[3])) DONE; @@ -5753,7 +5753,7 @@ (define_insn_and_split "*movcc_p10" (match_operand:IEEE128 4 "altivec_register_operand" "v,v") (match_operand:IEEE128 5 "altivec_register_operand" "v,v"))) (clobber (match_scratch:V2DI 6 "=0,&v"))] - "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_POWER10 && TARGET_FLOAT128_HW" "#" "&& 1" [(set (match_dup 6) @@ -5785,7 +5785,7 @@ (define_insn_and_split "*movcc_invert_p10" (match_operand:IEEE128 4 "altivec_register_operand" "v,v") (match_operand:IEEE128 5 "altivec_register_operand" "v,v"))) (clobber (match_scratch:V2DI 6 "=0,&v"))] - "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_POWER10 && TARGET_FLOAT128_HW" "#" "&& 1" [(set (match_dup 6) @@ -5820,7 +5820,7 @@ (define_insn "*fpmask" (match_operand:IEEE128 3 "altivec_register_operand" "v")]) (match_operand:V2DI 4 "all_ones_constant" "") (match_operand:V2DI 5 "zero_constant" "")))] - "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_POWER10 && TARGET_FLOAT128_HW" "xscmp%V1qp %0,%2,%3" [(set_attr "type" "fpcompare")]) @@ -5831,7 +5831,7 @@ (define_insn "*xxsel" (match_operand:V2DI 2 "zero_constant" "")) (match_operand:IEEE128 3 "altivec_register_operand" "v") (match_operand:IEEE128 4 "altivec_register_operand" "v")))] - "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_POWER10 && TARGET_FLOAT128_HW" "xxsel %x0,%x4,%x3,%x1" [(set_attr "type" "vecmove")]) @@ -8904,7 +8904,7 @@ (define_insn_and_split "@extenddf2_fprs" (match_operand:DF 1 "nonimmediate_operand" "d,m,d"))) (use (match_operand:DF 2 "nonimmediate_operand" "m,m,d"))] "!TARGET_VSX && TARGET_HARD_FLOAT - && TARGET_LONG_DOUBLE_128 && FLOAT128_IBM_P (mode)" + && TARGET_LONG_DOUBLE_128" "#" "&& reload_completed" [(set (match_dup 3) 
(match_dup 1)) @@ -8921,7 +8921,7 @@ (define_insn_and_split "@extenddf2_vsx" [(set (match_operand:IBM128 0 "gpc_reg_operand" "=d,d") (float_extend:IBM128 (match_operand:DF 1 "nonimmediate_operand" "wa,m")))] - "TARGET_LONG_DOUBLE_128 && TARGET_VSX && FLOAT128_IBM_P (mode)" + "TARGET_LONG_DOUBLE_128 && TARGET_VSX" "#" "&& reload_completed" [(set (match_dup 2) (match_dup 1)) @@ -8967,7 +8967,7 @@ (define_insn_and_split "truncdf2_internal1" [(set (match_operand:DF 0 "gpc_reg_operand" "=d,?d") (float_truncate:DF (match_operand:IBM128 1 "gpc_reg_operand" "0,d")))] - "FLOAT128_IBM_P (mode) && !TARGET_XL_COMPAT + "!TARGET_XL_COMPAT && TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128" "@ # @@ -8983,7 +8983,7 @@ (define_insn_and_split "truncdf2_internal1" (define_insn "truncdf2_internal2" [(set (match_operand:DF 0 "gpc_reg_operand" "=d") (float_truncate:DF (match_operand:IBM128 1 "gpc_reg_operand" "d")))] - "FLOAT12
[PATCH-2v5, rs6000] Implement optab_isfinite for SFDF and IEEE128
Hi, This patch implemented optab_isfinite for SFDF and IEEE128 by test data class instructions. Compared with previous version, the main change is to merge the patterns of SFDF and IEEE128 into one. https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655780.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isfinite for SFDF and IEEE128 gcc/ PR target/97786 * config/rs6000/vsx.md (isfinite2): New expand. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-4.c: New test. * gcc.target/powerpc/pr97786-5.c: New test. patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index d30416a53e7..763cd916c8d 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5304,6 +5304,20 @@ (define_expand "isinf2" DONE; }) +(define_expand "isfinite2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:IEEE_FP 1 ""))] + "TARGET_P9_VECTOR + && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)" +{ + rtx tmp = gen_reg_rtx (SImode); + int mask = VSX_TEST_DATA_CLASS_POS_INF | VSX_TEST_DATA_CLASS_NEG_INF +| VSX_TEST_DATA_CLASS_NAN; + emit_insn (gen_xststdc_ (tmp, operands[1], GEN_INT (mask))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + ;; The VSX Scalar Test Negative Quad-Precision (define_expand "xststdcnegqp_" [(set (match_dup 2) diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-4.c b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c new file mode 100644 index 000..9cdde78257d --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ +/* { dg-require-effective-target powerpc_vsx } */ + +int test1 (double x) +{ + return __builtin_isfinite (x); +} + +int test2 (float x) +{ + return __builtin_isfinite (x); +} + +/* { dg-final { scan-assembler-not {\mfcmp} } } */ +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-5.c b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c new file mode 100644 index 000..0ef8b86f6cb --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ppc_float128_hw } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */ +/* { dg-require-effective-target powerpc_vsx } */ + +int test1 (long double x) +{ + return __builtin_isfinite (x); +} + +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ +/* { dg-final { scan-assembler {\mxststdcqp\M} } } */
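In C terms the expansion above does the following; test_data_class is a hypothetical stand-in for the xststdc instruction, not a real API:

extern int test_data_class (double x, int mask); /* hypothetical stand-in */

int isfinite_model (double x)
{
  /* 0x70 = NaN | +Inf | -Inf: the insn yields 1 when x matches any
     class in the mask, so finiteness is that result XORed with 1.  */
  return test_data_class (x, 0x70) ^ 1;
}

The isnormal expander in the next patch works the same way, with the mask widened to 0x7f so that zeros and denormals are also excluded.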
[PATCH-3v5, rs6000] Implement optab_isnormal for SFDF and IEEE128
Hi, This patch implemented optab_isnormal for SFDF and IEEE128 by test data class instructions. Compared with previous version, the main change is to merge the patterns of SFDF and IEEE128 into one. https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655781.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isnormal for SFDF and IEEE128 gcc/ PR target/97786 * config/rs6000/vsx.md (isnormal2): New expand. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-7.c: New test. * gcc.target/powerpc/pr97786-8.c: New test. patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 763cd916c8d..f818aba9e3e 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5318,6 +5318,23 @@ (define_expand "isfinite2" DONE; }) +(define_expand "isnormal2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:IEEE_FP 1 ""))] + "TARGET_P9_VECTOR + && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)" +{ + rtx tmp = gen_reg_rtx (SImode); + int mask = VSX_TEST_DATA_CLASS_NAN +| VSX_TEST_DATA_CLASS_POS_INF | VSX_TEST_DATA_CLASS_NEG_INF +| VSX_TEST_DATA_CLASS_POS_ZERO | VSX_TEST_DATA_CLASS_NEG_ZERO +| VSX_TEST_DATA_CLASS_POS_DENORMAL +| VSX_TEST_DATA_CLASS_NEG_DENORMAL; + emit_insn (gen_xststdc_ (tmp, operands[1], GEN_INT (mask))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + ;; The VSX Scalar Test Negative Quad-Precision (define_expand "xststdcnegqp_" [(set (match_dup 2) diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-7.c b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c new file mode 100644 index 000..eb01eed39d3 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ +/* { dg-require-effective-target powerpc_vsx } */ + +int test1 (double x) +{ + return __builtin_isnormal (x); +} + +int test2 (float x) +{ + return __builtin_isnormal (x); +} + +/* { dg-final { scan-assembler-not {\mfcmp} } } */ +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-8.c b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c new file mode 100644 index 000..eba90d3b1b7 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ppc_float128_hw } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */ +/* { dg-require-effective-target powerpc_vsx } */ + +int test1 (long double x) +{ + return __builtin_isnormal (x); +} + +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ +/* { dg-final { scan-assembler {\mxststdcqp\M} } } */
Re: [PATCHv2] Value range: Add range op for __builtin_isfinite
Hi Mikael, Thanks for your comments. On 2024/5/9 16:03, Mikael Morin wrote: > I think the canonical API behaviour sets R to varying and returns true > instead of just returning false if nothing is known about the range. > > I'm not sure whether it makes any difference; Aldy can probably tell. But if > the type is bool, varying is [0,1] which is better than unknown range. Should varying be set by the caller when fold_range returns false, like the following code in value-query.cc? if (!op.fold_range (r, type, r0, r1)) r.set_varying (type); Thanks Gui Haochen
Re: [PATCHv2] Value range: Add range op for __builtin_isfinite
Hi Jakub, Thanks for your review comments. On 2024/5/14 23:57, Jakub Jelinek wrote: > BUILT_IN_ISFINITE is just one of many BUILT_IN_IS... builtins, > would be nice to handle the others as well. > > E.g. isnormal/isnan/isinf, fpclassify etc. Yes, I already sent the patches which add range ops for isnormal/isnan/isinf for review. I will modify them according to the review comments and submit them again. > Note, the man page says for e.g. isnormal that it returns nonzero or zero, > but in reality I think we implement it always inline and can check if > it always returns [0,1]. > Some others like isinf return [-1,1] though I think and fpclassify > returns union of all the passed int values. The GCC inline code always returns 0 or 1 for isnormal/isnan/isinf, but I wonder whether every target's expander can guarantee that. rs6000 has an instruction for isnormal/isnan/isinf, so we are making the patch skip the inline code and expand them ourselves. Though the rs6000 instruction returns 0 or 1 for them, I am not sure whether other targets do the same. Thanks Gui Haochen
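Jakub's caveat about isinf, restated: the C standard only guarantees a nonzero-versus-zero result, and some implementations return the sign, so assuming a [0,1] range is not generally safe. A small illustration:

int inf_p (double x)
{
  int r = __builtin_isinf (x);
  /* Portable conclusion: r != 0 iff x is +/-Inf; r could be -1, 1, or
     any other nonzero value depending on the implementation.  */
  return r != 0;
}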
Re: [PATCHv2] Value range: Add range op for __builtin_isfinite
Hi Andrew, Thanks so much for your explanation. I got it. I will address the issue. Thanks Gui Haochen On 2024/5/15 2:45, Andrew MacLeod wrote: > > On 5/9/24 04:47, HAO CHEN GUI wrote: >> Hi Mikael, >> >> Thanks for your comments. >> >> On 2024/5/9 16:03, Mikael Morin wrote: >>> I think the canonical API behaviour sets R to varying and returns true >>> instead of just returning false if nothing is known about the range. >>> >>> I'm not sure whether it makes any difference; Aldy can probably tell. But >>> if the type is bool, varying is [0,1] which is better than unknown range. >> Should the varying be set by caller when fold_range returns false? >> Just like following codes in value-query.cc. >> >> if (!op.fold_range (r, type, r0, r1)) >> r.set_varying (type); >> > This would be dangerous in the general case. fold_range may have returned > false because 'type' is an unsupported range type. Generally this is why we > prefer range-ops to return TRUE and VARYING rather than FALSE for unknown > values. When FALSE is returned, we should stop working with ranges because > something is amok. > > Andrew >
Re: [PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]
Hi Segher, Thanks for your review comments. I will modify it and resend. Just one question about the insn condition. On 2024/5/17 1:25, Segher Boessenkool wrote: >> +(define_expand "isnormal<mode>2" >> + [(use (match_operand:SI 0 "gpc_reg_operand")) >> +(use (match_operand:SFDF 1 "gpc_reg_operand"))] >> + "TARGET_HARD_FLOAT >> + && TARGET_P9_VECTOR" > Please put the condition on just one line if it is as simple and short > as this. > > Why is TARGET_P9_VECTOR the correct condition? This expander calls gen_xststdc<sd>p, which emits a P9 vector instruction and relies on "TARGET_P9_VECTOR". That is why I set the condition.
[PATCH-1v2, rs6000] Implement optab_isinf for SFDF and IEEE128
Hi, This patch implemented optab_isinf for SFDF and IEEE128 by test data class instructions. Compared with previous version, the main change is to modify the dg-options and dg-finals of test cases according to reviewer's advice. https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isinf for SFDF and IEEE128 gcc/ PR target/97786 * config/rs6000/vsx.md (isinf2 for SFDF): New expand. (isinf2 for IEEE128): New expand. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-1.c: New test. * gcc.target/powerpc/pr97786-2.c: New test. patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index f135fa079bd..fa20fb4df91 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5313,6 +5313,24 @@ (define_expand "xststdcp" operands[4] = CONST0_RTX (SImode); }) +(define_expand "isinf2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:SFDF 1 "gpc_reg_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (0x30))); + DONE; +}) + +(define_expand "isinf2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:IEEE128 1 "gpc_reg_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + emit_insn (gen_xststdcqp_ (operands[0], operands[1], GEN_INT (0x30))); + DONE; +}) + ;; The VSX Scalar Test Negative Quad-Precision (define_expand "xststdcnegqp_" [(set (match_dup 2) diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c new file mode 100644 index 000..c1c4f64ee8b --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ + +int test1 (double x) +{ + return __builtin_isinf (x); +} + +int test2 (float x) +{ + return __builtin_isinf (x); +} + +int test3 (float x) +{ + return __builtin_isinff (x); +} + +/* { dg-final { scan-assembler-not {\mfcmp} } } */ +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c new file mode 100644 index 000..21d90868268 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ppc_float128_hw } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */ + +int test1 (long double x) +{ + return __builtin_isinf (x); +} + +int test2 (long double x) +{ + return __builtin_isinfl (x); +} + +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ +/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */
[PATCH-2v2, rs6000] Implement optab_isfinite for SFDF and IEEE128
Hi, This patch implemented optab_isfinite for SFDF and IEEE128 by test data class instructions. Compared with previous version, the main change is not to test if pseudo can be created in expand and modify dg-options and dg-finals of test cases according to reviewer's advice. https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649346.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isfinite for SFDF and IEEE128 gcc/ PR target/97786 * config/rs6000/vsx.md (isfinite2 for SFDF): New expand. (isfinite2 for IEEE128): New expand. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-4.c: New test. * gcc.target/powerpc/pr97786-5.c: New test. patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index f0cc02f7e7b..cbb538d6d86 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5333,6 +5333,28 @@ (define_expand "isinf2" DONE; }) +(define_expand "isfinite2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:SFDF 1 "gpc_reg_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + rtx tmp = gen_reg_rtx (SImode); + emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x70))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + +(define_expand "isfinite2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:IEEE128 1 "gpc_reg_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + rtx tmp = gen_reg_rtx (SImode); + emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x70))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + ;; The VSX Scalar Test Negative Quad-Precision (define_expand "xststdcnegqp_" [(set (match_dup 2) diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-4.c b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c new file mode 100644 index 000..01faa962bd5 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ + +int test1 (double x) +{ + return __builtin_isfinite (x); +} + +int test2 (float x) +{ + return __builtin_isfinite (x); +} + +/* { dg-final { scan-assembler-not {\mfcmp} } } */ +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-5.c b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c new file mode 100644 index 000..5fc98084274 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ppc_float128_hw } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */ + +int test1 (long double x) +{ + return __builtin_isfinite (x); +} + +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ +/* { dg-final { scan-assembler {\mxststdcqp\M} } } */
[PATCH-3v2, rs6000] Implement optab_isnormal for SFDF and IEEE128
Hi, This patch implemented optab_isnormal for SFDF and IEEE128 by test data class instructions. Compared with previous version, the main change is not to test if pseudo can be created in expand and modify dg-options and dg-finals of test cases according to reviewer's advice. https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649368.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isnormal for SFDF and IEEE128 gcc/ PR target/97786 * config/rs6000/vsx.md (isnormal2 for SFDF): New expand. (isnormal2 for IEEE128): New expand. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-7.c: New test. * gcc.target/powerpc/pr97786-8.c: New test. patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index ab17178e0a8..cae30dc431e 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5353,6 +5353,28 @@ (define_expand "isfinite2" DONE; }) +(define_expand "isnormal2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:SFDF 1 "gpc_reg_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + rtx tmp = gen_reg_rtx (SImode); + emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x7f))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + +(define_expand "isnormal2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:IEEE128 1 "gpc_reg_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + rtx tmp = gen_reg_rtx (SImode); + emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x7f))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + ;; The VSX Scalar Test Negative Quad-Precision (define_expand "xststdcnegqp_" [(set (match_dup 2) diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-7.c b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c new file mode 100644 index 000..2df472e35d4 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ + +int test1 (double x) +{ + return __builtin_isnormal (x); +} + +int test2 (float x) +{ + return __builtin_isnormal (x); +} + +/* { dg-final { scan-assembler-not {\mfcmp} } } */ +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-8.c b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c new file mode 100644 index 000..0416970b89b --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ppc_float128_hw } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */ + +int test1 (long double x) +{ + return __builtin_isnormal (x); +} + +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ +/* { dg-final { scan-assembler {\mxststdcqp\M} } } */
Re: [PATCH] Optab: add isfinite_optab for __builtin_isfinite
Hi Andrew, On 2024/5/19 3:42, Andrew Pinski wrote: > This is missing adding documentation for the new optab. > It should be documented in md.texi under `Standard Pattern Names For > Generation` section. Thanks for the reminder. I will add the documentation for all the patches. Thanks Gui Haochen
[PATCHv2] Optab: add isfinite_optab for __builtin_isfinite
Hi, This patch adds an optab for __builtin_isfinite. The finite check can be implemented on rs6000 by a single instruction. It needs an optab to be expanded to the certain sequence of instructions. The subsequent patches will implement the expand on rs6000. Compared to previous version, the main change is to document isfinite in md.texi. https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog optab: Add isfinite_optab for isfinite builtin gcc/ * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab for isfinite builtin. * optabs.def (isfinite_optab): New. * doc/md.texi (isfinite): Document. patch.diff diff --git a/gcc/builtins.cc b/gcc/builtins.cc index f8d94c4b435..b8432f84020 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl) errno_set = true; builtin_optab = ilogb_optab; break; CASE_FLT_FN (BUILT_IN_ISINF): builtin_optab = isinf_optab; break; -case BUILT_IN_ISNORMAL: case BUILT_IN_ISFINITE: + builtin_optab = isfinite_optab; break; +case BUILT_IN_ISNORMAL: CASE_FLT_FN (BUILT_IN_FINITE): case BUILT_IN_FINITED32: case BUILT_IN_FINITED64: diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 5730bda80dc..8ed70b3feea 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -8557,6 +8557,11 @@ operand 2, greater than operand 2 or is unordered with operand 2. This pattern is not allowed to @code{FAIL}. +@cindex @code{isfinite@var{m}2} instruction pattern +@item @samp{isfinite@var{m}2} +Set operand 0 to nonzero if operand 1 is a finite floating-point +number and to 0 otherwise. + @end table @end ifset diff --git a/gcc/optabs.def b/gcc/optabs.def index ad14f9328b9..dcd77315c2a 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3") OPTAB_D (hypot_optab, "hypot$a3") OPTAB_D (ilogb_optab, "ilogb$a2") OPTAB_D (isinf_optab, "isinf$a2") +OPTAB_D (isfinite_optab, "isfinite$a2") OPTAB_D (issignaling_optab, "issignaling$a2") OPTAB_D (ldexp_optab, "ldexp$a3") OPTAB_D (log10_optab, "log10$a2")
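The observable contract documented in md.texi above, seen from user code (a sketch; which instruction sequence results depends on whether the target defines isfinite<m>2):

/* With isfinite<m>2 defined, interclass_mathfn_icode routes this call
   to the target expander; otherwise GCC falls back to its generic
   inline expansion.  Either way the result is nonzero iff x is
   finite.  */
int finite_p (double x)
{
  return __builtin_isfinite (x);
}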
[PATCHv2] Optab: add isnormal_optab for __builtin_isnormal
Hi,

This patch adds an optab for __builtin_isnormal.  On rs6000 the normal
check can be implemented with a single instruction, so an optab is
needed to let the builtin expand to the target-specific instruction
sequence.  Subsequent patches implement the expander on rs6000.
Compared with the previous version, the main change is to document
isnormal in md.texi.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isnormal_optab for isnormal builtin

gcc/
	* builtins.cc (interclass_mathfn_icode): Set optab to
	isnormal_optab for isnormal builtin.
	* optabs.def (isnormal_optab): New.
	* doc/md.texi (isnormal): Document.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index b8432f84020..ccd57fce522 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
     case BUILT_IN_ISFINITE:
       builtin_optab = isfinite_optab; break;
     case BUILT_IN_ISNORMAL:
+      builtin_optab = isnormal_optab; break;
     CASE_FLT_FN (BUILT_IN_FINITE):
     case BUILT_IN_FINITED32:
     case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8ed70b3feea..b81b9dec18a 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8562,6 +8562,11 @@ This pattern is not allowed to @code{FAIL}.
 Set operand 0 to nonzero if operand 1 is a finite floating-point
 number and to 0 otherwise.
 
+@cindex @code{isnormal@var{m}2} instruction pattern
+@item @samp{isnormal@var{m}2}
+Set operand 0 to nonzero if operand 1 is a normal floating-point
+number and to 0 otherwise.
+
 @end table
 
 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index dcd77315c2a..3c401fc0b4c 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
 OPTAB_D (isfinite_optab, "isfinite$a2")
+OPTAB_D (isnormal_optab, "isnormal$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")
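A quick usage illustration (mine, not from the patch) of the boundary
the new pattern has to respect: the smallest normal number DBL_MIN is
normal, while anything of smaller magnitude is subnormal or zero, and
NaN and the infinities are excluded as well.

#include <float.h>
#include <assert.h>

int
main (void)
{
  assert (__builtin_isnormal (DBL_MIN));	/* smallest normal */
  assert (!__builtin_isnormal (DBL_MIN / 2));	/* subnormal */
  assert (!__builtin_isnormal (0.0));
  assert (!__builtin_isnormal (__builtin_inf ()));
  assert (!__builtin_isnormal (__builtin_nan ("")));
  return 0;
}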
[PATCH-1v2] Value Range: Add range op for builtin isinf
Hi,

The builtin isinf is not folded at the front end if the corresponding
optab exists.  As a result, range evaluation fails on targets that have
optab_isinf.  For instance, range-sincos.c fails on such targets as it
calls __builtin_isinf.  This patch fixes the problem by adding a range
op for builtin isinf.  Compared with the previous version, the main
change is to set varying if nothing is known about the range.
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isinf

The builtin isinf is not folded at the front end if the corresponding
optab exists.  So the range op for isinf is needed for value range
analysis.  This patch adds the range op for builtin isinf.

gcc/
	* gimple-range-op.cc (class cfn_isinf): New.
	(op_cfn_isinf): New variable.
	(gimple_range_op_handler::maybe_builtin_call): Handle
	CASE_FLT_FN (BUILT_IN_ISINF).

gcc/testsuite/
	* gcc.dg/tree-ssa/range-isinf.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 55dfbb23ce2..eb1b0aff77c 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1175,6 +1175,62 @@ private:
   bool m_is_pos;
 } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);
 
+// Implement range operator for CFN_BUILT_IN_ISINF
+class cfn_isinf : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+			   const irange &, relation_trio) const override
+  {
+    if (op1.undefined_p ())
+      return false;
+
+    if (op1.known_isinf ())
+      {
+	r.set_nonzero (type);
+	return true;
+      }
+
+    if (op1.known_isnan ()
+	|| (!real_isinf (&op1.lower_bound ())
+	    && !real_isinf (&op1.upper_bound ())))
+      {
+	r.set_zero (type);
+	return true;
+      }
+
+    r.set_varying (type);
+    return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+			  const frange &, relation_trio) const override
+  {
+    if (lhs.undefined_p ())
+      return false;
+
+    if (lhs.zero_p ())
+      {
+	nan_state nan (true);
+	r.set (type, real_min_representable (type),
+	       real_max_representable (type), nan);
+	return true;
+      }
+
+    if (!range_includes_zero_p (lhs))
+      {
+	// The range is [-INF,-INF][+INF,+INF], but it can't be
+	// represented.  Set range to [-INF,+INF].
+	r.set_varying (type);
+	r.clear_nan ();
+	return true;
+      }
+
+    r.set_varying (type);
+    return true;
+  }
+} op_cfn_isinf;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1268,6 +1324,11 @@ gimple_range_op_handler::maybe_builtin_call ()
       m_operator = &op_cfn_signbit;
       break;
 
+    CASE_FLT_FN (BUILT_IN_ISINF):
+      m_op1 = gimple_call_arg (call, 0);
+      m_operator = &op_cfn_isinf;
+      break;
+
     CASE_CFN_COPYSIGN_ALL:
       m_op1 = gimple_call_arg (call, 0);
       m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
new file mode 100644
index 0000000..468f1bcf5c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include <math.h>
+void link_error();
+
+void
+test1 (double x)
+{
+  if (x > __DBL_MAX__ && !__builtin_isinf (x))
+    link_error ();
+  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
+    link_error ();
+}
+
+void
+test2 (float x)
+{
+  if (x > __FLT_MAX__ && !__builtin_isinf (x))
+    link_error ();
+  if (x < -__FLT_MAX__ && !__builtin_isinf (x))
+    link_error ();
+}
+
+void
+test3 (double x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__)
+    link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__)
+    link_error ();
+}
+
+void
+test4 (float x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)
+    link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__)
+    link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
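To make the fold_range logic easier to audit, here is the same
classification restated as plain C over an interval [lo, hi] plus a
may-be-NaN flag.  This is my own restatement with a hypothetical
helper name, not GCC internals.

#include <math.h>
#include <stdbool.h>

/* Fold isinf(x) given x in [lo, hi], possibly NaN.
   Returns 1 for surely-inf, 0 for surely-not-inf, -1 for varying.
   Mirrors cfn_isinf::fold_range above.  */
int
fold_isinf (double lo, double hi, bool maybe_nan)
{
  /* The range collapses to a single infinity and NaN is excluded.  */
  if (!maybe_nan && isinf (lo) && lo == hi)
    return 1;
  /* Neither bound is infinite: x is finite or NaN, so isinf is 0.  */
  if (!isinf (lo) && !isinf (hi))
    return 0;
  return -1;
}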
[PATCH-2v3] Value Range: Add range op for builtin isfinite
Hi,

This patch adds the range op for builtin isfinite.  Compared with the
previous version, the main change is to set varying if nothing is known
about the range.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650857.html

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isfinite

The former patch adds an optab for builtin isfinite.  Thus builtin
isfinite might not be folded at the front end.  So the range op for
isfinite is needed for value range analysis.  This patch adds the range
op for builtin isfinite.

gcc/
	* gimple-range-op.cc (class cfn_isfinite): New.
	(op_cfn_isfinite): New variable.
	(gimple_range_op_handler::maybe_builtin_call): Handle
	CFN_BUILT_IN_ISFINITE.

gcc/testsuite/
	* gcc.dg/tree-ssa/range-isfinite.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 922ee7bf0f7..49b6d7abde1 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1229,6 +1229,61 @@ public:
   }
 } op_cfn_isinf;
 
+// Implement range operator for CFN_BUILT_IN_ISFINITE
+class cfn_isfinite : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+			   const irange &, relation_trio) const override
+  {
+    if (op1.undefined_p ())
+      return false;
+
+    if (op1.known_isfinite ())
+      {
+	r.set_nonzero (type);
+	return true;
+      }
+
+    if (op1.known_isnan ()
+	|| op1.known_isinf ())
+      {
+	r.set_zero (type);
+	return true;
+      }
+
+    r.set_varying (type);
+    return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+			  const frange &, relation_trio) const override
+  {
+    if (lhs.undefined_p ())
+      return false;
+
+    if (lhs.zero_p ())
+      {
+	// The range is [-INF,-INF][+INF,+INF] NAN, but it can't be
+	// represented.  Set range to varying.
+	r.set_varying (type);
+	return true;
+      }
+
+    if (!range_includes_zero_p (lhs))
+      {
+	nan_state nan (false);
+	r.set (type, real_min_representable (type),
+	       real_max_representable (type), nan);
+	return true;
+      }
+
+    r.set_varying (type);
+    return true;
+  }
+} op_cfn_isfinite;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1326,6 +1381,11 @@ gimple_range_op_handler::maybe_builtin_call ()
       m_operator = &op_cfn_isinf;
       break;
 
+    case CFN_BUILT_IN_ISFINITE:
+      m_op1 = gimple_call_arg (call, 0);
+      m_operator = &op_cfn_isfinite;
+      break;
+
     CASE_CFN_COPYSIGN_ALL:
       m_op1 = gimple_call_arg (call, 0);
       m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
new file mode 100644
index 0000000..f5dce0a0486
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include <math.h>
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x))
+    link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x))
+    link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+    link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+    link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
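A small C illustration (mine, not from the patch) of the reverse
inference op1_range encodes: once __builtin_isfinite (x) is known
nonzero, the ranger may assume x is not NaN and lies within the
representable finite range, so the comparison below should fold to 1.

#include <float.h>

int
clamped (double x)
{
  if (__builtin_isfinite (x))
    /* With the range op, x is known to be in [-DBL_MAX, DBL_MAX]
       and not NaN here, so this can fold to 1 at compile time.  */
    return x <= DBL_MAX && x >= -DBL_MAX;
  return 1;
}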
[PATCH-3] Value Range: Add range op for builtin isnormal
Hi,

This patch adds the range op for builtin isnormal.  It also adds two
helper functions to frange, one detecting a range of normal
floating-point values and one detecting a range of subnormal or zero
values.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isnormal

The former patch adds an optab for builtin isnormal.  Thus builtin
isnormal might not be folded at the front end.  So the range op for
isnormal is needed for value range analysis.  This patch adds the range
op for builtin isnormal.

gcc/
	* gimple-range-op.cc (class cfn_isnormal): New.
	(op_cfn_isnormal): New variable.
	(gimple_range_op_handler::maybe_builtin_call): Handle
	CFN_BUILT_IN_ISNORMAL.
	* value-range.h (class frange): Declare known_isnormal and
	known_isdenormal_or_zero.
	(frange::known_isnormal): Define.
	(frange::known_isdenormal_or_zero): Define.

gcc/testsuite/
	* gcc.dg/tree-ssa/range-isnormal.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index d69900d1f56..4c3f9c98282 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1281,6 +1281,60 @@ public:
   }
 } op_cfn_isfinite;
 
+// Implement range operator for CFN_BUILT_IN_ISNORMAL
+class cfn_isnormal : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+			   const irange &, relation_trio) const override
+  {
+    if (op1.undefined_p ())
+      return false;
+
+    if (op1.known_isnormal ())
+      {
+	r.set_nonzero (type);
+	return true;
+      }
+
+    if (op1.known_isnan ()
+	|| op1.known_isinf ()
+	|| op1.known_isdenormal_or_zero ())
+      {
+	r.set_zero (type);
+	return true;
+      }
+
+    r.set_varying (type);
+    return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+			  const frange &, relation_trio) const override
+  {
+    if (lhs.undefined_p ())
+      return false;
+
+    if (lhs.zero_p ())
+      {
+	r.set_varying (type);
+	return true;
+      }
+
+    if (!range_includes_zero_p (lhs))
+      {
+	nan_state nan (false);
+	r.set (type, real_min_representable (type),
+	       real_max_representable (type), nan);
+	return true;
+      }
+
+    r.set_varying (type);
+    return true;
+  }
+} op_cfn_isnormal;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1383,6 +1437,11 @@ gimple_range_op_handler::maybe_builtin_call ()
       m_operator = &op_cfn_isfinite;
       break;
 
+    case CFN_BUILT_IN_ISNORMAL:
+      m_op1 = gimple_call_arg (call, 0);
+      m_operator = &op_cfn_isnormal;
+      break;
+
     CASE_CFN_COPYSIGN_ALL:
       m_op1 = gimple_call_arg (call, 0);
      m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
new file mode 100644
index 0000000..c4df4d839b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include <math.h>
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > __DBL_MIN__ && !__builtin_isnormal (x))
+    link_error ();
+
+  if (x < -__DBL_MIN__ && x > -__DBL_MAX__ && !__builtin_isnormal (x))
+    link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > __FLT_MIN__ && !__builtin_isnormal (x))
+    link_error ();
+
+  if (x < -__FLT_MIN__ && x > -__FLT_MAX__ && !__builtin_isnormal (x))
+    link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isnormal (x) && __builtin_isinf (x))
+    link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isnormal (x) && __builtin_isinf (x))
+    link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 37ce91dc52d..1443d1906e5 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -588,6 +588,8 @@ public:
   bool maybe_isinf () const;
   bool signbit_p (bool &signbit) const;
   bool nan_signbit_p (bool &signbit) const;
+  bool known_isnormal () const;
+  bool known_isdenormal_or_zero () const;
 
 protected:
   virtual bool contains_p (tree cst) const override;
@@ -1650,6 +1652,33 @@ frange::known_isfinite () const
   return (!maybe_isnan () && !real_isinf (&m_min) && !real_isinf (&m_max));
 }
 
+// Return TRUE if range is known to be normal.
+
+inline bool
+frange::known_isnormal () const
+{
+  if (!known_isfinite ())
+    return false;
+
+  machine_mode mode = TYPE_MODE (type ());
+  return (!real_isdenormal (&m_min, mode) && !real_isdenormal (&m_max, mode)
+	  && !real_iszero (&m_min) && !real_iszero (&m_max)
+	  && (!real_isneg (&m_min) || real_isneg (&m_max)));
+}
+
+// Return TRUE if range is known to be denormal or zero.
+
+inline bool
+frange::known_isdenormal_or_zero () const
+{
+  if (!known_isfinite ())
+    return false;
+
+  machine_mode mode = TYPE_MODE (type ());
+  return ((real_isdenormal (&m_min, mode) || real_iszero (&m_min))
+	  && (real_isdenormal (&m_max, mode) || real_iszero (&m_max)));
+}
+
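As a sanity check on the helper's logic, here is my own plain C
restatement of frange::known_isnormal for double (an illustration, not
GCC code): a range [lo, hi] is entirely normal iff it is finite,
excludes zero and the denormals at both bounds, and does not straddle
zero.

#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Every value in [lo, hi] is a normal number iff the bounds are
   finite and non-NaN, neither bound is zero or denormal, and the
   bounds have the same sign (otherwise the interval contains zero
   and the denormals around it).  */
bool
range_known_isnormal (double lo, double hi, bool maybe_nan)
{
  if (maybe_nan || isinf (lo) || isinf (hi))
    return false;
  bool lo_small = fabs (lo) < DBL_MIN;	/* zero or denormal */
  bool hi_small = fabs (hi) < DBL_MIN;
  return !lo_small && !hi_small && signbit (lo) == signbit (hi);
}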
Re: [PATCH-1v2, rs6000] Implement optab_isinf for SFDF and IEEE128
Hi Peter,

Thanks for your comments.

On 2024/5/23 5:58, Peter Bergner wrote:
> Is there a reason not to use the vsx_register_operand predicate for op1
> which matches the predicate for the operand of the xststdc<sd>p pattern
> we're passing op1 to?

No, I will fix them.

Thanks
Gui Haochen