[PATCH] RISC-V: Optimize branches with shifted immediate operands
The patch adds a new instruction pattern to handle conditional branches
with equality checks between shifted arithmetic operands. The pattern
optimizes comparisons whose constants have trailing zeros, so that both
constants fit into a 12-bit immediate after shifting.

For the C code:

void f5(long long a)
{
  if ((a & 0x2120000) == 0x2000000)
    g();
}

before the patch, the assembly code was:

f5:
        li      a5,34734080
        and     a0,a0,a5
        li      a5,33554432
        beq     a0,a5,.L21
        ret

and after the patch the assembly is:

f5:
        srli    a5,a0,17
        andi    a5,a5,265
        li      a4,256
        beq     a5,a4,.L21
        ret

Tested on both RV32 and RV64 with no regressions.

2024-09-02  Jovan Vukic

gcc/ChangeLog:

	PR target/113248
	* config/riscv/riscv.md (*branch_shiftedarith_equals_shifted): New
	pattern.

gcc/testsuite/ChangeLog:

	PR target/113248
	* gcc.target/riscv/branch-1.c: Additional tests.

---
 gcc/config/riscv/riscv.md                 | 32 +++
 gcc/testsuite/gcc.target/riscv/branch-1.c | 16 +---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 3289ed2155a..c98a66dbc7c 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3126,6 +3126,38 @@
 }
 [(set_attr "type" "branch")])
 
+(define_insn_and_split "*branch_shiftedarith_equals_shifted"
+  [(set (pc)
+	(if_then_else (match_operator 1 "equality_operator"
+		       [(and:ANYI (match_operand:ANYI 2 "register_operand" "r")
+				  (match_operand 3 "shifted_const_arith_operand" "i"))
+			(match_operand 4 "shifted_const_arith_operand" "i")])
+	 (label_ref (match_operand 0 "" ""))
+	 (pc)))
+   (clobber (match_scratch:X 5 "=&r"))
+   (clobber (match_scratch:X 6 "=&r"))]
+  "!SMALL_OPERAND (INTVAL (operands[3]))
+   && !SMALL_OPERAND (INTVAL (operands[4]))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5) (lshiftrt:X (subreg:X (match_dup 2) 0) (match_dup 8)))
+   (set (match_dup 5) (and:X (match_dup 5) (match_dup 9)))
+   (set (match_dup 6) (match_dup 10))
+   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 5) (match_dup 6)])
+			   (label_ref (match_dup 0)) (pc)))]
+{
+  HOST_WIDE_INT mask1 = INTVAL (operands[3]);
+  HOST_WIDE_INT mask2 = INTVAL (operands[4]);
+  int trailing = (ctz_hwi (mask1) > ctz_hwi (mask2))
+		  ? ctz_hwi (mask2)
+		  : ctz_hwi (mask1);
+
+  operands[8] = GEN_INT (trailing);
+  operands[9] = GEN_INT (mask1 >> trailing);
+  operands[10] = GEN_INT (mask2 >> trailing);
+}
+[(set_attr "type" "branch")])
+
 (define_insn_and_split "*branch_shiftedmask_equals_zero"
   [(set (pc)
 	(if_then_else (match_operator 1 "equality_operator"

diff --git a/gcc/testsuite/gcc.target/riscv/branch-1.c b/gcc/testsuite/gcc.target/riscv/branch-1.c
index b4a3a946379..e09328fe705 100644
--- a/gcc/testsuite/gcc.target/riscv/branch-1.c
+++ b/gcc/testsuite/gcc.target/riscv/branch-1.c
@@ -28,10 +28,20 @@ void f4(long long a)
     g();
 }
 
+void f5(long long a) {
+  if ((a & 0x2120000) == 0x2000000)
+    g();
+}
+
+void f6(long long a) {
+  if ((a & 0x7000) == 0x3000)
+    g();
+}
+
 /* { dg-final { scan-assembler-times "slli\t" 2 } } */
-/* { dg-final { scan-assembler-times "srli\t" 3 } } */
-/* { dg-final { scan-assembler-times "andi\t" 1 } } */
-/* { dg-final { scan-assembler-times "\tli\t" 1 } } */
+/* { dg-final { scan-assembler-times "srli\t" 5 } } */
+/* { dg-final { scan-assembler-times "andi\t" 3 } } */
+/* { dg-final { scan-assembler-times "\tli\t" 3 } } */
 /* { dg-final { scan-assembler-not "addi\t" } } */
 /* { dg-final { scan-assembler-not "and\t" } } */
--
2.43.0
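For intuition, the split is sound because shifting both constants right by
the smaller trailing-zero count preserves the equality. A standalone
self-check in plain C (not part of the patch; constants taken from the f5
example above):

#include <stdio.h>

int main (void)
{
  /* (a & 0x2120000) == 0x2000000  <=>  ((a >> 17) & 0x109) == 0x100,
     because 17 is the smaller of the two trailing-zero counts and both
     constants shifted by 17 fit in a simm12.  */
  for (unsigned long long a = 0; a < (1ULL << 27); a += 12345)
    {
      int before = (a & 0x2120000ULL) == 0x2000000ULL;
      int after = ((a >> 17) & 0x109) == 0x100;
      if (before != after)
        {
          printf ("mismatch at a=0x%llx\n", a);
          return 1;
        }
    }
  puts ("equivalence holds");
  return 0;
}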
Re: [PATCH] RISC-V: Optimize branches with shifted immediate operands
> It's worth noting there is a newer way which is usually slightly simpler
> than a match_operator.  Specifically code iterators.

Thank you for the very detailed feedback. It is not a problem to add code
iterators. I would add iterators for "eq" and "ne" in riscv/iterators.md
since they don't currently exist:

> (define_code_iterator any_eq [eq ne])

I would also add new <optab> values for "eq" and "ne". I assume it would be
best to submit the patch again as version 2 with these changes.

> The pattern uses shifted_const_arith_operand, which is good as it
> validates that the constant, if normalized by shifting away its trailing
> zeros, fits in a simm12.
>
> But the normalization you're doing on the two constants is limited by
> the smaller of the trailing zero counts.  So operands2 might be 0x8100
> which requires an 8 bit shift for normalization.  operands3 might be
> 0x81000 which requires a 12 bit shift for normalization.  In that case
> we'll use 8 as our shift count for normalization, resulting in:
>
> 0x8100 >> 8 = 0x81, a valid small operand
> 0x81000 >> 8 = 0x810, not a valid small operand.
>
> I think that'll generate invalid RTL at split time.
>
> What I think you need to do is in the main predicate (the same place
> you're currently checking !SMALL_OPERAND (INTVAL (operands[3]))), you'll
> need to check that both operands are SMALL_OPERAND after normalization.

Regarding the second issue, thanks again for the clear explanation. While
at first this might seem like a problem, I believe these cases won't
actually occur. The comparisons you mentioned, (x & 0x81000) == 0x8100 and
(x & 0x8100) == 0x81000, always evaluate as false, and this pattern will
never be used for them (https://godbolt.org/z/Y11EGMb4f).

Even in general, I'm quite sure we will never encounter an operand that,
after shifting, is greater than 2^11 (i.e. not a SMALL_OPERAND). I will
provide my reasoning below, but if you find it incorrect, or have any
counterexamples, I would be happy to make the requested changes, add the
mentioned check and submit that as PATCH v2.

Let's consider the general expression (x & c1) == c2, where t1 and t2
represent the counts of trailing zeros in c1 and c2. There are three cases
to consider:

1. When t1 == t2: The pattern works fine, with no edge cases.

2. When t1 > t2: The expression always evaluates as false, and the pattern
won't even be considered. For example, (x & 0x81000) == 0x8100.

3. When t1 < t2: In this case:
- c1 must be of the form 0x0KLM00 (where the highest bit of K cannot be
set) to meet the shifted_const_arith_operand constraint, ensuring
SMALL_OPERAND (0x0KLM) == true (i.e. 0x0KLM < 2^11).
- To prevent the expression from immediately evaluating as false, c2 must
be of the form 0x0PQ<0bxxx0>00, i.e. it may have 1 bits only in the
positions where c1 also has a 1 bit (and 0 elsewhere). Otherwise,
(x & c1) == c2 would instantly be false, so this pattern wouldn't be used.
Let's call this "the critical condition".
- After shifting c1 and c2, we have c1 == 0xKLM and c2 == 0xPQ<0bxxx0>,
assuming the LSB of M is set to 1.
- Due to "the critical condition", c2 == 0xPQ<0bxxx0> cannot have the
highest bit of P set to 1. Otherwise, (x & c1) == c2 would immediately
evaluate as false, since 0xKLM is guaranteed not to have the highest bit
of K set to 1. This guarantees that SMALL_OPERAND (0xPQ<0bxxx0>) will
always be true (i.e. c2 < 2^11).
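A standalone sanity check of case 2 (plain C, not part of the patch): if
c2 has any bit set outside c1, then (x & c1) == c2 can never hold, which
is why such comparisons fold to false long before this pattern is tried.

#include <stdio.h>

int main (void)
{
  const unsigned long long c1 = 0x81000, c2 = 0x8100;

  /* x & c1 can only produce bits inside c1; c2 has a bit outside c1
     (0x8100 & ~0x81000 != 0), so equality is impossible.  */
  if ((c2 & ~c1) != 0)
    puts ("(x & 0x81000) == 0x8100 is false for every x");

  /* Brute force over all combinations of the relevant low bits.  */
  for (unsigned long long x = 0; x < (1ULL << 20); x++)
    if ((x & c1) == c2)
      {
        puts ("counterexample found");
        return 1;
      }
  return 0;
}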
[PATCH] RISC-V: Improve code generation for select of consecutive constants
The patch optimizes the code generated for comparisons of the form
x > y ? 2 : 3 (x <= y ? 3 : 2) and x < y ? 2 : 3 (x >= y ? 3 : 2).

For the following C code:

long f1(long x, long y) { return (x > y) ? 2 : 3; }
long f2(long x, long y) { return (x < y) ? 2 : 3; }

Before the patch, the generated assembly is:

f1(long, long):
        sgt     a0,a0,a1
        xori    a0,a0,1
        addi    a0,a0,2
        ret
f2(long, long):
        slt     a0,a0,a1
        xori    a0,a0,1
        addi    a0,a0,2
        ret

After the patch, the generated assembly is:

f1(long, long):
        slt     a0,a1,a0
        xori    a0,a0,3
        ret
f2(long, long):
        slt     a0,a0,a1
        xori    a0,a0,3
        ret

If we only consider the first example: during the combine pass it can match
the "le:" pattern (x <= y), but that can only be replaced by the inverse
operation x > y (y < x), which requires replacing the following addi
instruction with an xori instruction as well, as proposed by the patch.

Tested on both RV32 and RV64 with no regressions.

2024-09-17  Jovan Vukic

	PR target/108038

gcc/ChangeLog:

	* config/riscv/iterators.md (paired_lt): New code attribute.
	* config/riscv/riscv.md (*sle__xor): New pattern.
	(*sge__xor): New pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/slt-1.c: New test.

---
 gcc/config/riscv/iterators.md          |  4 +++
 gcc/config/riscv/riscv.md              | 32 +++
 gcc/testsuite/gcc.target/riscv/slt-1.c | 44 ++
 3 files changed, 80 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/slt-1.c

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 2844cb02ff0..964d0f24f4c 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -234,6 +234,10 @@
 (define_code_iterator any_lt [lt ltu])
 (define_code_iterator any_le [le leu])
 
+(define_code_attr paired_lt [(gt "lt") (gtu "ltu")
+			     (ge "lt") (geu "ltu")
+			     (le "lt") (leu "ltu")])
+
 ; atomics code iterator
 (define_code_iterator any_atomic [plus ior xor and])
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 9f94b5aa023..b22a8ac05e0 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3600,6 +3600,38 @@
 [(set_attr "type" "slt")
  (set_attr "mode" "")])
 
+(define_insn_and_split "*sle__xor"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+	(plus:GPR (any_le:GPR (match_operand:X 1 "register_operand" " r")
+			      (match_operand:X 2 "register_operand" " r"))
+		  (match_operand 3 "const_arith_operand" "i")))]
+  "INTVAL (operands[3]) % 2 == 0"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (<paired_lt>:GPR (match_dup 2) (match_dup 1)))
+   (set (match_dup 0) (xor:GPR (match_dup 0) (match_dup 3)))]
+{
+  operands[3] = GEN_INT (INTVAL (operands[3]) + 1);
+}
+  [(set_attr "type" "slt")
+   (set_attr "mode" "")])
+
+(define_insn_and_split "*sge__xor"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+	(plus:GPR (any_ge:GPR (match_operand:X 1 "register_operand" " r")
+			      (match_operand:X 2 "register_operand" " r"))
+		  (match_operand 3 "const_arith_operand" "i")))]
+  "INTVAL (operands[3]) % 2 == 0"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (<paired_lt>:GPR (match_dup 1) (match_dup 2)))
+   (set (match_dup 0) (xor:GPR (match_dup 0) (match_dup 3)))]
+{
+  operands[3] = GEN_INT (INTVAL (operands[3]) + 1);
+}
+  [(set_attr "type" "slt")
+   (set_attr "mode" "")])
+
 ;;
 ;;
 ;;
diff --git a/gcc/testsuite/gcc.target/riscv/slt-1.c b/gcc/testsuite/gcc.target/riscv/slt-1.c
new file mode 100644
index 000..c672da92acf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/slt-1.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+#include <stdint.h>
+
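As a standalone illustration (plain C, not part of the patch) of why the
split is sound: for a comparison result b that is always 0 or 1, and any
even constant C, (b ^ 1) + C == b ^ (C + 1), which is exactly the
addi-to-xori rewrite the pattern performs.

#include <stdio.h>

int main (void)
{
  /* b models an slt/sgt result (0 or 1); C models operands[3].  */
  for (int b = 0; b <= 1; b++)
    for (long C = -2048; C < 2048; C += 2)   /* even simm12 constants */
      if (((b ^ 1) + C) != (b ^ (C + 1)))
        {
          printf ("mismatch: b=%d C=%ld\n", b, C);
          return 1;
        }
  puts ("identity holds for all even C");
  return 0;
}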
[PATCH v3] RISC-V: Optimize branches with shifted immediate operands
After the valuable feedback I received, it's clear to me that the oversight
was in the tests showing the benefits of the patch. In the test file, I
added functions f5 and f6, which now generate more efficient code with
fewer instructions.

Before the patch:

f5:
        li      a4,2097152
        addi    a4,a4,-2048
        li      a5,1167360
        and     a0,a0,a4
        addi    a5,a5,-2048
        beq     a0,a5,.L4
f6:
        li      a5,3407872
        addi    a5,a5,-2048
        and     a0,a0,a5
        li      a5,1114112
        beq     a0,a5,.L7

After the patch:

f5:
        srli    a5,a0,11
        andi    a5,a5,1023
        li      a4,569
        beq     a5,a4,.L5
f6:
        srli    a5,a0,11
        andi    a5,a5,1663
        li      a4,544
        beq     a5,a4,.L9

2024-10-09  Jovan Vukic

	PR target/115921

gcc/ChangeLog:

	* config/riscv/iterators.md (any_eq): New code iterator.
	* config/riscv/riscv.h (COMMON_TRAILING_ZEROS): New macro.
	(SMALL_AFTER_COMMON_TRAILING_SHIFT): Ditto.
	* config/riscv/riscv.md (*branch_shiftedarith_<optab>_shifted): New
	pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/branch-1.c: Additional tests.

---
 gcc/config/riscv/iterators.md             |  4 +++
 gcc/config/riscv/riscv.h                  | 12 +
 gcc/config/riscv/riscv.md                 | 32 +++
 gcc/testsuite/gcc.target/riscv/branch-1.c | 18 ++---
 4 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 872c542e906..081659499a9 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -233,6 +233,8 @@
 (define_code_iterator any_ge [ge geu])
 (define_code_iterator any_lt [lt ltu])
 (define_code_iterator any_le [le leu])
+(define_code_iterator any_eq [eq ne])
+
 ;; Iterators for conditions we can emit a sCC against 0 or a reg directly
 (define_code_iterator scc_0 [eq ne gt gtu])
@@ -285,6 +287,8 @@
 (le "le")
 (gt "gt")
 (lt "lt")
+(eq "eq")
+(ne "ne")
 (ior "ior")
 (xor "xor")
 (and "and")

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 53b7b2a40ed..ca1b8329cdc 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -667,6 +667,18 @@ enum reg_class
 /* True if bit BIT is set in VALUE.  */
 #define BITSET_P(VALUE, BIT) (((VALUE) & (1ULL << (BIT))) != 0)
 
+/* Returns the smaller (common) number of trailing zeros for VAL1 and VAL2.  */
+#define COMMON_TRAILING_ZEROS(VAL1, VAL2)	\
+  (ctz_hwi (VAL1) < ctz_hwi (VAL2)		\
+   ? ctz_hwi (VAL1)				\
+   : ctz_hwi (VAL2))
+
+/* Returns true if both VAL1 and VAL2 are SMALL_OPERANDs after shifting by
+   the common number of trailing zeros.  */
+#define SMALL_AFTER_COMMON_TRAILING_SHIFT(VAL1, VAL2)		\
+  (SMALL_OPERAND ((VAL1) >> COMMON_TRAILING_ZEROS (VAL1, VAL2))	\
+   && SMALL_OPERAND ((VAL2) >> COMMON_TRAILING_ZEROS (VAL1, VAL2)))
+
 /* Stack layout; function entry, exit and calling.  */
 #define STACK_GROWS_DOWNWARD 1
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 688c07df46c..78112afbb26 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3129,6 +3129,38 @@
 }
 [(set_attr "type" "branch")])
 
+(define_insn_and_split "*branch_shiftedarith_<optab>_shifted"
+  [(set (pc)
+	(if_then_else (any_eq
+			(and:ANYI (match_operand:ANYI 1 "register_operand" "r")
+				  (match_operand 2 "shifted_const_arith_operand" "i"))
+			(match_operand 3 "shifted_const_arith_operand" "i"))
+	 (label_ref (match_operand 0 "" ""))
+	 (pc)))
+   (clobber (match_scratch:X 4 "=&r"))
+   (clobber (match_scratch:X 5 "=&r"))]
+  "!SMALL_OPERAND (INTVAL (operands[2]))
+   && !SMALL_OPERAND (INTVAL (operands[3]))
+   && SMALL_AFTER_COMMON_TRAILING_SHIFT (INTVAL (operands[2]),
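For a standalone check of the macro semantics (plain C, not part of the
patch; ctz_hwi is modeled by __builtin_ctzll and SMALL_OPERAND by a signed
12-bit range check, and the constants are back-computed from the f5
assembly quoted above):

#include <stdio.h>

/* Stand-ins for the riscv.h macros.  */
#define CTZ(v) __builtin_ctzll (v)
#define COMMON_TRAILING_ZEROS(v1, v2) \
  (CTZ (v1) < CTZ (v2) ? CTZ (v1) : CTZ (v2))
#define SMALL_OPERAND(v) ((v) >= -2048 && (v) < 2048)

int main (void)
{
  /* f5's constants, reconstructed from "li a4,2097152; addi a4,a4,-2048"
     and "li a5,1167360; addi a5,a5,-2048".  */
  long long c1 = 2097152 - 2048;   /* 0x1ff800 */
  long long c2 = 1167360 - 2048;   /* 0x11c800 */
  int shift = COMMON_TRAILING_ZEROS (c1, c2);

  printf ("shift=%d mask=%lld cmp=%lld small=%d\n",
          shift, c1 >> shift, c2 >> shift,
          SMALL_OPERAND (c1 >> shift) && SMALL_OPERAND (c2 >> shift));
  /* Prints: shift=11 mask=1023 cmp=569 small=1 -- matching the
     "srli a5,a0,11; andi a5,a5,1023; li a4,569" sequence.  */
  return 0;
}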
[PATCH v2] RISC-V: Optimize branches with shifted immediate operands
Based on the feedback I received, the patch now uses a custom code iterator
instead of relying on match_operator. Since both operands (2 and 3) have
trailing zeros, an additional check was introduced to verify that they
remain SMALL_OPERANDs after shifting by the smaller number of trailing
zeros. To avoid a complex inline check, I introduced two new macros,
ensuring that the check is encapsulated and reusable, as suggested in the
original feedback.

It was pointed out that this check should be added because expressions like
(x & 0x81000) == 0x8100 or (x & 0x8100) == 0x81000 might lead to
unrecognized instructions during splitting. However, both expressions
immediately evaluate to false (https://godbolt.org/z/Y11EGMb4f) due to the
bit arrangement in the constants 0x8100 and 0x81000. As a result, this
pattern won't be applied in such cases, so it cannot cause any issues.
Therefore, it wasn't necessary to include this as a negative test in the
testsuite. Despite my efforts, I couldn't find another example for a
negative test; all similar cases evaluated to false immediately due to the
bit arrangement. I mentioned this in a follow-up comment on the previous
patch version. Regardless, the necessary check has been added, so if a
negative case exists, an unrecognized instruction will not be created.

2024-09-30  Jovan Vukic

	PR target/115921

gcc/ChangeLog:

	* config/riscv/iterators.md (any_eq): New code iterator.
	* config/riscv/riscv.h (COMMON_TRAILING_ZEROS): New macro.
	(SMALL_AFTER_COMMON_TRAILING_SHIFT): Ditto.
	* config/riscv/riscv.md (*branch_shiftedarith_<optab>_shifted): New
	pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/branch-1.c: Additional tests.

---
 gcc/config/riscv/iterators.md             |  3 +++
 gcc/config/riscv/riscv.h                  | 12 +
 gcc/config/riscv/riscv.md                 | 32 +++
 gcc/testsuite/gcc.target/riscv/branch-1.c | 16 +---
 4 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 2844cb02ff0..288f11464a2 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -233,6 +233,7 @@
 (define_code_iterator any_ge [ge geu])
 (define_code_iterator any_lt [lt ltu])
 (define_code_iterator any_le [le leu])
+(define_code_iterator any_eq [eq ne])
 
 ; atomics code iterator
 (define_code_iterator any_atomic [plus ior xor and])
@@ -283,6 +284,8 @@
 (le "le")
 (gt "gt")
 (lt "lt")
+(eq "eq")
+(ne "ne")
 (ior "ior")
 (xor "xor")
 (and "and")

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 3aecb43f831..c782499b2bf 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -667,6 +667,18 @@ enum reg_class
 /* True if bit BIT is set in VALUE.  */
 #define BITSET_P(VALUE, BIT) (((VALUE) & (1ULL << (BIT))) != 0)
 
+/* Returns the smaller (common) number of trailing zeros for VAL1 and VAL2.  */
+#define COMMON_TRAILING_ZEROS(VAL1, VAL2)	\
+  (ctz_hwi (VAL1) < ctz_hwi (VAL2)		\
+   ? ctz_hwi (VAL1)				\
+   : ctz_hwi (VAL2))
+
+/* Returns true if both VAL1 and VAL2 are SMALL_OPERANDs after shifting by
+   the common number of trailing zeros.  */
+#define SMALL_AFTER_COMMON_TRAILING_SHIFT(VAL1, VAL2)		\
+  (SMALL_OPERAND ((VAL1) >> COMMON_TRAILING_ZEROS (VAL1, VAL2))	\
+   && SMALL_OPERAND ((VAL2) >> COMMON_TRAILING_ZEROS (VAL1, VAL2)))
+
 /* Stack layout; function entry, exit and calling.  */
 
 #define STACK_GROWS_DOWNWARD 1
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 0410d990ec5..afc534d050f 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3129,6 +3129,38 @@
 }
 [(set_attr "type" "branch")])
 
+(define_insn_and_split "*branch_shiftedarith_<optab>_shifted"
+  [(set (pc)
+	(if_then_else (any_eq
+			(and:ANYI (match_operand:ANYI 1 "register_operand" "r")
+				  (match_operand 2 "shifted_const_arith_operand"
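For a concrete feel of what the new check guards against (standalone C,
not part of the patch): with c1 = 0x8100 and c2 = 0x81000 the common shift
is 8, and the second constant stays too large for a simm12.

#include <stdio.h>

int main (void)
{
  unsigned long long c1 = 0x8100, c2 = 0x81000;
  int shift = __builtin_ctzll (c1) < __builtin_ctzll (c2)
              ? __builtin_ctzll (c1) : __builtin_ctzll (c2);   /* 8 */

  /* 0x8100 >> 8 = 0x81 (fits simm12), but 0x81000 >> 8 = 0x810 = 2064
     (does not fit), so SMALL_AFTER_COMMON_TRAILING_SHIFT rejects the
     pair -- even though, as argued above, combine never offers it.  */
  printf ("0x%llx -> 0x%llx, 0x%llx -> 0x%llx\n",
          c1, c1 >> shift, c2, c2 >> shift);
  return 0;
}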
[PATCH] phi-opt: Add missed optimization for "(cond | (a != b)) ? b : a"
Currently, within the phiopt pass, under value_replacement, we have the
option to replace the expression "(cond & (a == b)) ? a : b" with "b". It
checks whether there is a BIT_AND_EXPR and verifies whether one of the
operands contains the expression "a == b". However, the same result should
also apply to the expression with the inverted condition,
"(cond | (a != b)) ? b : a", so this patch makes slight modifications to
the existing code to cover this situation as well. The patch adds a check
for whether there is a BIT_IOR_EXPR and whether it contains "a != b".

I added a test for both optimizations, as I could not find an adequate
test in the testsuite for the first one (even though that optimization
exists).

For example, for this C code:

int foo(int a, int b, int c) {
    return ((a > c) | (a != b)) ? b : a;
}

Before the patch, the generated assembly for RISC-V is:

foo:
        bgt     a0,a2,.L4
        bne     a0,a1,.L4
        ret
.L4:
        mv      a0,a1
        ret

After the patch, the generated assembly becomes:

foo:
        mv      a0,a1
        ret

2024-10-22  Jovan Vukic

gcc/ChangeLog:

	* tree-ssa-phiopt.cc (rhs_is_fed_for_value_replacement): Add a new
	optimization opportunity for BIT_IOR_EXPR and a != b.
	(operand_equal_for_value_replacement): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/phi-opt-same-3.c: New test.

---
 .../gcc.dg/tree-ssa/phi-opt-same-3.c | 17 +++
 gcc/tree-ssa-phiopt.cc               | 48 ---
 2 files changed, 47 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-3.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-3.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-3.c
new file mode 100644
index 000..f9838f177a1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt2 -fdump-tree-optimized" } */
+
+int f1(int a, int b, int c)
+{
+  if ((a > c) & (a == b)) return a;
+  return b;
+}
+
+int f2(int a, int b, int c)
+{
+  if ((a > c) | (a != b)) return b;
+  return a;
+}
+
+/* { dg-final { scan-tree-dump-not "if " "phiopt2" } } */
+/* { dg-final { scan-tree-dump-not "if " "optimized" } } */

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 38cc8506d1f..956fe919cc0 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1078,17 +1078,18 @@ jump_function_from_stmt (tree *arg, gimple *stmt)
   return false;
 }
 
-/* RHS is a source argument in a BIT_AND_EXPR which feeds a conditional
-   of the form SSA_NAME NE 0.
+/* RHS is a source argument in a BIT_AND_EXPR or BIT_IOR_EXPR which feeds
+   a conditional of the form SSA_NAME NE 0.
 
-   If RHS is fed by a simple EQ_EXPR comparison of two values, see if
-   the two input values of the EQ_EXPR match arg0 and arg1.
+   If RHS is fed by a simple EQ_EXPR or NE_EXPR comparison of two values,
+   see if the two input values of the comparison match arg0 and arg1.
 
    If so update *code and return TRUE.  Otherwise return FALSE.  */
 
 static bool
 rhs_is_fed_for_value_replacement (const_tree arg0, const_tree arg1,
-				  enum tree_code *code, const_tree rhs)
+				  enum tree_code *code, const_tree rhs,
+				  enum tree_code bit_expression_code)
 {
   /* Obviously if RHS is not an SSA_NAME, we can't look at the defining
      statement.  */
@@ -1096,11 +1097,15 @@ rhs_is_fed_for_value_replacement (const_tree arg0, const_tree arg1,
     {
       gimple *def1 = SSA_NAME_DEF_STMT (rhs);
 
-      /* Verify the defining statement has an EQ_EXPR on the RHS.  */
-      if (is_gimple_assign (def1) && gimple_assign_rhs_code (def1) == EQ_EXPR)
+      /* Verify the defining statement has an EQ_EXPR or NE_EXPR on the
+	 RHS.  */
+      if (is_gimple_assign (def1) &&
+	  ((bit_expression_code == BIT_AND_EXPR &&
+	    gimple_assign_rhs_code (def1) == EQ_EXPR) ||
+	   (bit_expression_code == BIT_IOR_EXPR &&
+	    gimple_assign_rhs_code (def1) == NE_EXPR)))
 	{
-	  /* Finally verify the source operands of the EQ_EXPR are equal
-	     to arg0 and arg1.  */
+	  /* Finally verify the source operands of the EQ_EXPR or NE_EXPR
+	     are equal to arg0 and arg1.  */
 	  tree op0 = gimple_assign_rhs1 (def1);
 	  tree op1 = gimple_assign_rhs2 (def1);
 	  if ((operand_equal_for_phi_arg_p
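As a standalone sanity check of the underlying identities (plain C, not
part of the patch): for any cond, a, b, both select forms collapse to a
single operand.

#include <stdio.h>

int main (void)
{
  for (int cond = 0; cond <= 1; cond++)
    for (int a = -2; a <= 2; a++)
      for (int b = -2; b <= 2; b++)
        {
          /* (cond & (a == b)) ? a : b always equals b: when the
             condition holds, a == b anyway.  */
          if (((cond & (a == b)) ? a : b) != b)
            return 1;
          /* (cond | (a != b)) ? b : a also always equals b: when the
             condition fails, cond is 0 and a == b.  */
          if (((cond | (a != b)) ? b : a) != b)
            return 1;
        }
  puts ("both selects always reduce to b");
  return 0;
}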
[PATCH v2] phi-opt: Add missed optimization for "(cond | (a != b)) ? b : a"
Thanks for the feedback on the first version of the patch. Accordingly, I
have corrected the code formatting as requested, and I added the new tests
to the existing file phi-opt-11.c instead of creating a new one. I
performed testing before and after applying the patch on the x86
architecture, and I confirm that there are no new regressions. The logic
and general code of the patch itself have not changed.

> So for the A EQ/NE B expression, we can reverse A and B in the
> expression and still get the same result.  But don't we have to be more
> careful for the TRUE/FALSE arms of the ternary?  For BIT_AND we need
> ? a : b, for BIT_IOR we need ? b : a.
>
> I don't see that gets verified in the existing code or after your
> change.  I suspect I'm just missing something here.  Can you clarify how
> we verify that BIT_AND gets ? a : b for the true/false arms and that
> BIT_IOR gets ? b : a for the true/false arms?

I did not communicate this clearly last time, but the existing
optimization simplifies the expression "(cond & (a == b)) ? a : b" to the
simpler "b". Similarly, the expression "(cond & (a == b)) ? b : a"
simplifies to "a". Thus, the existing optimization and mine perform the
following simplifications:

(cond & (a == b)) ? a : b -> b
(cond & (a == b)) ? b : a -> a
(cond | (a != b)) ? a : b -> a
(cond | (a != b)) ? b : a -> b

For this reason, for BIT_AND_EXPR when we have A EQ B, it is sufficient to
confirm that one operand matches the true/false arm and the other matches
the false/true arm. In both cases, we simplify the expression to the third
operand of the ternary operation (i.e., OP0 ? OP1 : OP2 simplifies to
OP2). This is achieved in the value_replacement function after
successfully setting the value of *code within the
rhs_is_fed_for_value_replacement function to EQ_EXPR. For BIT_IOR_EXPR,
the same check is performed for A NE B, except that *code remains NE_EXPR,
and then value_replacement returns the second operand (i.e., OP0 ? OP1 :
OP2 simplifies to OP1).

2024-10-30  Jovan Vukic

gcc/ChangeLog:

	* tree-ssa-phiopt.cc (rhs_is_fed_for_value_replacement): Add a new
	optimization opportunity for BIT_IOR_EXPR and a != b.
	(operand_equal_for_value_replacement): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/phi-opt-11.c: Add more tests.
---
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c | 31 +-
 gcc/tree-ssa-phiopt.cc                     | 48 ++
 2 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c
index 14c82cd5216..d1e284c5325 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-optimized --param logical-op-non-short-circuit=1" } */
+/* { dg-options "-O1 -fdump-tree-phiopt2 -fdump-tree-optimized --param logical-op-non-short-circuit=1" } */
 
 int f(int a, int b, int c)
 {
@@ -22,4 +22,33 @@ int h(int a, int b, int c, int d)
   return a;
 }
 
+int i(int a, int b, int c)
+{
+  if ((a > c) & (a == b))
+    return a;
+  return b;
+}
+
+int j(int a, int b, int c)
+{
+  if ((a > c) & (a == b))
+    return b;
+  return a;
+}
+
+int k(int a, int b, int c)
+{
+  if ((a > c) | (a != b))
+    return b;
+  return a;
+}
+
+int l(int a, int b, int c)
+{
+  if ((a > c) | (a != b))
+    return a;
+  return b;
+}
+
+/* { dg-final { scan-tree-dump-times "if" 0 "phiopt2" } } */
 /* { dg-final { scan-tree-dump-times "if" 0 "optimized" } } */

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index cffafe101a4..61b33bfc361 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1078,17 +1078,18 @@ jump_function_from_stmt (tree *arg, gimple *stmt)
   return false;
 }
 
-/* RHS is a source argument in a BIT_AND_EXPR which feeds a conditional
-   of the form SSA_NAME NE 0.
+/* RHS is a source argument in a BIT_AND_EXPR or BIT_IOR_EXPR which feeds
+   a conditional of the form SSA_NAME NE 0.
 
-   If RHS is fed by a simple EQ_EXPR comparison of two values, see if
-   the two input values of the EQ_EXPR match arg0 and arg1.
+   If RHS is fed by a simple EQ_EXPR or NE_EXPR comparison of two values,
+   see if the two input values of
Re: [PATCH v2] phi-opt: Add missed optimization for "(cond | (a != b)) ? b : a"
On 11/02/24, Jeff Law wrote:
> This is well understood.  The key in my mind is that for AND we always
> select the FALSE arm.  For IOR we always select the TRUE arm.

Yes, I agree.

>>   e = (code == NE_EXPR ? true_edge : false_edge);
> If I understand everything correctly, your assertion is that we'll only
> get here for AND/EQ_EXPR and IOR/NE_EXPR.  There's no way to get here
> for AND/NE_EXPR or IOR/EQ_EXPR?

If we examine the patch step by step, we can see that the function
rhs_is_fed_for_value_replacement enters the if block exclusively for the
combinations BIT_AND_EXPR/EQ_EXPR and BIT_IOR_EXPR/NE_EXPR. Only at that
point does it return true and set the value of *code. This is evident in
the code:

> if (TREE_CODE (rhs) == SSA_NAME)
>   {
>     ...
>     if (is_gimple_assign (def1)
>         && ((bit_expression_code == BIT_AND_EXPR
>              && gimple_assign_rhs_code (def1) == EQ_EXPR)
>             || (bit_expression_code == BIT_IOR_EXPR
>                 && gimple_assign_rhs_code (def1) == NE_EXPR)))
>       {
>         ...
>         *code = gimple_assign_rhs_code (def1);
>         return true;
>         ...
>       }
>   }
> return false;

The function operand_equal_for_value_replacement only passes the value it
receives from rhs_is_fed_for_value_replacement to the call site, where the
returned value initializes the variable equal_p. Thus, the return value is
simply forwarded from rhs_is_fed_for_value_replacement to equal_p, and it
is true only for BIT_AND_EXPR/EQ_EXPR or BIT_IOR_EXPR/NE_EXPR, because
that is how it is defined in rhs_is_fed_for_value_replacement.

The patch only sets equal_p to true for the combinations AND/EQ and
IOR/NE. These are the only two situations in which equal_p becomes true as
a result of the patch, and therefore only in those situations does the
line of code you quoted execute:

> if (equal_p || maybe_equal_p)
>   {
>     ...
>     e = (code == NE_EXPR ? true_edge : false_edge);
>     ...
>     if (...)
>       {
>         ...
>         if (equal_p)
>           {
>             ...
>             return 2;
>           }
>       }
>     else if (equal_p)
>       {
>         ...
>         return 1;
>       }
>   }

Also, Mr. Pinski left a comment
(https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667258.html) and
offered some suggestions about the patch. He also mentioned that he is
working on integrating the affected code into match-and-simplify, so a
rewrite is on the way. We can either move forward with this patch or stop
it. Either way, I'm fine with any decision made.
match.pd: Add pattern to simplify `(a - 1) & -a` to `0`
The patch simplifies the expressions (a - 1) & -a, (a - 1) | -a, and
(a - 1) ^ -a to the constants 0, -1, and -1, respectively. Currently, GCC
does not perform these simplifications.

Bootstrapped and tested on x86-linux-gnu with no regressions.

gcc/ChangeLog:

	* match.pd: New pattern.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/bitops-11.c: New test.

---
 gcc/match.pd                              | 11 +++
 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c | 23 +++
 2 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c

diff --git a/gcc/match.pd b/gcc/match.pd
index c10bf9a7b80..6d95ceaa194 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1472,6 +1472,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (bit_and:c @0 (bit_not (bit_xor:c @0 @1)))
   (bit_and @0 @1))
 
+/* Transform:
+   (a - 1) & -a -> 0.
+   (a - 1) | -a -> -1.
+   (a - 1) ^ -a -> -1.  */
+(for bit_op (bit_ior bit_xor bit_and)
+ (simplify
+  (bit_op:c (plus @0 integer_minus_onep) (negate @0))
+  (if (bit_op == BIT_AND_EXPR)
+   { build_zero_cst (type); }
+   { build_minus_one_cst (type); })))
+
 /* a & (a == b) --> a & b (boolean version of the above).  */
 (simplify
  (bit_and:c @0 (nop_convert? (eq:c @0 @1)))

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c b/gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c
new file mode 100644
index 000..c2d646e2b8d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+
+int foo1(int a)
+{
+    return (a - 1) & -a;
+}
+
+int foo2(int a)
+{
+    return (a - 1) | -a;
+}
+
+int foo3(int a)
+{
+    return (a - 1) ^ -a;
+}
+
+/* { dg-final { scan-tree-dump-not "bit_and_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "bit_ior_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "bit_xor_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "negate_expr, " "optimized" } } */
--
2.43.0
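A standalone check of the identities (plain C, not part of the patch): in
two's complement, -a == ~(a - 1), so a - 1 and -a are exact bitwise
complements; the computation below uses unsigned arithmetic to keep the
overflow behavior well defined.

#include <stdio.h>
#include <stdint.h>

int main (void)
{
  for (uint32_t i = 0; i < 100000; i++)
    {
      uint32_t a = i * 2654435761u;    /* scatter the test values */
      uint32_t am1 = a - 1, neg = -a;  /* neg == ~am1 */
      if ((am1 & neg) != 0
          || (am1 | neg) != UINT32_MAX
          || (am1 ^ neg) != UINT32_MAX)
        {
          printf ("counterexample: a=%u\n", (unsigned) a);
          return 1;
        }
    }
  puts ("all three identities hold");
  return 0;
}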
match.pd: Add pattern to simplify `((X - 1) & ~X) < 0` to `X == 0`
The patch makes the following simplifications:
((X - 1) & ~X) < 0 -> X == 0
((X - 1) & ~X) >= 0 -> X != 0

On x86, the number of instructions is reduced from 4 to 3, but on
platforms like RISC-V, it reduces to a single instruction.

Bootstrapped and tested on x86-linux-gnu with no regressions.

gcc/ChangeLog:

	* match.pd: New pattern.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/fold_neg_and_zero_cmp.c: New test.

---
 gcc/match.pd                                 | 14 ++
 .../gcc.dg/tree-ssa/fold_neg_and_zero_cmp.c  | 28 +++
 2 files changed, 42 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold_neg_and_zero_cmp.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 6d95ceaa194..1a92acda8ab 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2245,6 +2245,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (bitwise_equal_p (@0, @3))
   (convert (bit_ior (bit_and @1 (convert @2)) (convert @0)))))
 
+/* Fold ((X - 1) & ~X) = 0 into X ==/!= 0.  */
+(for cmp (lt ge)
+     eqne (eq ne)
+ (simplify
+  (cmp:c
+   (bit_and:c
+    (plus @0 integer_minus_onep)
+    (bit_not @0))
+   integer_zerop)
+  (if (!TYPE_UNSIGNED (TREE_TYPE (@0))
+       && INTEGRAL_TYPE_P (type)
+       && INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+   (eqne @0 { build_zero_cst (TREE_TYPE (@0)); }))))
+
 /* (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2) */
 (simplify
  (bit_and (bit_ior @0 CONSTANT_CLASS_P@1) CONSTANT_CLASS_P@2)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold_neg_and_zero_cmp.c b/gcc/testsuite/gcc.dg/tree-ssa/fold_neg_and_zero_cmp.c
new file mode 100644
index 000..e9400acf5f3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold_neg_and_zero_cmp.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+
+int foo1_1(int a)
+{
+    return ((a - 1) & ~a) < 0;
+}
+
+int foo1_2(int a)
+{
+    return (((a - 1) & ~a) >> 31) != 0;
+}
+
+int foo2_1(int a)
+{
+    return ((a - 1) & ~a) >= 0;
+}
+
+int foo2_2(int a)
+{
+    return (((a - 1) & ~a) >> 31) == 0;
+}
+
+/* { dg-final { scan-tree-dump-not "bit_and_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "bit_not_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-times "eq_expr, " 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "ne_expr, " 2 "optimized" } } */
--
2.43.0
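A standalone check of the fold (plain C, not part of the patch), computed
in unsigned arithmetic so that x == INT32_MIN does not trip signed
overflow on the x - 1 step; the sign bit stands in for "< 0":

#include <stdio.h>
#include <stdint.h>

int main (void)
{
  for (uint32_t i = 0; i < 200000; i++)
    {
      uint32_t x = i * 2246822519u;   /* scatter the test values */
      uint32_t v = (x - 1) & ~x;      /* ((X - 1) & ~X) */
      int lt0 = (v >> 31) != 0;       /* sign bit set <=> value < 0 */
      if (lt0 != (x == 0))
        {
          printf ("counterexample: x=%u\n", (unsigned) x);
          return 1;
        }
    }
  puts ("((X - 1) & ~X) < 0 iff X == 0");
  return 0;
}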
[PATCH v2] RISC-V: Improve code generation for select of consecutive constants
Based on the valuable feedback I received, I decided to implement the
patch in the RTL pipeline. Since a similar optimization already exists in
simplify_binary_operation_1, I chose to generalize my original approach
and place it directly below that code.

The expression (X xor C1) + C2 is simplified to X xor (C1 xor C2) under
the conditions described in the patch. This is a more general
optimization, but it still applies to the RISC-V case, which was my
initial goal:

long f1(long x, long y) { return (x > y) ? 2 : 3; }

Before the patch, the generated assembly is:

f1(long, long):
        sgt     a0,a0,a1
        xori    a0,a0,1
        addi    a0,a0,2
        ret

After the patch, the generated assembly is:

f1(long, long):
        sgt     a0,a0,a1
        xori    a0,a0,3
        ret

The patch optimizes cases like x LT/GT y ? 2 : 3 (and x GE/LE y ? 3 : 2),
as initially intended. Since this optimization is more general, I noticed
it also optimizes cases like x < CONST ? 3 : 2 when CONST < 0. I've added
tests for these cases as well.

A bit of logic behind the patch: the equality
A + B == (A ^ B) + 2 * (A & B) always holds. This can be simplified to
A ^ B if 2 * (A & B) == 0. In our case, we have A == X ^ C1, B == C2, and
X is either 0 or 1.

2024-09-27  Jovan Vukic

	PR target/108038

gcc/ChangeLog:

	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1): New
	simplification.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/slt-1.c: New test.

---
 gcc/simplify-rtx.cc                    | 12 ++
 gcc/testsuite/gcc.target/riscv/slt-1.c | 59 ++
 2 files changed, 71 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/slt-1.c

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index a20a61c5ddd..e8e60404ef6 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -2994,6 +2994,18 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 				    simplify_gen_binary (XOR, mode, op1,
 							 XEXP (op0, 1)));
 
+      /* (plus (xor X C1) C2) is (xor X (C1^C2)) if X is either 0 or 1 and
+	 2 * ((X ^ C1) & C2) == 0; based on A + B == A ^ B + 2 * (A & B).  */
+      if (CONST_SCALAR_INT_P (op1)
+	  && GET_CODE (op0) == XOR
+	  && CONST_SCALAR_INT_P (XEXP (op0, 1))
+	  && nonzero_bits (XEXP (op0, 0), mode) == 1
+	  && 2 * (INTVAL (XEXP (op0, 1)) & INTVAL (op1)) == 0
+	  && 2 * ((1 ^ INTVAL (XEXP (op0, 1))) & INTVAL (op1)) == 0)
+	return simplify_gen_binary (XOR, mode, XEXP (op0, 0),
+				    simplify_gen_binary (XOR, mode, op1,
+							 XEXP (op0, 1)));
+
       /* Canonicalize (plus (mult (neg B) C) A) to (minus A (mult B C)).  */
       if (!HONOR_SIGN_DEPENDENT_ROUNDING (mode)
 	  && GET_CODE (op0) == MULT

diff --git a/gcc/testsuite/gcc.target/riscv/slt-1.c b/gcc/testsuite/gcc.target/riscv/slt-1.c
new file mode 100644
index 000..29a64066081
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/slt-1.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+#include <stdint.h>
+
+#define COMPARISON(TYPE, OP, OPN, RESULT_TRUE, RESULT_FALSE) \
+TYPE test_##OPN(TYPE x, TYPE y) { \
+    return (x OP y) ? RESULT_TRUE : RESULT_FALSE; \
+}
+
+/* Signed comparisons */
+COMPARISON(int64_t, >, GT1, 2, 3)
+COMPARISON(int64_t, >, GT2, 5, 6)
+
+COMPARISON(int64_t, <, LT1, 2, 3)
+COMPARISON(int64_t, <, LT2, 5, 6)
+
+COMPARISON(int64_t, >=, GE1, 3, 2)
+COMPARISON(int64_t, >=, GE2, 6, 5)
+
+COMPARISON(int64_t, <=, LE1, 3, 2)
+COMPARISON(int64_t, <=, LE2, 6, 5)
+
+/* Unsigned comparisons */
+COMPARISON(uint64_t, >, GTU1, 2, 3)
+COMPARISON(uint64_t, >, GTU2, 5, 6)
+
+COMPARISON(uint64_t, <, LTU1, 2, 3)
+COMPARISON(uint64_t, <, LTU2, 5, 6)
+
+COMPARISON(uint64_t, >=, GEU1, 3, 2)
+COMPARISON(uint64_t, >=, GEU2, 6, 5)
+
+COMPARISON(uint64_t, <=, LEU1, 3, 2)
+COMPARISON(uint64_t, <=, LEU2, 6, 5)
+
+#define COMPARISON_IMM(TYPE, OP, OPN, RESULT_TRUE, RESULT_FALSE) \
+TYPE testIMM_##OPN(TYPE x) { \
+    return (x OP -3) ? RESULT_TRUE : RESULT_FALSE; \
+}
+
+/* Signed comparisons with immediate */
+COMPARISON
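As a standalone check of the identity the patch relies on (plain C, not
part of the patch):

#include <stdio.h>

int main (void)
{
  /* The slt case: X in {0,1}, C1 = 1, C2 = 2.  The guard condition
     2 * ((X ^ C1) & C2) == 0 holds for both X values, so the plus folds
     to a single xor: (X ^ 1) + 2 == X ^ (1 ^ 2) == X ^ 3.  */
  for (long x = 0; x <= 1; x++)
    {
      long c1 = 1, c2 = 2;
      if (2 * ((x ^ c1) & c2) != 0)   /* guard from the patch */
        return 1;
      if (((x ^ c1) + c2) != (x ^ (c1 ^ c2)))
        {
          printf ("identity fails for x=%ld\n", x);
          return 1;
        }
    }
  puts ("(X ^ 1) + 2 == X ^ 3 for X in {0,1}");
  return 0;
}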
[PATCH v2] match.pd: Add pattern to simplify `(a - 1) & -a` to `0`
Thank you for the feedback on the v1 patch. As requested, I added detailed
tests for various signed and unsigned integer types in the test file
bitops-11.c. I also included more complex expressions to observe how
everything behaves at the GIMPLE level, and added vector test examples as
well. Since vector expressions are matched as (minus @0 1) instead of
(plus @0 -1), I added a simplification for the minus case in match.pd.

Additionally, I introduced simplifications for the expressions
(a - 1) & -a, (a - 1) | -a, and (a - 1) ^ -a to 0, -1, and -1,
respectively, in simplify-rtx.cc. For each of the three expressions, I
added two if statements. The first matches the typical
(BIT_OP (plus (A -1)) (neg A)), while the second recognizes the presence
of a SUBREG within the RTL expression. For example, when A is of type
short, the second if statement is triggered. I didn't observe any issues
with match.pd missing any simplifications, but if that happens, the code
in simplify-rtx.cc should help.

Bootstrapped and tested on x86-linux-gnu with no regressions.

2024-11-26  Jovan Vukic

gcc/ChangeLog:

	* match.pd: New pattern.
	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1): New
	code to handle (a - 1) & -a, (a - 1) | -a and (a - 1) ^ -a.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/bitops-11.c: New test.

---
 gcc/match.pd                              |  16 +++
 gcc/simplify-rtx.cc                       |  87 
 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c | 116 ++
 3 files changed, 219 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 0ac5674f24b..c85d4b9ae6c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1472,6 +1472,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (bit_and:c @0 (bit_not (bit_xor:c @0 @1)))
   (bit_and @0 @1))
 
+/* Transform:
+   (a - 1) & -a -> 0.
+   (a - 1) | -a -> -1.
+   (a - 1) ^ -a -> -1.  */
+(for bit_op (bit_ior bit_xor bit_and)
+ (simplify
+  (bit_op:c (plus @0 integer_minus_onep@1) (negate @0))
+  (if (bit_op == BIT_AND_EXPR)
+   { build_zero_cst (type); }
+   { build_minus_one_cst (type); }))
+ (simplify
+  (bit_op:c (minus @0 integer_onep@1) (negate @0))
+  (if (bit_op == BIT_AND_EXPR)
+   { build_zero_cst (type); }
+   { build_minus_one_cst (type); })))
+
 /* a & (a == b) --> a & b (boolean version of the above).  */
 (simplify
  (bit_and:c @0 (nop_convert? (eq:c @0 @1)))

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 893c5f6e1ae..514d21c6ef5 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -3530,6 +3530,35 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 	  && GET_MODE_CLASS (mode) != MODE_CC)
 	return CONSTM1_RTX (mode);
 
+      /* (ior (plus (A -1)) (neg A)) is -1.  */
+      if (((GET_CODE (op1) == NEG
+	    && GET_CODE (op0) == PLUS
+	    && XEXP (op0, 1) == constm1_rtx)
+	   || (GET_CODE (op0) == NEG
+	       && GET_CODE (op1) == PLUS
+	       && XEXP (op1, 1) == constm1_rtx))
+	  && rtx_equal_p (XEXP (op0, 0), XEXP (op1, 0))
+	  && !side_effects_p (XEXP (op0, 0))
+	  && !side_effects_p (XEXP (op1, 0)))
+	return CONSTM1_RTX (mode);
+
+      /* (ior (subreg (plus (A -1))) (subreg (neg A))) is -1.  */
+      if (GET_CODE (op0) == SUBREG
+	  && GET_CODE (op1) == SUBREG
+	  && subreg_lowpart_p (op0)
+	  && subreg_lowpart_p (op1)
+	  && ((GET_CODE (XEXP (op1, 0)) == NEG
+	       && GET_CODE (XEXP (op0, 0)) == PLUS
+	       && XEXP (XEXP (op0, 0), 1) == constm1_rtx)
+	      || (GET_CODE (XEXP (op0, 0)) == NEG
+		  && GET_CODE (XEXP (op1, 0)) == PLUS
+		  && XEXP (XEXP (op1, 0), 1) == constm1_rtx))
+	  && rtx_equal_p (XEXP (XEXP (op0, 0), 0),
+			  XEXP (XEXP (op1, 0), 0))
+	  && !side_effects_p (XEXP (XEXP (op0, 0), 0))
+	  && !side_effects_p (XEXP (XEXP (op1, 0), 0)))
+	return CONSTM1_RTX (mode);
+
       /* (ior A C) is C if all bits of A that might be nonzero are on in C.  */
       if (CONST_INT_P (op1)
 	  && HWI_COMPUTABLE_MODE_P (mode)
@@ -3691,6 +3720,35 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 	      & nonzero_bits (op1, mode)) == 0)
 	return (simplify_gen_binary (IOR, mode, op0, op1));
 
+      /* (xor (plus (A -1)) (neg A)) is -1.  */
+      if (((GET_CODE (op1) == NEG
+	    && GET_CO
[PATCH v3] match.pd: Add pattern to simplify `(a - 1) & -a` to `0`
Thank you for the feedback. I have made the minor changes that were
requested. Additionally, I extracted the repetitive code into a reusable
helper function, match_plus_neg_pattern, making the code much more
readable. Furthermore, the logic, code, and tests remain the same as in
version 2 of the patch.

2024-12-03  Jovan Vukic

gcc/ChangeLog:

	* match.pd: New pattern.
	* simplify-rtx.cc (match_plus_neg_pattern): New helper function.
	(simplify_context::simplify_binary_operation_1): New code to handle
	(a - 1) & -a, (a - 1) | -a and (a - 1) ^ -a.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/bitops-11.c: New test.

---
 gcc/match.pd                              |  16 +++
 gcc/simplify-rtx.cc                       |  41 
 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c | 117 ++
 3 files changed, 174 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 2dd67b69cf1..35cc782f137 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1472,6 +1472,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (bit_and:c @0 (bit_not (bit_xor:c @0 @1)))
   (bit_and @0 @1))
 
+/* Transform:
+   (a - 1) & -a -> 0.
+   (a - 1) | -a -> -1.
+   (a - 1) ^ -a -> -1.  */
+(for bit_op (bit_ior bit_xor bit_and)
+ (simplify
+  (bit_op:c (plus @0 integer_minus_onep) (negate @0))
+  (if (bit_op == BIT_AND_EXPR)
+   { build_zero_cst (type); }
+   { build_minus_one_cst (type); }))
+ (simplify
+  (bit_op:c (minus @0 integer_onep) (negate @0))
+  (if (bit_op == BIT_AND_EXPR)
+   { build_zero_cst (type); }
+   { build_minus_one_cst (type); })))
+
 /* a & (a == b) --> a & b (boolean version of the above).  */
 (simplify
  (bit_and:c @0 (nop_convert? (eq:c @0 @1)))

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 86b3f331928..223a0095978 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -2941,6 +2941,35 @@ simplify_rotate_op (rtx op0, rtx op1, machine_mode mode)
   return NULL_RTX;
 }
 
+/* Returns true if OP0 and OP1 match the pattern (OP (plus (A - 1)) (neg A)),
+   and the pattern can be simplified (there are no side effects).  */
+
+static bool
+match_plus_neg_pattern (rtx op0, rtx op1, machine_mode mode)
+{
+  /* Remove SUBREG from OP0 and OP1, if needed.  */
+  if (GET_CODE (op0) == SUBREG
+      && GET_CODE (op1) == SUBREG
+      && subreg_lowpart_p (op0)
+      && subreg_lowpart_p (op1))
+    {
+      op0 = XEXP (op0, 0);
+      op1 = XEXP (op1, 0);
+    }
+
+  /* Check for the pattern (OP (plus (A - 1)) (neg A)).  */
+  if (((GET_CODE (op1) == NEG
+	&& GET_CODE (op0) == PLUS
+	&& XEXP (op0, 1) == CONSTM1_RTX (mode))
+       || (GET_CODE (op0) == NEG
+	   && GET_CODE (op1) == PLUS
+	   && XEXP (op1, 1) == CONSTM1_RTX (mode)))
+      && rtx_equal_p (XEXP (op0, 0), XEXP (op1, 0))
+      && !side_effects_p (XEXP (op0, 0)))
+    return true;
+  return false;
+}
+
 /* Subroutine of simplify_binary_operation.  Simplify a binary operation
    CODE with result mode MODE, operating on OP0 and OP1.  If OP0 and/or
    OP1 are constant pool references, TRUEOP0 and TRUEOP1 represent the
@@ -3553,6 +3582,10 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 	  && GET_MODE_CLASS (mode) != MODE_CC)
 	return CONSTM1_RTX (mode);
 
+      /* Convert (ior (plus (A - 1)) (neg A)) to -1.  */
+      if (match_plus_neg_pattern (op0, op1, mode))
+	return CONSTM1_RTX (mode);
+
       /* (ior A C) is C if all bits of A that might be nonzero are on in C.  */
       if (CONST_INT_P (op1)
 	  && HWI_COMPUTABLE_MODE_P (mode)
@@ -3714,6 +3747,10 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 	      & nonzero_bits (op1, mode)) == 0)
 	return (simplify_gen_binary (IOR, mode, op0, op1));
 
+      /* Convert (xor (plus (A - 1)) (neg A)) to -1.  */
+      if (match_plus_neg_pattern (op0, op1, mode))
+	return CONSTM1_RTX (mode);
+
       /* Convert (XOR (NOT x) (NOT y)) to (XOR x y).
 	 Also convert (XOR (NOT x) y) to (NOT (XOR x y)), similarly for
 	 (NOT y).  */
@@ -3981,6 +4018,10 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 	  && GET_MODE_CLASS (mode) != MODE_CC)
 	return CONST0_RTX (mode);
 
+      /* Convert (and (plus (A - 1)) (neg A)) to 0.  */
+      if (match_plus_neg_pattern (op0, op1, mode))
+	return CONST0_RTX (mode);
+
       /* Transform (and (extend X) C) into (zero_extend