[PATCH] RISC-V: Optimize branches with shifted immediate operands

2024-09-02 Thread Jovan Vukic
The patch adds a new instruction pattern to handle conditional branches that test
equality between an AND of a register with a shifted constant and another shifted
constant. Shifting away the constants' common trailing zeros lets both fit into
12-bit immediates, making the generated code more efficient.

For the C code:

void f5(long long a) {
  if ((a & 0x212) == 0x200)
    g();
}

Before the patch, the assembly code was:
f5:
  li    a5,34734080
  and   a0,a0,a5
  li    a5,33554432
  beq   a0,a5,.L21
  ret

and after the patch the assembly is:
f5:
  srli  a5,a0,17
  andi  a5,a5,265
  li    a4,256
  beq   a5,a4,.L21
  ret

Tested on both RV32 and RV64 with no regressions.
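
The rewrite is a pure constant normalization: the AND mask and the compared value are
both shifted right by their common number of trailing zeros, after which they fit into
12-bit immediates. A small Python sketch of the idea, using the constants materialized
by the pre-patch assembly above:

```python
def normalize(mask, val):
    # Shift both constants right by their common number of trailing
    # zeros, exactly as the new split pattern does.
    ctz = lambda v: (v & -v).bit_length() - 1
    shift = min(ctz(mask), ctz(val))
    return shift, mask >> shift, val >> shift

# Constants loaded by the pre-patch assembly.
shift, m, v = normalize(34734080, 33554432)
print(shift, m, v)  # 17 265 256 -- the srli/andi/li operands after the patch

# The branch condition is unchanged for every input:
for x in range(0, 1 << 26, 12345):
    assert ((x & 34734080) == 33554432) == (((x >> shift) & m) == v)
```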

2024-09-02  Jovan Vukic  

gcc/ChangeLog:
  PR target/113248
  * config/riscv/riscv.md (*branch_shiftedarith_equals_shifted): 
New pattern.

gcc/testsuite/ChangeLog:
  PR target/113248
  * gcc.target/riscv/branch-1.c: Additional tests.

---
 gcc/config/riscv/riscv.md | 32 +++
 gcc/testsuite/gcc.target/riscv/branch-1.c | 16 +---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 3289ed2155a..c98a66dbc7c 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3126,6 +3126,38 @@
 }
 [(set_attr "type" "branch")])

+(define_insn_and_split "*branch_shiftedarith_equals_shifted"
+  [(set (pc)
+ (if_then_else (match_operator 1 "equality_operator"
+      [(and:ANYI (match_operand:ANYI 2 "register_operand" "r")
+     (match_operand 3 "shifted_const_arith_operand" "i"))
+ (match_operand 4 "shifted_const_arith_operand" "i")])
+  (label_ref (match_operand 0 "" ""))
+  (pc)))
+   (clobber (match_scratch:X 5 "=&r"))
+   (clobber (match_scratch:X 6 "=&r"))]
+  "!SMALL_OPERAND (INTVAL (operands[3]))
+&& !SMALL_OPERAND (INTVAL (operands[4]))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5) (lshiftrt:X (subreg:X (match_dup 2) 0) (match_dup 8)))
+   (set (match_dup 5) (and:X (match_dup 5) (match_dup 9)))
+   (set (match_dup 6) (match_dup 10))
+   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 5) (match_dup 6)])
+    (label_ref (match_dup 0)) (pc)))]
+{
+  HOST_WIDE_INT mask1 = INTVAL (operands[3]);
+  HOST_WIDE_INT mask2 = INTVAL (operands[4]);
+  int trailing = (ctz_hwi (mask1) > ctz_hwi (mask2))
+   ? ctz_hwi (mask2)
+   : ctz_hwi (mask1);
+
+  operands[8] = GEN_INT (trailing);
+  operands[9] = GEN_INT (mask1 >> trailing);
+  operands[10] = GEN_INT (mask2 >> trailing);
+}
+[(set_attr "type" "branch")])
+
 (define_insn_and_split "*branch_shiftedmask_equals_zero"
   [(set (pc)
  (if_then_else (match_operator 1 "equality_operator"
diff --git a/gcc/testsuite/gcc.target/riscv/branch-1.c b/gcc/testsuite/gcc.target/riscv/branch-1.c
index b4a3a946379..e09328fe705 100644
--- a/gcc/testsuite/gcc.target/riscv/branch-1.c
+++ b/gcc/testsuite/gcc.target/riscv/branch-1.c
@@ -28,10 +28,20 @@ void f4(long long a)
 g();
 }

+void f5(long long a) {
+  if ((a & 0x212) == 0x200)
+    g();
+}
+
+void f6(long long a) {
+  if ((a & 0x7000) == 0x3000)
+    g();
+}
+
 /* { dg-final { scan-assembler-times "slli\t" 2 } } */
-/* { dg-final { scan-assembler-times "srli\t" 3 } } */
-/* { dg-final { scan-assembler-times "andi\t" 1 } } */
-/* { dg-final { scan-assembler-times "\tli\t" 1 } } */
+/* { dg-final { scan-assembler-times "srli\t" 5 } } */
+/* { dg-final { scan-assembler-times "andi\t" 3 } } */
+/* { dg-final { scan-assembler-times "\tli\t" 3 } } */
 /* { dg-final { scan-assembler-not "addi\t" } } */
 /* { dg-final { scan-assembler-not "and\t" } } */

--
2.43.0
CONFIDENTIALITY: The contents of this e-mail are confidential and intended only 
for the above addressee(s). If you are not the intended recipient, or the 
person responsible for delivering it to the intended recipient, copying or 
delivering it to anyone else or using it in any unauthorized manner is 
prohibited and may be unlawful. If you receive this e-mail by mistake, please 
notify the sender and the systems administrator at straym...@rt-rk.com 
immediately.


Re: [PATCH] RISC-V: Optimize branches with shifted immediate operands

2024-09-05 Thread Jovan Vukic
> It's worth noting there is a newer way which is usually slightly simpler
> than a match_operator. Specifically code iterators.

Thank you for the very detailed feedback. It is not a problem to add code 
iterators. I would add iterators for "eq" and "ne" in riscv/iterators.md since 
they don't currently exist:

> (define_code_iterator any_eq [eq ne])

I would also add new <optab> values for "eq" and "ne". I assume it would be 
best to submit the patch again as version 2 with these changes.

> The pattern uses shifted_const_arith_operand, which is good as it
> validates that the constant, if normalized by shifting away its trailing
> zeros fits in a simm12.
>
> But the normalization you're doing on the two constants is limited by
> the smaller of trailing zero counts.  So operands2 might be 0x8100 which
> requires an 8 bit shift for normalization.  operands3 might be 0x81000
> which requires a 12 bit shift for normalization.  In that case we'll use
> 8 as our shift count for normalization, resulting in:
>
> 0x8100 >> 8 = 0x81, a valid small operand
> 0x81000 >> 8 = 0x810, not a valid small operand.
>
>
> I think that'll generate invalid RTL at split time.
>
> What I think you need to do is in the main predicate (the same place
> you're currently !SMALL_OPERAND (INTVAL (operands[3]))), you'll need to
> check that both operands are SMALL_OPERAND after normalization.

Regarding the second issue, thanks again for the clear explanation. While at 
first this might seem like a problem, I believe these cases cannot actually 
arise.

The comparisons you mentioned, (x & 0x81000) == 0x8100 and (x & 0x8100) == 
0x81000, will always evaluate as false, and this pattern will never be used for 
them (https://godbolt.org/z/Y11EGMb4f).

Even in general, I'm quite sure we will never encounter an operand that, after 
shifting, is 2^11 or greater (i.e. not a SMALL_OPERAND). I will provide my 
reasoning below, but if you find it incorrect, or have any counterexamples, I 
would be happy to make the requested changes, add the mentioned check and 
submit that as PATCH v2.

Let's consider the general expression (x & c1) == c2, where t1 and t2 denote 
the numbers of trailing zeros in c1 and c2, respectively. There are three 
cases to consider:
1. When t1 == t2: The pattern works fine, with no edge cases.
2. When t1 > t2: The expression will always evaluate as false, and the pattern 
won’t even be considered. For example, (x & 0x81000) == 0x8100.
3. When t1 < t2: In this case:
   - c1 must be of the form 0x0KLM00 (where the highest bit of K cannot be set) 
to meet the shifted_const_arith_operand constraint, ensuring 
SMALL_OPERAND(0x0KLM) == true (i.e. 0x0KLM < 2^11).
   - To prevent the expression from immediately evaluating as false, c2 must be 
in the form 0x0PQ<0bxxx0>00, where this value has to have only 0 or 1 in bit 
positions where c1 has 1 (and 0 elsewhere). Otherwise, (x & c1) == c2 would 
instantly be false, so this pattern wouldn’t be used. Let's call this "the 
critical condition".
   - After shifting c1 and c2, we have c1 == 0xKLM and c2 == 0xPQ<0bxxx0>, 
assuming the LSB of M is set to 1.
   - Due to "the critical condition", c2 == 0xPQ<0bxxx0> cannot have the 
highest bit of P set to 1. Otherwise, (x & c1) == c2 would immediately evaluate 
as false, since 0xKLM is guaranteed not to have the highest bit of K set to 1. 
This guarantees that SMALL_OPERAND(0xPQ<0bxxx0>) will always be true (i.e. c2 < 
2^11).
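
The argument above can be spot-checked mechanically. A small Python sketch, modeling
SMALL_OPERAND as a signed 12-bit immediate range:

```python
import random

def ctz(v): return (v & -v).bit_length() - 1
def small(v): return -2048 <= v <= 2047

# (x & c1) == c2 is satisfiable iff every set bit of c2 is also set in c1.
def satisfiable(c1, c2): return (c2 & ~c1) == 0

# The reviewer's example pair is unsatisfiable in both orders, so the
# pattern is never even reached for it.
assert not satisfiable(0x81000, 0x8100)
assert not satisfiable(0x8100, 0x81000)

# Randomized check of the claim: if both constants individually normalize
# to SMALL_OPERANDs and the comparison is satisfiable, then both stay
# SMALL_OPERANDs after shifting by the *common* trailing-zero count.
random.seed(1)
for _ in range(200_000):
    c1 = random.getrandbits(30)
    c2 = c1 & random.getrandbits(30)      # force satisfiability
    if not (c1 and c2) or small(c1) or small(c2):
        continue
    if not (small(c1 >> ctz(c1)) and small(c2 >> ctz(c2))):
        continue
    s = min(ctz(c1), ctz(c2))
    assert small(c1 >> s) and small(c2 >> s)
```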


[PATCH] RISC-V: Improve code generation for select of consecutive constants

2024-09-17 Thread Jovan Vukic
The patch optimizes code generated for comparisons of the form x > y ? 2 : 3
(x <= y ? 3 : 2) and x < y ? 2 : 3 (x >= y ? 3 : 2).

For the following C code:

long f1(long x, long y) {
  return (x > y) ? 2 : 3;
}

long f2(long x, long y) {
  return (x < y) ? 2 : 3;
}


Before the patch, the generated assembly is:

f1(long, long):
sgt     a0, a0, a1
xori    a0, a0, 1
addi    a0, a0, 2
ret

f2(long, long):
slt     a0, a0, a1
xori    a0, a0, 1
addi    a0, a0, 2
ret


After the patch, the generated assembly is:

f1(long, long):
slt     a0,a1,a0
xori    a0,a0,3
ret

f2(long, long):
slt     a0,a0,a1
xori    a0,a0,3
ret


Considering the first example: during the combine pass it can match an "le"
pattern (x <= y), but that comparison can only be implemented through the
inverse operation x > y (y < x), which in turn requires replacing the
following addi instruction with an xori, as this patch does.
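
The addi-to-xori replacement works because the two selected constants differ only in
their low bit: for an even c, "cond ? c : c + 1" equals "cond ^ (c + 1)" when cond is
0 or 1. A quick check of both the identity and an instruction-level model of the
post-patch f1:

```python
# For even c, the xor flips only the low bit, which is exactly where
# c and c + 1 differ.
for c in range(0, 10, 2):
    for cond in (False, True):
        assert (c if cond else c + 1) == (int(cond) ^ (c + 1))

# Model of the post-patch f1:  slt a0,a1,a0 ; xori a0,a0,3
def f1(x, y):
    a0 = int(y < x)   # slt a0, a1, a0  computes x > y
    return a0 ^ 3     # xori a0, a0, 3

for x in range(-3, 4):
    for y in range(-3, 4):
        assert f1(x, y) == (2 if x > y else 3)
```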

Tested on both RV32 and RV64 with no regressions.

2024-09-17  Jovan Vukic  

PR target/108038

gcc/ChangeLog:

* config/riscv/iterators.md (paired_lt): New code attribute.
* config/riscv/riscv.md (*sle__xor): New pattern.
(*sge__xor): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/slt-1.c: New test.

---
 gcc/config/riscv/iterators.md  |  4 +++
 gcc/config/riscv/riscv.md  | 32 +++
 gcc/testsuite/gcc.target/riscv/slt-1.c | 44 ++
 3 files changed, 80 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/slt-1.c

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 2844cb02ff0..964d0f24f4c 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -234,6 +234,10 @@
 (define_code_iterator any_lt [lt ltu])
 (define_code_iterator any_le [le leu])
 
+(define_code_attr paired_lt [(gt "lt") (gtu "ltu")
+(ge "lt") (geu "ltu")
+(le "lt") (leu "ltu")])
+
 ; atomics code iterator
 (define_code_iterator any_atomic [plus ior xor and])
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 9f94b5aa023..b22a8ac05e0 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3600,6 +3600,38 @@
   [(set_attr "type" "slt")
(set_attr "mode" "")])
 
+(define_insn_and_split "*sle__xor"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (plus:GPR (any_le:GPR (match_operand:X 1 "register_operand" " r")
+  (match_operand:X 2 "register_operand" " r"))
+  (match_operand 3 "const_arith_operand" "i")))]
+  "INTVAL (operands[3]) % 2 == 0"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (<paired_lt>:GPR (match_dup 2) (match_dup 1)))
+   (set (match_dup 0) (xor:GPR (match_dup 0) (match_dup 3)))]
+{
+  operands[3] = GEN_INT (INTVAL (operands[3]) + 1);
+}
+  [(set_attr "type" "slt")
+   (set_attr "mode" "")])
+
+(define_insn_and_split "*sge__xor"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (plus:GPR (any_ge:GPR (match_operand:X 1 "register_operand" " r")
+  (match_operand:X 2 "register_operand" " r"))
+  (match_operand 3 "const_arith_operand" "i")))]
+  "INTVAL (operands[3]) % 2 == 0"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (<paired_lt>:GPR (match_dup 1) (match_dup 2)))
+   (set (match_dup 0) (xor:GPR (match_dup 0) (match_dup 3)))]
+{
+  operands[3] = GEN_INT (INTVAL (operands[3]) + 1);
+}
+  [(set_attr "type" "slt")
+   (set_attr "mode" "")])
+
 ;;
 ;;  
 ;;
diff --git a/gcc/testsuite/gcc.target/riscv/slt-1.c b/gcc/testsuite/gcc.target/riscv/slt-1.c
new file mode 100644
index 000..c672da92acf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/slt-1.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+#include 
+

[PATCH v3] RISC-V: Optimize branches with shifted immediate operands

2024-10-09 Thread Jovan Vukic
After the valuable feedback I received, it’s clear to me that the
oversight was in the tests showing the benefits of the patch. In the
test file, I added functions f5 and f6, which now generate more
efficient code with fewer instructions.

Before the patch:

f5:
li      a4,2097152
addi    a4,a4,-2048
li      a5,1167360
and     a0,a0,a4
addi    a5,a5,-2048
beq     a0,a5,.L4

f6:
li      a5,3407872
addi    a5,a5,-2048
and     a0,a0,a5
li      a5,1114112
beq     a0,a5,.L7

After the patch:

f5:
srli    a5,a0,11
andi    a5,a5,1023
li      a4,569
beq     a5,a4,.L5

f6:
srli    a5,a0,11
andi    a5,a5,1663
li      a4,544
beq     a5,a4,.L9
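
The f5 numbers can be checked directly: the pre-patch code materializes mask
2097152 - 2048 and value 1167360 - 2048, both with 11 trailing zeros, which yields
exactly the post-patch srli/andi/li operands.

```python
def ctz(v): return (v & -v).bit_length() - 1

# Reconstruct the pre-patch li/addi pairs for f5.
mask, val = 2097152 - 2048, 1167360 - 2048
shift = min(ctz(mask), ctz(val))
assert (shift, mask >> shift, val >> shift) == (11, 1023, 569)

# The rewritten branch tests the same condition:
for a in range(0, 1 << 22, 7919):
    assert ((a & mask) == val) == (((a >> shift) & 1023) == 569)
```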

2024-10-09  Jovan Vukic  

PR target/115921

gcc/ChangeLog:

* config/riscv/iterators.md (any_eq): New code iterator.
* config/riscv/riscv.h (COMMON_TRAILING_ZEROS): New macro.
(SMALL_AFTER_COMMON_TRAILING_SHIFT): Ditto.
* config/riscv/riscv.md (*branch_shiftedarith_<optab>_shifted):
New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/branch-1.c: Additional tests.

---
 gcc/config/riscv/iterators.md |  4 +++
 gcc/config/riscv/riscv.h  | 12 +
 gcc/config/riscv/riscv.md | 32 +++
 gcc/testsuite/gcc.target/riscv/branch-1.c | 18 ++---
 4 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 872c542e906..081659499a9 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -233,6 +233,8 @@
 (define_code_iterator any_ge [ge geu])
 (define_code_iterator any_lt [lt ltu])
 (define_code_iterator any_le [le leu])
+(define_code_iterator any_eq [eq ne])
+
 ;; Iterators for conditions we can emit a sCC against 0 or a reg directly
 (define_code_iterator scc_0  [eq ne gt gtu])
 
@@ -285,6 +287,8 @@
 (le "le")
 (gt "gt")
 (lt "lt")
+(eq "eq")
+(ne "ne")
 (ior "ior")
 (xor "xor")
 (and "and")
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 53b7b2a40ed..ca1b8329cdc 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -667,6 +667,18 @@ enum reg_class
 /* True if bit BIT is set in VALUE.  */
 #define BITSET_P(VALUE, BIT) (((VALUE) & (1ULL << (BIT))) != 0)
 
+/* Returns the smaller (common) number of trailing zeros for VAL1 and VAL2.  */
+#define COMMON_TRAILING_ZEROS(VAL1, VAL2)  \
+  (ctz_hwi (VAL1) < ctz_hwi (VAL2) \
+   ? ctz_hwi (VAL1)\
+   : ctz_hwi (VAL2))
+
+/* Returns true if both VAL1 and VAL2 are SMALL_OPERANDs after shifting by
+   the common number of trailing zeros.  */
+#define SMALL_AFTER_COMMON_TRAILING_SHIFT(VAL1, VAL2)  \
+  (SMALL_OPERAND ((VAL1) >> COMMON_TRAILING_ZEROS (VAL1, VAL2))
\
+   && SMALL_OPERAND ((VAL2) >> COMMON_TRAILING_ZEROS (VAL1, VAL2)))
+
 /* Stack layout; function entry, exit and calling.  */
 
 #define STACK_GROWS_DOWNWARD 1
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 688c07df46c..78112afbb26 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3129,6 +3129,38 @@
 }
 [(set_attr "type" "branch")])
 
+(define_insn_and_split "*branch_shiftedarith_<optab>_shifted"
+  [(set (pc)
+   (if_then_else (any_eq
+   (and:ANYI (match_operand:ANYI 1 "register_operand" "r")
+ (match_operand 2 "shifted_const_arith_operand" "i"))
+   (match_operand 3 "shifted_const_arith_operand" "i"))
+(label_ref (match_operand 0 "" ""))
+(pc)))
+   (clobber (match_scratch:X 4 "=&r"))
+   (clobber (match_scratch:X 5 "=&r"))]
+  "!SMALL_OPERAND (INTVAL (operands[2]))
+&& !SMALL_OPERAND (INTVAL (operands[3]))
+&& SMALL_AFTER_COMMON_TRAILING_SHIFT (INTVAL (operands[2]),
+ 

[PATCH v2] RISC-V: Optimize branches with shifted immediate operands

2024-09-29 Thread Jovan Vukic
Based on the feedback I received, the patch now uses a custom code iterator
instead of relying on match_operator.

Since both constants (operands 2 and 3) have trailing zeros, an additional check
was introduced to verify that they remain SMALL_OPERANDs after shifting by
the smaller of the two trailing-zero counts.

To avoid a complex inline check, I introduced two new macros, ensuring that the
check is encapsulated and reusable, as suggested in the original feedback.

It was pointed out that this check should be added because expressions like
(x & 0x81000) == 0x8100 or (x & 0x8100) == 0x81000 might lead to
unrecognized instructions during splitting. However, both expressions
immediately evaluate to false (https://godbolt.org/z/Y11EGMb4f) due to
the bit arrangement in constants 0x8100 and 0x81000.

As a result, this pattern won't be applied in such cases, so it cannot
cause any issues. Therefore, it wasn't necessary to include this as a
negative test in the testsuite.

Despite my efforts, I couldn't find another example for a negative test.
All similar cases evaluated to false immediately due to the bit arrangement.
I mentioned this in a follow-up comment on the previous patch version.
Regardless, the necessary check has been added, so if a negative case
exists, an unrecognized instruction will not be created.
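
In Python terms, the two new riscv.h helpers behave like this model (SMALL_OPERAND
is modeled here as a signed 12-bit immediate range):

```python
def ctz_hwi(v):
    # Count of trailing zero bits, as GCC's ctz_hwi does for nonzero input.
    return (v & -v).bit_length() - 1

def SMALL_OPERAND(v):
    # Fits a RISC-V signed 12-bit immediate.
    return -2048 <= v <= 2047

def COMMON_TRAILING_ZEROS(v1, v2):
    return min(ctz_hwi(v1), ctz_hwi(v2))

def SMALL_AFTER_COMMON_TRAILING_SHIFT(v1, v2):
    s = COMMON_TRAILING_ZEROS(v1, v2)
    return SMALL_OPERAND(v1 >> s) and SMALL_OPERAND(v2 >> s)

# The reviewer's example pair fails the new predicate check...
assert COMMON_TRAILING_ZEROS(0x81000, 0x8100) == 8
assert not SMALL_AFTER_COMMON_TRAILING_SHIFT(0x81000, 0x8100)
# ...while a satisfiable pair like the f5 constants passes it.
assert SMALL_AFTER_COMMON_TRAILING_SHIFT(0x2120000, 0x2000000)
```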

2024-09-30  Jovan Vukic  

PR target/115921

gcc/ChangeLog:

* config/riscv/iterators.md (any_eq): New code iterator.
* config/riscv/riscv.h (COMMON_TRAILING_ZEROS): New macro.
(SMALL_AFTER_COMMON_TRAILING_SHIFT): Ditto.
* config/riscv/riscv.md (*branch_shiftedarith_<optab>_shifted):
New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/branch-1.c: Additional tests.

---
 gcc/config/riscv/iterators.md |  3 +++
 gcc/config/riscv/riscv.h  | 12 +
 gcc/config/riscv/riscv.md | 32 +++
 gcc/testsuite/gcc.target/riscv/branch-1.c | 16 +---
 4 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 2844cb02ff0..288f11464a2 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -233,6 +233,7 @@
 (define_code_iterator any_ge [ge geu])
 (define_code_iterator any_lt [lt ltu])
 (define_code_iterator any_le [le leu])
+(define_code_iterator any_eq [eq ne])
 
 ; atomics code iterator
 (define_code_iterator any_atomic [plus ior xor and])
@@ -283,6 +284,8 @@
 (le "le")
 (gt "gt")
 (lt "lt")
+(eq "eq")
+(ne "ne")
 (ior "ior")
 (xor "xor")
 (and "and")
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 3aecb43f831..c782499b2bf 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -667,6 +667,18 @@ enum reg_class
 /* True if bit BIT is set in VALUE.  */
 #define BITSET_P(VALUE, BIT) (((VALUE) & (1ULL << (BIT))) != 0)
 
+/* Returns the smaller (common) number of trailing zeros for VAL1 and VAL2.  */
+#define COMMON_TRAILING_ZEROS(VAL1, VAL2)  \
+  (ctz_hwi (VAL1) < ctz_hwi (VAL2) \
+   ? ctz_hwi (VAL1)\
+   : ctz_hwi (VAL2))
+
+/* Returns true if both VAL1 and VAL2 are SMALL_OPERANDs after shifting by
+   the common number of trailing zeros.  */
+#define SMALL_AFTER_COMMON_TRAILING_SHIFT(VAL1, VAL2)  \
+  (SMALL_OPERAND ((VAL1) >> COMMON_TRAILING_ZEROS (VAL1, VAL2))
\
+   && SMALL_OPERAND ((VAL2) >> COMMON_TRAILING_ZEROS (VAL1, VAL2)))
+
 /* Stack layout; function entry, exit and calling.  */
 
 #define STACK_GROWS_DOWNWARD 1
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 0410d990ec5..afc534d050f 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3129,6 +3129,38 @@
 }
 [(set_attr "type" "branch")])
 
+(define_insn_and_split "*branch_shiftedarith_<optab>_shifted"
+  [(set (pc)
+   (if_then_else (any_eq
+   (and:ANYI (match_operand:ANYI 1 "register_operand" "r")
+ (match_operand 2 "shifted_const_arith_operand"

[PATCH] phi-opt: Add missed optimization for "(cond | (a != b)) ? b : a"

2024-10-22 Thread Jovan Vukic
Currently, within the phiopt pass, under value_replacement, we have the
option to replace the expression "(cond & (a == b)) ? a : b" with "b". It
checks whether there is a BIT_AND_EXPR and verifies if one of the operands
contains the expression "a == b".

However, the same result should also apply to the expression with the
inverted condition "(cond | (a != b)) ? b : a", so this patch makes slight 
modifications to the existing code to cover this situation as well. The 
patch adds a check for whether there is a BIT_IOR_EXPR and if it contains
"a != b".

I added a test for both optimizations, as I could not find an adequate test
in the testsuite for the first one (even though that optimization exists).

For example, for this C code:

int foo(int a, int b, int c) {
  return ((a > c) | (a != b)) ? b : a;
}

Before the patch, the generated assembly for RISC-V is:

foo:
bgt a0, a2, .L4
bne a0, a1, .L4
ret
.L4:
mv  a0, a1
ret

After the patch, the generated assembly becomes:

foo:
    mv  a0, a1
ret
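
The single mv in the post-patch assembly reflects the fact that foo always returns b,
whatever cond evaluates to; a brute-force check:

```python
def foo(a, b, c):
    return b if ((a > c) | (a != b)) else a

# If the condition holds, the result is b; if it fails, then a == b,
# so the result is still b. Hence the single "mv a0, a1".
for a in range(-3, 4):
    for b in range(-3, 4):
        for c in range(-3, 4):
            assert foo(a, b, c) == b
```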

2024-10-22  Jovan Vukic  

gcc/ChangeLog:

* tree-ssa-phiopt.cc
(rhs_is_fed_for_value_replacement): Add a new optimization opportunity
for BIT_IOR_EXPR and a != b.
(operand_equal_for_value_replacement): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-same-3.c: New test.


---
 .../gcc.dg/tree-ssa/phi-opt-same-3.c  | 17 +++
 gcc/tree-ssa-phiopt.cc| 48 ---
 2 files changed, 47 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-3.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-3.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-3.c
new file mode 100644
index 000..f9838f177a1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt2 -fdump-tree-optimized" } */
+
+int f1(int a, int b, int c)
+{
+  if ((a > c) & (a == b)) return a;
+  return b;
+}
+
+int f2(int a, int b, int c)
+{
+  if ((a > c) | (a != b)) return b;
+  return a;
+}
+
+/* { dg-final { scan-tree-dump-not "if " "phiopt2" } } */
+/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 38cc8506d1f..956fe919cc0 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1078,17 +1078,18 @@ jump_function_from_stmt (tree *arg, gimple *stmt)
   return false;
 }
 
-/* RHS is a source argument in a BIT_AND_EXPR which feeds a conditional
-   of the form SSA_NAME NE 0.
+/* RHS is a source argument in a BIT_AND_EXPR or BIT_IOR_EXPR which feeds
+   a conditional of the form SSA_NAME NE 0.
 
-   If RHS is fed by a simple EQ_EXPR comparison of two values, see if
-   the two input values of the EQ_EXPR match arg0 and arg1.
+   If RHS is fed by a simple EQ_EXPR or NE_EXPR comparison of two values,
+   see if the two input values of the comparison match arg0 and arg1.
 
If so update *code and return TRUE.  Otherwise return FALSE.  */
 
 static bool
 rhs_is_fed_for_value_replacement (const_tree arg0, const_tree arg1,
-  enum tree_code *code, const_tree rhs)
+  enum tree_code *code, const_tree rhs,
+  enum tree_code bit_expression_code)
 {
   /* Obviously if RHS is not an SSA_NAME, we can't look at the defining
  statement.  */
@@ -1096,11 +1097,15 @@ rhs_is_fed_for_value_replacement (const_tree arg0, const_tree arg1,
 {
   gimple *def1 = SSA_NAME_DEF_STMT (rhs);
 
-  /* Verify the defining statement has an EQ_EXPR on the RHS.  */
-  if (is_gimple_assign (def1) && gimple_assign_rhs_code (def1) == EQ_EXPR)
+  /* Verify the defining statement has an EQ_EXPR or NE_EXPR on the RHS.  */
+  if (is_gimple_assign (def1) &&
+	  ((bit_expression_code == BIT_AND_EXPR &&
+	gimple_assign_rhs_code (def1) == EQ_EXPR) ||
+	   (bit_expression_code == BIT_IOR_EXPR &&
+	gimple_assign_rhs_code (def1) == NE_EXPR)))
 	{
-	  /* Finally verify the source operands of the EQ_EXPR are equal
-	 to arg0 and arg1.  */
+	  /* Finally verify the source operands of the EQ_EXPR or NE_EXPR
+	 are equal to arg0 and arg1.  */
 	  tree op0 = gimple_assign_rhs1 (def1);
 	  tree op1 = gimple_assign_rhs2 (def1);
 	  if ((operand_equal_for_phi_arg_p

[PATCH v2] phi-opt: Add missed optimization for "(cond | (a != b)) ? b : a"

2024-10-30 Thread Jovan Vukic
Thanks for the feedback on the first version of the patch. Accordingly:

I have corrected the code formatting as requested. I added new tests to
the existing file phi-opt-11.c, instead of creating a new one.

I performed testing before and after applying the patch on the x86
architecture, and I confirm that there are no new regressions.

The logic and general code of the patch itself have not been changed.

> So the A EQ/NE B expression, we can reverse A and B in the expression
> and still get the same result. But don't we have to be more careful for
> the TRUE/FALSE arms of the ternary? For BIT_AND we need ? a : b for
> BIT_IOR we need ? b : a.
>
> I don't see that gets verified in the existing code or after your
> change. I suspect I'm just missing something here. Can you clarify how
> we verify that BIT_AND gets ? a : b for the true/false arms and that
> BIT_IOR gets ? b : a for the true/false arms?

I did not communicate this clearly last time, but the existing optimization
simplifies the expression "(cond & (a == b)) ? a : b" to the simpler "b".
Similarly, the expression "(cond & (a == b)) ? b : a" simplifies to "a".

Thus, the existing and my optimization perform the following
simplifications:

(cond & (a == b)) ? a : b -> b
(cond & (a == b)) ? b : a -> a
(cond | (a != b)) ? a : b -> a
(cond | (a != b)) ? b : a -> b

For this reason, for BIT_AND_EXPR when we have A EQ B, it is sufficient to
confirm that one operand matches the true/false arm and the other matches
the false/true arm. In both cases, we simplify the expression to the third
operand of the ternary operation (i.e., OP0 ? OP1 : OP2 simplifies to OP2).
This is achieved in the value_replacement function after successfully
setting the value of *code within the rhs_is_fed_for_value_replacement
function to EQ_EXPR.

For BIT_IOR_EXPR, the same check is performed for A NE B, except now
*code remains NE_EXPR, and then value_replacement returns the second
operand (i.e., OP0 ? OP1 : OP2 simplifies to OP1).
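
All four simplifications listed above can be brute-forced in a few lines of Python:

```python
# (cond & (a == b)) ? a : b -> b      (cond | (a != b)) ? a : b -> a
# (cond & (a == b)) ? b : a -> a      (cond | (a != b)) ? b : a -> b
for a in range(-4, 5):
    for b in range(-4, 5):
        for cond in (False, True):
            and_eq = cond and (a == b)   # cond & (a == b)
            or_ne = cond or (a != b)     # cond | (a != b)
            assert (a if and_eq else b) == b
            assert (b if and_eq else a) == a
            assert (a if or_ne else b) == a
            assert (b if or_ne else a) == b
```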

2024-10-30  Jovan Vukic  

gcc/ChangeLog:

* tree-ssa-phiopt.cc
(rhs_is_fed_for_value_replacement): Add a new optimization opportunity
for BIT_IOR_EXPR and a != b.
(operand_equal_for_value_replacement): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-11.c: Add more tests.


---
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c | 31 +-
 gcc/tree-ssa-phiopt.cc | 48 ++
 2 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c
index 14c82cd5216..d1e284c5325 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-optimized --param logical-op-non-short-circuit=1" } */
+/* { dg-options "-O1 -fdump-tree-phiopt2 -fdump-tree-optimized --param logical-op-non-short-circuit=1" } */
 
 int f(int a, int b, int c)
 {
@@ -22,4 +22,33 @@ int h(int a, int b, int c, int d)
  return a;
 }
 
+int i(int a, int b, int c)
+{
+  if ((a > c) & (a == b))
+    return a;
+  return b;
+}
+
+int j(int a, int b, int c)
+{
+  if ((a > c) & (a == b))
+    return b;
+  return a;
+}
+
+int k(int a, int b, int c)
+{
+  if ((a > c) | (a != b))
+    return b;
+  return a;
+}
+
+int l(int a, int b, int c)
+{
+  if ((a > c) | (a != b))
+    return a;
+  return b;
+}
+
+/* { dg-final { scan-tree-dump-times "if" 0 "phiopt2" } } */
 /* { dg-final { scan-tree-dump-times "if" 0 "optimized" } } */
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index cffafe101a4..61b33bfc361 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1078,17 +1078,18 @@ jump_function_from_stmt (tree *arg, gimple *stmt)
   return false;
 }
 
-/* RHS is a source argument in a BIT_AND_EXPR which feeds a conditional
-   of the form SSA_NAME NE 0.
+/* RHS is a source argument in a BIT_AND_EXPR or BIT_IOR_EXPR which feeds
+   a conditional of the form SSA_NAME NE 0.
 
-   If RHS is fed by a simple EQ_EXPR comparison of two values, see if
-   the two input values of the EQ_EXPR match arg0 and arg1.
+   If RHS is fed by a simple EQ_EXPR or NE_EXPR comparison of two values,
+   see if the two input values of

Re: [PATCH v2] phi-opt: Add missed optimization for "(cond | (a != b)) ? b : a"

2024-11-04 Thread Jovan Vukic
On 11/02/24, Jeff Law wrote:
>This is well understood.  The key in my mind is that for AND we always
>select the FALSE arm.  For IOR we always select the TRUE arm.

Yes, I agree.

>>   e = (code == NE_EXPR ? true_edge : false_edge);
>If I understand everything correctly your assertion is that we'll only
>get here for AND/EQ_EXPR and IOR/NE_EXPR.  There's no way to get here
>for AND/NE_EXPR or IOR/EQ_EXPR?

If we examine the patch step by step, we can see that the function
rhs_is_fed_for_value_replacement enters the if block exclusively for
the combinations BIT_AND_EXPR/EQ_EXPR and BIT_IOR_EXPR/NE_EXPR. It is
only at this point that it returns true and sets the value of *code. This is
evident in the code:

>  if (TREE_CODE (rhs) == SSA_NAME)
>{
>  ...
>  if (is_gimple_assign (def1)
> && ((bit_expression_code == BIT_AND_EXPR
>  && gimple_assign_rhs_code (def1) == EQ_EXPR)
> || (bit_expression_code == BIT_IOR_EXPR
> && gimple_assign_rhs_code (def1) == NE_EXPR)))
>   {
> ...
> *code = gimple_assign_rhs_code (def1);
> return true;
> ...
>   }
>}
>  return false;

The function operand_equal_for_value_replacement only passes the value
it receives from rhs_is_fed_for_value_replacement to the call site.

At the call site, the returned value initializes the variable equal_p.

Thus, the return value is simply passed from
rhs_is_fed_for_value_replacement to the variable equal_p, and it is
true only in the situation of BIT_AND_EXPR/EQ_EXPR or
BIT_IOR_EXPR/NE_EXPR because that is how it is defined in
rhs_is_fed_for_value_replacement.

The patch sets equal_p to true only for the combinations AND/EQ and IOR/NE, so
only in those situations does the line of code you quoted execute as a
consequence of the patch:

> if (equal_p || maybe_equal_p)
> {
>   ...
>   e = (code == NE_EXPR ? true_edge : false_edge);
>   ...
>   if (...)
>   {
>   ...
>   if (equal_p)
>   {
>   ...
>   return 2;
>   }
>   }
>   else if (equal_p)
>   {
>   ...
>   return 1;
>   }
> }

Also, Mr. Pinski left a comment
(https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667258.html)
with some suggestions about the patch.

He also mentioned that he is working on integrating the affected code
into match-and-simplify, so a rewrite is on the way. We can either move
forward with this patch or stop it. Either way, I'm fine with whatever
is decided.


CONFIDENTIALITY: The contents of this e-mail are confidential and intended only 
for the above addressee(s). If you are not the intended recipient, or the 
person responsible for delivering it to the intended recipient, copying or 
delivering it to anyone else or using it in any unauthorized manner is 
prohibited and may be unlawful. If you receive this e-mail by mistake, please 
notify the sender and the systems administrator at straym...@rt-rk.com 
immediately.


match.pd: Add pattern to simplify `(a - 1) & -a` to `0`

2024-11-13 Thread Jovan Vukic
The patch simplifies expressions (a - 1) & -a, (a - 1) | -a, and (a - 1) ^ -a
to the constants 0, -1, and -1, respectively.

Currently, GCC does not perform these simplifications.

Bootstrapped and tested on x86-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd: New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-11.c: New test.


---
 gcc/match.pd  | 11 +++
 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c | 23 +++
 2 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c

diff --git a/gcc/match.pd b/gcc/match.pd
index c10bf9a7b80..6d95ceaa194 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1472,6 +1472,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (bit_and:c @0 (bit_not (bit_xor:c @0 @1)))
  (bit_and @0 @1))
 
+/* Transform:
+   (a - 1) & -a ->  0.
+   (a - 1) | -a -> -1.
+   (a - 1) ^ -a -> -1.  */
+(for bit_op (bit_ior bit_xor bit_and)
+ (simplify
+  (bit_op:c (plus @0 integer_minus_onep) (negate @0))
+  (if (bit_op == BIT_AND_EXPR)
+{ build_zero_cst (type); }
+{ build_minus_one_cst (type); })))
+
 /* a & (a == b)  -->  a & b (boolean version of the above). */
 (simplify
  (bit_and:c @0 (nop_convert? (eq:c @0 @1)))
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c b/gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c
new file mode 100644
index 000..c2d646e2b8d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+
+int foo1(int a)
+{
+return (a - 1) & -a;
+}
+
+int foo2(int a)
+{
+return (a - 1) | -a;
+}
+
+int foo3(int a)
+{
+return (a - 1) ^ -a;
+}
+
+/* { dg-final { scan-tree-dump-not   "bit_and_expr, "   "optimized" } } */
+/* { dg-final { scan-tree-dump-not   "bit_ior_expr, "   "optimized" } } */
+/* { dg-final { scan-tree-dump-not   "bit_xor_expr, "   "optimized" } } */
+/* { dg-final { scan-tree-dump-not   "negate_expr, "   "optimized" } } */
+
-- 
2.43.0



match.pd: Add pattern to simplify `((X - 1) & ~X) < 0` to `X == 0`

2024-11-13 Thread Jovan Vukic
The patch makes the following simplifications:
((X - 1) & ~X) < 0 -> X == 0
((X - 1) & ~X) >= 0 -> X != 0

On x86, the number of instructions is reduced from 4 to 3,
but on platforms like RISC-V, it reduces to a single instruction.

Bootstrapped and tested on x86-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd: New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/fold_neg_and_zero_cmp.c: New test.



---
 gcc/match.pd  | 14 ++
 .../gcc.dg/tree-ssa/fold_neg_and_zero_cmp.c   | 28 +++
 2 files changed, 42 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold_neg_and_zero_cmp.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 6d95ceaa194..1a92acda8ab 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2245,6 +2245,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (bitwise_equal_p (@0, @3))
(convert (bit_ior (bit_and @1 (convert @2)) (convert @0))
 
+/* Fold ((X - 1) & ~X) = 0 into X ==/!= 0.  */
+(for cmp (lt ge)
+ eqne (eq ne)
+ (simplify
+ (cmp:c
+   (bit_and:c
+ (plus @0 integer_minus_onep)
+ (bit_not @0))
+   integer_zerop)
+ (if (!TYPE_UNSIGNED (TREE_TYPE (@0))
+  && INTEGRAL_TYPE_P (type)
+  && INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+  (eqne @0 { build_zero_cst (TREE_TYPE (@0)); }
+
 /* (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2) */
 (simplify
   (bit_and (bit_ior @0 CONSTANT_CLASS_P@1) CONSTANT_CLASS_P@2)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold_neg_and_zero_cmp.c b/gcc/testsuite/gcc.dg/tree-ssa/fold_neg_and_zero_cmp.c
new file mode 100644
index 000..e9400acf5f3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold_neg_and_zero_cmp.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+
+int foo1_1(int a)
+{
+return ((a - 1) & ~a) < 0;
+}
+
+int foo1_2(int a)
+{
+return (((a - 1) & ~a) >> 31) != 0;
+}
+
+int foo2_1(int a)
+{
+return ((a - 1) & ~a) >= 0;
+}
+
+int foo2_2(int a)
+{
+return (((a - 1) & ~a) >> 31) == 0;
+}
+
+/* { dg-final { scan-tree-dump-not   "bit_and_expr, "   "optimized" } } */
+/* { dg-final { scan-tree-dump-not   "bit_not_expr, "   "optimized" } } */
+/* { dg-final { scan-tree-dump-times   "eq_expr, "  2  "optimized" } } */
+/* { dg-final { scan-tree-dump-times   "ne_expr, "  2  "optimized" } } */
+
-- 
2.43.0



[PATCH v2] RISC-V: Improve code generation for select of consecutive constants

2024-09-27 Thread Jovan Vukic
Based on the valuable feedback I received, I decided to implement the patch
in the RTL pipeline. Since a similar optimization already exists in
simplify_binary_operation_1, I chose to generalize my original approach
and place it directly below that code.

The expression (X xor C1) + C2 is simplified to X xor (C1 xor C2) under
the conditions described in the patch. This is a more general optimization,
but it still applies to the RISC-V case, which was my initial goal:

long f1(long x, long y) {
return (x > y) ? 2 : 3;
}


Before the patch, the generated assembly is:

f1(long, long):
sgt a0,a0,a1
xoria0,a0,1
addia0,a0,2
ret

After the patch, the generated assembly is:

f1(long, long):
sgt a0,a0,a1
xoria0,a0,3
ret


The patch optimizes cases like x LT/GT y ? 2 : 3 (and x GE/LE y ? 3 : 2),
as initially intended. Since this optimization is more general, I noticed
it also optimizes cases like x < CONST ? 3 : 2 when CONST < 0. I’ve added
tests for these cases as well.

A bit of the logic behind the patch: the identity
A + B == (A ^ B) + 2 * (A & B) always holds, so A + B simplifies to
A ^ B whenever 2 * (A & B) == 0. In our case A == X ^ C1, B == C2, and
X is either 0 or 1.

2024-09-27  Jovan Vukic  

PR target/108038

gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1): New
simplification.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/slt-1.c: New test.
---
 gcc/simplify-rtx.cc| 12 ++
 gcc/testsuite/gcc.target/riscv/slt-1.c | 59 ++
 2 files changed, 71 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/slt-1.c

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index a20a61c5ddd..e8e60404ef6 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -2994,6 +2994,18 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
simplify_gen_binary (XOR, mode, op1,
 XEXP (op0, 1)));
 
+  /* (plus (xor X C1) C2) is (xor X (C1^C2)) if X is either 0 or 1 and
+2 * ((X ^ C1) & C2) == 0; based on A + B == A ^ B + 2 * (A & B). */
+  if (CONST_SCALAR_INT_P (op1)
+ && GET_CODE (op0) == XOR
+ && CONST_SCALAR_INT_P (XEXP (op0, 1))
+ && nonzero_bits (XEXP (op0, 0), mode) == 1
+ && 2 * (INTVAL (XEXP (op0, 1)) & INTVAL (op1)) == 0
+ && 2 * ((1 ^ INTVAL (XEXP (op0, 1))) & INTVAL (op1)) == 0)
+   return simplify_gen_binary (XOR, mode, XEXP (op0, 0),
+   simplify_gen_binary (XOR, mode, op1,
+XEXP (op0, 1)));
+
   /* Canonicalize (plus (mult (neg B) C) A) to (minus A (mult B C)).  */
   if (!HONOR_SIGN_DEPENDENT_ROUNDING (mode)
  && GET_CODE (op0) == MULT
diff --git a/gcc/testsuite/gcc.target/riscv/slt-1.c 
b/gcc/testsuite/gcc.target/riscv/slt-1.c
new file mode 100644
index 000..29a64066081
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/slt-1.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+#include 
+
+#define COMPARISON(TYPE, OP, OPN, RESULT_TRUE, RESULT_FALSE) \
+TYPE test_##OPN(TYPE x, TYPE y) { \
+return (x OP y) ? RESULT_TRUE : RESULT_FALSE; \
+}
+
+/* Signed comparisons */
+COMPARISON(int64_t, >, GT1, 2, 3)
+COMPARISON(int64_t, >, GT2, 5, 6)
+
+COMPARISON(int64_t, <, LT1, 2, 3)
+COMPARISON(int64_t, <, LT2, 5, 6)
+
+COMPARISON(int64_t, >=, GE1, 3, 2)
+COMPARISON(int64_t, >=, GE2, 6, 5)
+
+COMPARISON(int64_t, <=, LE1, 3, 2)
+COMPARISON(int64_t, <=, LE2, 6, 5)
+
+/* Unsigned comparisons */
+COMPARISON(uint64_t, >, GTU1, 2, 3)
+COMPARISON(uint64_t, >, GTU2, 5, 6)
+
+COMPARISON(uint64_t, <, LTU1, 2, 3)
+COMPARISON(uint64_t, <, LTU2, 5, 6)
+
+COMPARISON(uint64_t, >=, GEU1, 3, 2)
+COMPARISON(uint64_t, >=, GEU2, 6, 5)
+
+COMPARISON(uint64_t, <=, LEU1, 3, 2)
+COMPARISON(uint64_t, <=, LEU2, 6, 5)
+
+#define COMPARISON_IMM(TYPE, OP, OPN, RESULT_TRUE, RESULT_FALSE) \
+TYPE testIMM_##OPN(TYPE x) { \
+return (x OP -3) ? RESULT_TRUE : RESULT_FALSE; \
+}
+
+/* Signed comparisons with immediate */
+COMPARISON

[PATCH v2] match.pd: Add pattern to simplify `(a - 1) & -a` to `0`

2024-11-26 Thread Jovan Vukic
Thank you for the feedback on the v1 patch.

As requested, I added detailed tests for various signed and unsigned integer
types in the test file bitops-11.c. I also included more complex expressions to
observe how everything behaves at the GIMPLE level and added vector test
examples as well. Since vector expressions are matched on (minus @0 1) instead
of (plus @0 -1), I added a simplification for the minus case in match.pd.

Additionally, I introduced simplifications for the expressions (a - 1) & -a,
(a - 1) | -a, and (a - 1) ^ -a to 0, -1, and -1, respectively, in 
simplify-rtx.cc.

For each of the three expressions, I added two if statements. The first
matches the typical (BIT_OP (plus (A -1)) (neg A)) form, while the
second recognizes a SUBREG wrapping the operands; for example, when A
has type short, the second if statement triggers. I did not observe
match.pd missing any of these simplifications, but if that ever
happens, the code in simplify-rtx.cc should catch them.

Bootstrapped and tested on x86-linux-gnu with no regressions.

2024-11-26  Jovan Vukic  

gcc/ChangeLog:

* match.pd: New pattern.
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1): New
code to handle (a - 1) & -a, (a - 1) | -a and (a - 1) ^ -a.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-11.c: New test.


---
 gcc/match.pd  |  16 +++
 gcc/simplify-rtx.cc   |  87 
 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c | 116 ++
 3 files changed, 219 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 0ac5674f24b..c85d4b9ae6c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1472,6 +1472,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (bit_and:c @0 (bit_not (bit_xor:c @0 @1)))
  (bit_and @0 @1))
 
+/* Transform:
+   (a - 1) & -a ->  0.
+   (a - 1) | -a -> -1.
+   (a - 1) ^ -a -> -1.  */
+(for bit_op (bit_ior bit_xor bit_and)
+ (simplify
+  (bit_op:c (plus @0 integer_minus_onep@1) (negate @0))
+  (if (bit_op == BIT_AND_EXPR)
+{ build_zero_cst (type); }
+{ build_minus_one_cst (type); }))
+ (simplify
+  (bit_op:c (minus @0 integer_onep@1) (negate @0))
+  (if (bit_op == BIT_AND_EXPR)
+{ build_zero_cst (type); }
+{ build_minus_one_cst (type); })))
+
 /* a & (a == b)  -->  a & b (boolean version of the above). */
 (simplify
  (bit_and:c @0 (nop_convert? (eq:c @0 @1)))
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 893c5f6e1ae..514d21c6ef5 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -3530,6 +3530,35 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 	  && GET_MODE_CLASS (mode) != MODE_CC)
 	return CONSTM1_RTX (mode);
 
+  /* (ior (plus (A -1)) (neg A)) is -1.  */
+  if (((GET_CODE (op1) == NEG
+	&& GET_CODE (op0) == PLUS
+	&& XEXP (op0, 1) == constm1_rtx)
+	   || (GET_CODE (op0) == NEG
+	   && GET_CODE (op1) == PLUS
+	   && XEXP (op1, 1) == constm1_rtx))
+	  && rtx_equal_p (XEXP (op0, 0), XEXP (op1, 0))
+	  && !side_effects_p (XEXP (op0, 0))
+	  && !side_effects_p (XEXP (op1, 0)))
+	return CONSTM1_RTX (mode);
+
+  /* (ior (subreg (plus (A -1))) (subreg (neg A))) is -1.  */
+  if (GET_CODE (op0) == SUBREG
+	   && GET_CODE (op1) == SUBREG
+	   && subreg_lowpart_p (op0)
+	   && subreg_lowpart_p (op1)
+	   && ((GET_CODE (XEXP (op1, 0)) == NEG
+		&& GET_CODE (XEXP (op0, 0)) == PLUS
+		&& XEXP (XEXP (op0, 0), 1) == constm1_rtx)
+	   || (GET_CODE (XEXP (op0, 0)) == NEG
+		   && GET_CODE (XEXP (op1, 0)) == PLUS
+		   && XEXP (XEXP (op1, 0), 1) == constm1_rtx))
+	  && rtx_equal_p (XEXP (XEXP (op0, 0), 0),
+			  XEXP (XEXP (op1, 0), 0))
+	  && !side_effects_p (XEXP (XEXP (op0, 0), 0))
+	  && !side_effects_p (XEXP (XEXP (op1, 0), 0)))
+	return CONSTM1_RTX (mode);
+
   /* (ior A C) is C if all bits of A that might be nonzero are on in C.  */
   if (CONST_INT_P (op1)
 	  && HWI_COMPUTABLE_MODE_P (mode)
@@ -3691,6 +3720,35 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 	  & nonzero_bits (op1, mode)) == 0)
 	return (simplify_gen_binary (IOR, mode, op0, op1));
 
+  /* (xor (plus (A -1)) (neg A)) is -1.  */
+  if (((GET_CODE (op1) == NEG
+	&& GET_CO

[PATCH v3] match.pd: Add pattern to simplify `(a - 1) & -a` to `0`

2024-12-03 Thread Jovan Vukic
Thank you for the feedback. I have made the minor changes that were requested.
Additionally, I extracted the repetitive code into a reusable helper function,
match_plus_neg_pattern, making the code much more readable. Furthermore, the
logic, code, and tests remain the same as in version 2 of the patch.

2024-12-03  Jovan Vukic  

gcc/ChangeLog:

* match.pd: New pattern.
* simplify-rtx.cc (match_plus_neg_pattern): New helper function.
(simplify_context::simplify_binary_operation_1): New
code to handle (a - 1) & -a, (a - 1) | -a and (a - 1) ^ -a.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-11.c: New test.


---
 gcc/match.pd  |  16 +++
 gcc/simplify-rtx.cc   |  41 
 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c | 117 ++
 3 files changed, 174 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 2dd67b69cf1..35cc782f137 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1472,6 +1472,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (bit_and:c @0 (bit_not (bit_xor:c @0 @1)))
  (bit_and @0 @1))
 
+/* Transform:
+   (a - 1) & -a ->  0.
+   (a - 1) | -a -> -1.
+   (a - 1) ^ -a -> -1.  */
+(for bit_op (bit_ior bit_xor bit_and)
+ (simplify
+  (bit_op:c (plus @0 integer_minus_onep) (negate @0))
+  (if (bit_op == BIT_AND_EXPR)
+{ build_zero_cst (type); }
+{ build_minus_one_cst (type); }))
+ (simplify
+  (bit_op:c (minus @0 integer_onep) (negate @0))
+  (if (bit_op == BIT_AND_EXPR)
+{ build_zero_cst (type); }
+{ build_minus_one_cst (type); })))
+
 /* a & (a == b)  -->  a & b (boolean version of the above). */
 (simplify
  (bit_and:c @0 (nop_convert? (eq:c @0 @1)))
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 86b3f331928..223a0095978 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -2941,6 +2941,35 @@ simplify_rotate_op (rtx op0, rtx op1, machine_mode mode)
   return NULL_RTX;
 }
 
+/* Returns true if OP0 and OP1 match the pattern (OP (plus (A - 1)) (neg A)),
+   and the pattern can be simplified (there are no side effects).  */
+
+static bool
+match_plus_neg_pattern (rtx op0, rtx op1, machine_mode mode)
+{
+  /* Remove SUBREG from OP0 and OP1, if needed.  */
+  if (GET_CODE (op0) == SUBREG
+  && GET_CODE (op1) == SUBREG
+  && subreg_lowpart_p (op0)
+  && subreg_lowpart_p (op1))
+{
+  op0 = XEXP (op0, 0);
+  op1 = XEXP (op1, 0);
+}
+
+  /* Check for the pattern (OP (plus (A - 1)) (neg A)).  */
+  if (((GET_CODE (op1) == NEG
+	&& GET_CODE (op0) == PLUS
+	&& XEXP (op0, 1) == CONSTM1_RTX (mode))
+   || (GET_CODE (op0) == NEG
+	   && GET_CODE (op1) == PLUS
+	   && XEXP (op1, 1) == CONSTM1_RTX (mode)))
+  && rtx_equal_p (XEXP (op0, 0), XEXP (op1, 0))
+  && !side_effects_p (XEXP (op0, 0)))
+return true;
+  return false;
+}
+
 /* Subroutine of simplify_binary_operation.  Simplify a binary operation
CODE with result mode MODE, operating on OP0 and OP1.  If OP0 and/or
OP1 are constant pool references, TRUEOP0 and TRUEOP1 represent the
@@ -3553,6 +3582,10 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 	  && GET_MODE_CLASS (mode) != MODE_CC)
 	return CONSTM1_RTX (mode);
 
+  /* Convert (ior (plus (A - 1)) (neg A)) to -1.  */
+  if (match_plus_neg_pattern (op0, op1, mode))
+	return CONSTM1_RTX (mode);
+
   /* (ior A C) is C if all bits of A that might be nonzero are on in C.  */
   if (CONST_INT_P (op1)
 	  && HWI_COMPUTABLE_MODE_P (mode)
@@ -3714,6 +3747,10 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 	  & nonzero_bits (op1, mode)) == 0)
 	return (simplify_gen_binary (IOR, mode, op0, op1));
 
+  /* Convert (xor (plus (A - 1)) (neg A)) to -1.  */
+  if (match_plus_neg_pattern (op0, op1, mode))
+	return CONSTM1_RTX (mode);
+
   /* Convert (XOR (NOT x) (NOT y)) to (XOR x y).
 	 Also convert (XOR (NOT x) y) to (NOT (XOR x y)), similarly for
 	 (NOT y).  */
@@ -3981,6 +4018,10 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 	  && GET_MODE_CLASS (mode) != MODE_CC)
 	return CONST0_RTX (mode);
 
+  /* Convert (and (plus (A - 1)) (neg A)) to 0.  */
+  if (match_plus_neg_pattern (op0, op1, mode))
+	return CONST0_RTX (mode);
+
   /* Transform (and (extend X) C) into (zero_extend