On 11/30/25 6:53 PM, Oleg Endo wrote:
Hi Jeff,
On Sun, 2025-11-30 at 15:33 -0700, Jeff Law wrote:
We don't bother with the transformation when the XOR is flipping a bit
known to be zero (i.e., a high bit of the result of the right shift or a
low bit of the result of the left shift). For those cases we already
figure out that the XOR is just an IOR and the right things already
"just happen".
This triggered some code generation changes on the SH (not surprising,
since this BZ was derived from an older SH BZ). It doesn't seem to
significantly improve the SH code, though it does turn a cmp/pz +
rotate-through-carry sequence into a rotate + xor-with-immediate
sequence. That may be a latency win on the SH; I really don't know.
Thanks for keeping SH in mind.
On SH, the replacement "rotate + xor with immediate" is not a win, because:
- both insns are of EX type on classic SH4 (no parallel execution)
- xor with immediate puts additional register pressure on R0
Since cmp/pz is of MT type on classic SH4, it is more likely to be
executed in parallel with another insn.
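To illustrate (issue groups per the SH4 manual; the exact sequences are
a sketch from memory, not compiler output):

  cmp/pz  r4      ! MT group: T = (r4 >= 0); can pair with an EX insn
  rotcl   r4      ! EX group: r4 = (r4 << 1) | T

versus

  rotl    r4      ! EX group: rotate left by one
  mov     r4,r0   ! xor with immediate only operates on r0
  xor     #1,r0   ! EX group: flip the LSB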
I have no real idea how to express these kinds of machine/ISA specifics
to the GIMPLE layers. The only way I can think of is to keep updating
the insn patterns and hacks in the .md file.
The change to the insn sequence on SH is a regression. PR 59533 is
explicitly about the "missed cmp/pz opportunity".
Please drop the changes to gcc/testsuite/gcc.target/sh/pr59533-1.c, so that
it's clear that something's not the way it was supposed to be.
Here's an update that adds a pattern to the SH port to recognize the
slightly different RTL presented for this scenario. Essentially it just
needs to recognize (xor (rotate)) with a rotate count of 1 and the low
bit flipped.
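For reference, a sketch of the kind of source that now takes this path
(my example; the new test below covers the general case):

  unsigned int g (unsigned int a)
  {
    /* Rotate left by one with the LSB flipped.  match.pd folds this
       to a rotate with the low bit flipped, and the new *rotcl
       variant splits the resulting (xor (rotate ...)) RTL back into
       cmp/pz + rotcl on SH.  */
    return (a << 1) | ((a >> 31) ^ 1);
  }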
OK for the trunk?
Jeff
PR target/121778
gcc/
* match.pd: Add patterns to recognize a rotate with one or more
bits flipped via xor.
* config/sh/sh.md (*rotcl): Add another variant for a rotate by
one position with the LSB flipped.
gcc/testsuite/
* gcc.target/riscv/pr121778.c: New test.
diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 65b353da56e1..32d3d1a8fb65 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -3271,6 +3271,25 @@ (define_insn_and_split "*rotcl"
operands[3] = get_t_reg_rtx ();
})
+(define_insn_and_split "*rotcl"
+ [(set (match_operand:SI 0 "arith_reg_dest")
+ (xor:SI (rotate:SI (match_operand:SI 1 "arith_reg_operand")
+ (const_int 1))
+ (const_int 1)))
+ (clobber (reg:SI T_REG))]
+ "TARGET_SH1 && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(parallel [(set (match_dup 0)
+ (ior:SI (ashift:SI (match_dup 1) (const_int 1))
+ (and:SI (match_dup 3) (const_int 1))))
+ (clobber (reg:SI T_REG))])]
+{
+ operands[3] = gen_rtx_GE (SImode, operands[1], const0_rtx);
+ sh_split_treg_set_expr (operands[3], curr_insn);
+ operands[3] = get_t_reg_rtx ();
+})
+
(define_insn_and_split "*rotcl"
[(set (match_operand:SI 0 "arith_reg_dest")
(ior:SI (and:SI (match_operand:SI 1 "arith_reg_or_t_reg_operand")
diff --git a/gcc/match.pd b/gcc/match.pd
index f164ec591008..11cd3ec8f9f0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -12062,3 +12062,27 @@ and,
(simplify
(IFN_VEC_SHL_INSERT (vec_duplicate@1 @0) @0)
@1)
+
+/* In this case the XOR flips bits that originate from the result of the
+ right shift and do not impact the result of the left shift. We can
+ reassociate the XOR to work on the final result and simplify the rest
+ to a rotate. */
+(simplify
+ (bit_ior:c (lshift @0 INTEGER_CST@1)
+ (bit_xor (rshift @2 INTEGER_CST@3) INTEGER_CST@4))
+ (if (((~((HOST_WIDE_INT_1U << tree_to_uhwi (@1)) - 1)) & tree_to_uhwi (@4)) == 0
+ && (tree_to_uhwi (@1) + tree_to_uhwi (@3)) == TYPE_PRECISION (type)
+ && TYPE_UNSIGNED (type)
+ && @0 == @2)
+ (bit_xor (rrotate @0 @3) @4)))
+
+/* Similarly, but in this case the XOR flips bits that originate from the
+ result of the left shift. */
+(simplify
+ (bit_ior:c (bit_xor (lshift @0 INTEGER_CST@1) INTEGER_CST@2)
+ (rshift @3 INTEGER_CST@4))
+ (if ((((((HOST_WIDE_INT_1U << tree_to_uhwi (@1)) - 1)) & tree_to_uhwi (@2)) == 0)
+ && (tree_to_uhwi (@1) + tree_to_uhwi (@4)) == TYPE_PRECISION (type)
+ && TYPE_UNSIGNED (type)
+ && @0 == @3)
+ (bit_xor (rrotate @0 @4) @2)))
diff --git a/gcc/testsuite/gcc.target/riscv/pr121778.c b/gcc/testsuite/gcc.target/riscv/pr121778.c
new file mode 100644
index 000000000000..87da9c3cd962
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr121778.c
@@ -0,0 +1,94 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gcb -mabi=lp64d" { target rv64} } */
+/* { dg-options "-O2 -march=rv32gcb -mabi=ilp32" { target rv32} } */
+
+/* We need to adjust the constant so this works for rv32 and rv64. */
+#if __riscv_xlen == 32
+#define ONE 1U
+#define TYPE unsigned int
+#else
+#define ONE 1UL
+#define TYPE unsigned long
+#endif
+
+#define F1(C) TYPE test_01##C (TYPE a) { return (a << (__riscv_xlen - C)) | ((a >> C) ^ 1); }
+#define F2(C) TYPE test_02##C (TYPE a) { return ((a >> (__riscv_xlen - C)) ^ 1) | (a << C); }
+#define F3(C) TYPE test_03##C (TYPE a) { return ((a << (__riscv_xlen - C)) ^ (ONE << (__riscv_xlen - 1))) | (a >> C); }
+#define F4(C) TYPE test_04##C (TYPE a) { return (a >> (__riscv_xlen - C)) | ((a << C) ^ (ONE << (__riscv_xlen - 1))); }
+
+#define F(C) F1(C) F2(C) F3(C) F4(C)
+
+
+F (1)
+F (2)
+F (3)
+F (4)
+F (5)
+F (6)
+F (7)
+F (8)
+F (9)
+F (10)
+F (11)
+F (12)
+F (13)
+F (14)
+F (15)
+F (16)
+F (17)
+F (18)
+F (19)
+F (20)
+F (21)
+F (22)
+F (23)
+F (24)
+F (25)
+F (26)
+F (27)
+F (28)
+F (29)
+F (30)
+F (31)
+#if __riscv_xlen == 64
+F (32)
+F (33)
+F (34)
+F (35)
+F (36)
+F (37)
+F (38)
+F (39)
+F (40)
+F (41)
+F (42)
+F (43)
+F (44)
+F (45)
+F (46)
+F (47)
+F (48)
+F (49)
+F (50)
+F (51)
+F (52)
+F (53)
+F (54)
+F (55)
+F (56)
+F (57)
+F (58)
+F (59)
+F (60)
+F (61)
+F (62)
+F (63)
+
+/* { dg-final { scan-assembler-times "\trori" 252 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "\txori" 126 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "\tbinv" 126 { target { rv64 } } } } */
+
+/* { dg-final { scan-assembler-times "\trori" 124 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "\txori" 62 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "\tbinv" 62 { target { rv32 } } } } */
+#endif