https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58727
--- Comment #7 from Jeffrey A. Law <law at gcc dot gnu.org> ---
So part of the problem here is the ARM and x86 ports will accept the
"simplified" constant in their AND patterns. The ARM port will eventually
split it into components, but by then it's too late to clean things up.
If we put a small amount of code into simplify-rtx like this:
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 8f0f16c865d1..46f1df6fab46 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -3699,6 +3699,19 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
/* If (C1|C2) == ~0 then (X&C1)|C2 becomes X|C2. */
if (((c1|c2) & mask) == mask)
return simplify_gen_binary (IOR, mode, XEXP (op0, 0), op1);
+
+ /* If (C1|C2) has a single bit clear, then adjust C1 so that
+ when split it'll match a single bit clear style insn.
+
+ This could have been done with a target dependent splitter, but
+ then every target with single bit manipulation insns would need
+ to implement such splitters. */
+ if (exact_log2 (~(c1 | c2)) >= 0)
+ {
+ rtx temp = gen_rtx_AND (mode, XEXP (op0, 0), GEN_INT (c1 | c2));
+ temp = gen_rtx_IOR (mode, temp, trueop1);
+ return temp;
+ }
}
/* Convert (A & B) | A to A. */
That at least gives ports with tight operand predicates and single bit
manipulation instructions a fighting chance to optimize this case.
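For reference, the rewrite relies on (X & C1) | C2 and (X & (C1|C2)) | C2
being the same value for any X, since the bits added to the AND mask are
exactly the ones the IOR sets anyway. Below is a minimal standalone sketch of
that identity and of the single-bit-clear test. It is not GCC code: the
function names are made up for illustration, and it assumes plain 32-bit
unsigned values rather than HOST_WIDE_INT constants in an rtx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when exactly one bit of V is zero, i.e. ~V is a power of two.
   Hypothetical helper standing in for the patch's
   exact_log2 (~(c1 | c2)) >= 0 test, restricted to 32 bits.  */
bool
single_bit_clear_p (uint32_t v)
{
  uint32_t inv = ~v;
  return inv != 0 && (inv & (inv - 1)) == 0;
}

/* Original form: (X & C1) | C2.  */
uint32_t
orig_form (uint32_t x, uint32_t c1, uint32_t c2)
{
  return (x & c1) | c2;
}

/* Rewritten form: (X & (C1 | C2)) | C2.  The AND mask now has a single
   bit clear, so the AND on its own can match a bit-clear insn.  */
uint32_t
rewritten_form (uint32_t x, uint32_t c1, uint32_t c2)
{
  assert (single_bit_clear_p (c1 | c2));
  return (x & (c1 | c2)) | c2;
}
```

With c1 = 0xffbffffd and c2 = 0x2 (the SImode constants from this testcase),
c1 | c2 is 0xffbfffff, which has only bit 22 clear, and both forms agree for
every x.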
Essentially by rewriting into that form we get this on rv64gcb:
Trying 6, 7 -> 9:
6: r139:SI=r141:SI&0xfffffffffffffffd
REG_DEAD r141:SI
7: r140:SI=r139:SI&0xffffffffffbfffff
REG_DEAD r139:SI
9: r137:SI=r140:SI|0x2
REG_DEAD r140:SI
Failed to match this instruction:
(set (reg:SI 137 [ _3 ])
(ior:SI (and:SI (reg:SI 141 [ a ])
(const_int -4194305 [0xffffffffffbfffff]))
(const_int 2 [0x2])))
Successfully matched this instruction:
(set (reg:SI 140)
(and:SI (reg:SI 141 [ a ])
(const_int -4194305 [0xffffffffffbfffff])))
Successfully matched this instruction:
(set (reg:SI 137 [ _3 ])
(ior:SI (reg:SI 140)
(const_int 2 [0x2])))
Here the first instruction matches a bclr and the second would match either an
ori or a bset. The point is that there's both a target dependent and a target
independent component to this issue.
Shreya is tackling the simplify-rtx change, but not the target dependent bits
that would be necessary to fix this for arm, x86 and other targets that allow
many more constants in their logical patterns.