So I've gone round and round with BZ39726.
The basic idea here is we have an operation on widened operands that
feeds a narrow comparison. ie, the two inputs are QImode, extended to
SImode, subtracted, then we want to conditionally branch on the result
of sign bit of QImode.
If we could just do the subtraction in the native mode of the operands
(QImode) and test that result, we eliminate two extensions and the test
itself can be eliminated as we'll get the QImode sign bit from the
QImode subtraction.
I first started poking at combine to see if I could exploit the newer 4
insn combination capability to see enough INSNs to prove we could narrow
the operations *and* arrange for the resulting insns to be well suited
for redundant cmp/tst removal.
This was going to require some extensions to the simplification code,
the code to determine good split points, the when to use 4 insn
combination heuristics and possibly other stuff. I was able to see all
the insns I needed but the RTL I was going to have to detect and rewrite
was getting pretty fugly.
I changed course and started looking at twiddling the m68k backend. I
built a proof of concept pair of splitters that would rewrite the key 4
insns and it clearly improved the code. But the result was not well
suited for removing the redundant tst. To also arrange for result to be
suitable for redundant tst/cmp removal, the two proof of concept
splitters would have to merely be bridge patterns to an even larger
splitter which would give us access to the conditional branch so that we
could rewrite the compare and conditional branch too in such a way to
make it obvious the result of the subtraction was sufficient to remove
the tst/cmp instruction.
I then wondered if I could use the new match.pd code to describe what I
wanted at a higher level, possibly getting to simpler code while still
working on generic/gimple. Richi speculated that the new match.pd
approach could be used for type demotions -- I was skeptical, but what
the hell.
+/* If we are testing a single bit resulting from a binary
+ operation in precision P1 where the operands were widened
+ precision P2 and the tested bit is the sign bit for
+ precision P2. Rewrite so the binary operation is in
+ precision P2. */
+(for inner_op (minus plus mult bit_and bit_ior bit_xor)
+ (simplify
+ (bit_and (inner_op (convert @0) (convert @1)) INTEGER_CST@3)
+ (if (GENERIC
+ && exact_log2 (TREE_INT_CST_LOW (@3)) + 1 == TYPE_PRECISION
(TREE_TYPE (@0))
+ && TREE_TYPE (@0) == TREE_TYPE (@1)
+ && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
+ /* This is a bit hackish, but I didn't see a good way to change
+ the type of the result. Note that changing the result like
+ this implies that we are not operating on gimple. */
+ (with {type = TREE_TYPE (@0); }
+ (convert (bit_and (inner_op @0 @1) (convert @3)))))))
The exact_log2 (blah) + 1 == TYPE_PRECISION (blah2) stuff ensures that
we're looking at just the sign bit in the same mode/precision as the two
operands of the binary operation.
We're changing the type of the entire expression to match the native
type of the operands. So we have to reassign "type" to the new type we
want to use. This also implies we're working on generic rather than
gimple. It's a bit of a hack, but I couldn't convince the code to use a
different type for the final result.
I guess it could be made to work on gimple as well by expanding it to
include the conditional branch this pattern feeds -- that way we could
avoid the type mismatch.
With that match to match.pd, I get the desired code on the m68k for the
test in the testcase as well as several of my own. I've also verified
this makes _no_ difference to the resulting runtime libraries for x86
and x86_64.
I'll spin a m68k bootstrap over the weekend, but wanted to get some
feedback on the match.pd pattern and perhaps a cleaner way to deal with
changing the resultant type.
Thoughts, comments?