On 09/03/2015 08:59 AM, Wilco Dijkstra wrote:
Segher Boessenkool wrote:
On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:
Combine canonicalizes certain AND masks in a comparison with zero into extracts
of the
widest
register type. During matching these are expanded into a very inefficient
sequence that
fails to
match. For example (x & 2) == 0 is matched in combine like this:
Failed to match this instruction:
(set (reg:CC 66 cc)
(compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
(const_int 1 [0x1])
(const_int 1 [0x1]))
(const_int 0 [0])))
Yes. Some processors even have specific instructions to do this.
However there are 2 issues with this, one is the spurious subreg,
Combine didn't make that up out of thin air; something already used
DImode here. It could simplify it to SImode in this case, that is
true, don't know why it doesn't; it isn't necessarily faster code to
do so, it can be slower, it might not match, etc.
The relevant RTL instructions on AArch64 are:
(insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
(and:SI (reg/v:SI 76 [ xD.2641 ])
(const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
(nil))
(insn 26 25 27 2 (set (reg:CC 66 cc)
(compare:CC (reg:SI 77 [ D.2705 ])
(const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
(expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
(nil)))
I don't see anything using DI...
Is this aarch64? Might be the case that combine wants to do everything
in word_mode for some reason or another.
Yes, so it looks like some other backends match the odd pattern and then have
another
pattern change it back into the canonical AND/TST form during the split phase
(maybe
the subreg confuses register allocation or block other optimizations). This all
seems
a lot of unnecessary complexity for a few special immediates when there is a
much
simpler solution...
subregs get in the way in various places. So if we can avoid generating
them, then that's good.
But there are more efficient ways to emit single bit and masks tests that apply
to most CPUs rather than doing something specific that works for just one target
only. For example single bit test is a simple shift into carry flag or into the
sign bit, and for mask tests, you shift out all the non-mask bits.
Most of those are quite target-specific. Some others are already done,
and/or done by other passes.
I wouldn't go that far. Many targets have simple, cheap ways to do
single bit testing. And there are some targets where trying to shift a
bit into the carry flag would be *horribly* bad performance-wise as they
only have single bit shifters.
But what combine does here is even more target-specific. Shifts and AND setting
flags
are universally supported on targets with condition code register.
Bitfield test/extract instructions are more rare, and when they are supported,
they
may well be more expensive.
I don't think it's anywhere near that clear cut.
So my question is, is it combine's job to try all possible permutations that
constitute a bit or mask test?
Combine converts the merged instructions to what it thinks is the
canonical or cheapest form, and uses that. It does not try multiple
options (the zero_ext* -> and+shift rewriting is not changing the
semantics of the pattern at all).
But the change from AND to zero_extract is already changing semantics...
It shouldn't be changing semantics -- changing form != changing semantics.
Combine does not try multiple options.
I'm not following - combine tries zero_extract and shift+AND - that's 2 options.
If that is feasible then adding a 3rd option should be possible.
Combine will stop once it finds a match. It may, in some circumstances,
try more than one representation to find a match, but that's the
exception rather than the rule.
You will end up with a *lot* of target hooks like this. It will also
make testing harder (less coverage). I am not sure that is a good idea.
We certainly need a lot more target hooks in general so GCC can do the right
thing
(rather than using costs inconsistently all over the place). But that's a
different
discussion...
Let's be very careful here, target hooks aren't always the solution.
I'd rather see the costing models fixed and use those across the board.
But frankly, I don't know how to fix the costing models.
jeff