On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:
> > > Combine canonicalizes certain AND masks in a comparison with zero into
> > > extracts of the widest register type.  During matching these are
> > > expanded into a very inefficient sequence that fails to match.  For
> > > example (x & 2) == 0 is matched in combine like this:
> > >
> > > Failed to match this instruction:
> > > (set (reg:CC 66 cc)
> > >     (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> > >             (const_int 1 [0x1])
> > >             (const_int 1 [0x1]))
> > >         (const_int 0 [0])))
> >
> > Yes.  Some processors even have specific instructions to do this.
>
> However there are 2 issues with this, one is the spurious subreg,
Combine didn't make that up out of thin air; something already used DImode
here.  It could simplify it to SImode in this case, that is true; I don't
know why it doesn't.  It isn't necessarily faster code to do so: it can be
slower, it might not match, etc.

> the other is that only a subset of legal zero_extracts are tried (only
> single bit and the degenerate case of zero_extract with a shift of 0 -
> which I think should not be a zero_extract).  All other AND immediates
> remain as AND.

Yes.  I'm happy to see this weird special-case "optimisation",
"canonicalisation" gone.

> So to emit an AND on targets without such specific instructions, or where
> such instructions are more expensive than a simple AND (*), you now need
> at least 3 different backend patterns for any instruction that can emit
> AND immediate...

It's only a problem for AND-and-compare, no?

> (*) I think that is another issue in combine - when both alternatives
> match you want to select the lowest cost one, not the first one that
> matches.

That's recog, not combine.  And quite a few backends rely on "first match
wins", because it has always been that way.  It is also very easy to write
such patterns accidentally (sometimes even the exact same one twice in the
same machine description, etc.)

> > > Failed to match this instruction:
> > > (set (reg:CC 66 cc)
> > >     (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> > >                 (const_int 1 [0x1]))
> > >             (const_int 1 [0x1]))
> > >         (const_int 0 [0])))
> >
> > This is after r223067.  Combine tests only one "final" instruction; that
> > revision rewrites zero_ext* if it doesn't match and tries again.  This
> > helps for processors that can do more generic masks (like rs6000, and I
> > believe also aarch64?): without it, you need many more patterns to match
> > all the zero_ext{ract,end} cases.
> But there are more efficient ways to emit single bit and mask tests that
> apply to most CPUs, rather than doing something specific that works for
> just one target only.  For example a single bit test is a simple shift
> into the carry flag or into the sign bit, and for mask tests, you shift
> out all the non-mask bits.

Most of those are quite target-specific.  Some others are already done,
and/or done by other passes.

> So my question is, is it combine's job to try all possible permutations
> that constitute a bit or mask test?

Combine converts the merged instructions to what it thinks is the canonical
or cheapest form, and uses that.  It does not try multiple options (the
zero_ext* -> and+shift rewriting does not change the semantics of the
pattern at all).

> Or would it be better to let each target decide on how to canonicalize
> bit tests and only try that alternative?

The question is how to write the pattern to be most convenient for all
targets.

> > > Neither matches the AArch64 patterns for ANDS/TST (which is just
> > > compare and AND).  If the immediate is not a power of 2 or a power of
> > > 2 minus 1 then it matches correctly as expected.
> > >
> > > I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion
> >
> > It is zero_extract(x,1,1) really.  This is convenient for (old and
> > embedded) processors that have special bit-test instructions.  If we now
> > want combine to not do this, we'll have to update all backends that rely
> > on it.
>
> Would any backend actually rely on this, given it only does some specific
> masks, has a redundant shift with 0 for the mask case, and the odd subreg
> as well?

Such backends match the zero_extract patterns, of course.  Random example:
the h8300 patterns for the "btst" instruction.

> > They are common, and many processors had instructions for them, which is
> > (supposedly) why combine special-cased them.
>
> Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...
Combine does not try multiple options.

> > > It's trivial to change change_zero_ext to expand extracts always into
> > > AND and remove the redundant subreg.
> >
> > Not really trivial...  Think about comparisons...
> >
> > int32_t x;
> > ((x >> 31) & 1) > 0;
> > // is not the same as
> > (x & 0x80000000) > 0;  // signed comparison
> >
> > (You do not easily know what the COMPARE is used for).
>
> Indeed, if you had a zero_extract that included the sign bit then you may
> have to adjust the compare condition.  However, it looks like that case
> can't happen - (x & 0x80000000) comparisons have the AND optimized away
> much earlier.

A long time before combine, yes.  But it is *possible* to give such insns
to combine, and it should not do the wrong thing with them.

> > > However wouldn't it make more sense to never special-case certain AND
> > > immediates in the first place?
> >
> > Yes, but we need to make sure no targets regress (i.e. by rewriting
> > patterns for those that do).  We need to know the impact of such a
> > change, at the least.
>
> The alternative would be to let the target decide how to canonicalize bit
> tests.  That seems like a better solution than trying multiple
> possibilities which can never match on most targets.

You will end up with a *lot* of target hooks like this.  It will also make
testing harder (less coverage).  I am not sure that is a good idea.


Segher