On Fri, Mar 13, 2020 at 03:21:18AM +0000, Yangfei (Felix) wrote: > > On Wed, Mar 04, 2020 at 08:39:36AM +0000, Yangfei (Felix) wrote: > > > This is a simple fix for PR94026. > > > With this fix, combine will try make an extraction if we are in a > > > equality > > comparison and this is an AND > > > with a constant which is power of two minus one. Shift here should be > > > an > > constant. For example, combine > > > will transform (compare (and (lshiftrt x 8) 6) 0) to (compare > > > (zero_extract > > (x 2 9)) 0). > > > > Why is that a good thing? > > The reported test case is reduced from spec2017 541.leela_r. I have pasted > original code snippet on the bugzilla. > We found other compilers like aocc/llvm can catch this pattern and simplify > it.
That wasn't my question, let me rephrase: why would writing it as zero_extract (instead of as a more canonical form) be wanted? The aarch backend only has zero_extract formulations for most of the bitfield instructions. If you fix that problem, all of this should go away? Like, the testcase in the PR starts with Trying 7 -> 8: 7: r99:SI=r103:SI>>r104:SI#0 REG_DEAD r104:SI REG_DEAD r103:SI 8: r100:SI=r99:SI&0x6 REG_DEAD r99:SI Failed to match this instruction: (set (reg:SI 100) (and:SI (ashiftrt:SI (reg:SI 103) (subreg:QI (reg:SI 104) 0)) (const_int 6 [0x6]))) and that should match already (that's an ubfm (ubfx))? > > (There should be thorough tests on many archs, showing it helps on average, > > and it doesn't regress anything. I can do that for you, but not right now). > > I only have aarch64 & x86_64 linux available and have tested this patch with > spec17 on both platforms. > No obvious improvement & regression witnessed. This is expected as only one > instruction is reduced here. What should be tested is what new combinations are done, and which are *no longer* done. > > In general, we should have *fewer* zero_extract, not more. Some reasons for that: 1) All those can be expressed with simpler operations as well; 2) Most very similar expressions cannot be expressed as zero_extract, although many architectures can handle (some of) those just fine; 3) The optimizers do not handle zero_extract very well at all (this includes simplify-rtx, to start with). sign_extract is nastier -- we really want to have a sign_extend that works on separate bits, not as coarse as address units as we have now -- but it currently isn't handled much either. Segher