On Mon, May 23, 2022 at 12:49 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > Hi Uros, > Hopefully, if I explain even more of the context, you'll better understand why > this harmless (and at worse seemingly redundant) peephole2 is actually > critical > for addressing significant regressions in the compiler without introducing new > testsuite failures. I wouldn't ask (again), if I didn't feel it's important. > > Basically, I'm trying to unblock Hongtao's patch (for PR target/104610) > which in your own review, explained is better handled by/during STV: > https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594070.html > > Unfortunately, that patch of mine to STV (that I want to ping next) that > solves > the P2 code quality regression PR target/70321, is itself blocked by another > review of yours: > https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593200.html > where this fix (alone) leads to a regression of the test case pr65105-5.c.
Is it possible to start with a STV patch? If there are only a few introduced regressions, we can afford them in this stage of development, and fix regressions later with a follow-up patches. THis way, it is much easier for me to see the effect of the patch, and its benefit can be weighted appropriately. I was indeed under the impression that we try to peephole a combination that appears once in a blue moon, but if the situation appears regularly, this is a completely different matter. > This pending regression has nothing to do with TARGET_BMI's andn, but > the idiom "if ((x & y) != y)" on ia32, where x and y are DImode, and > stv/reload > has decided to place these values in SSE registers. > > After combine we have an *anddi3_doubleword and *cmpdi3_doubleword: > (insn 22 21 23 4 (parallel [ > (set (reg:DI 97) > (and:DI (reg/v:DI 92 [ p2 ]) > (reg:DI 88 [ _25 ]))) > (clobber (reg:CC 17 flags)) > ]) "pr65105-5.c":20:18 530 {*anddi3_doubleword} > (expr_list:REG_UNUSED (reg:CC 17 flags) > (nil))) > (insn 23 22 24 4 (set (reg:CCZ 17 flags) > (compare:CCZ (reg/v:DI 92 [ p2 ]) > (reg:DI 97))) "pr65105-5.c":20:8 29 {*cmpdi_doubleword} > (expr_list:REG_DEAD (reg:DI 97) > (nil))) One possible approach is to introduce intermediate compound (but non-existent) instruction that is created by combine pass, and is later split to real instructions. But a real testcase is needed, so the correct strategy is used. > After STV we have: > (insn 22 21 45 4 (set (subreg:V2DI (reg:DI 97) 0) > (and:V2DI (subreg:V2DI (reg/v:DI 92 [ p2 ]) 0) > (subreg:V2DI (reg:DI 88 [ _25 ]) 0))) "pr65105-5.c":20:18 6640 > {*andv2di3} > (expr_list:REG_UNUSED (reg:CC 17 flags) > (nil))) > (insn 45 22 46 4 (set (reg:V2DI 103) > (xor:V2DI (subreg:V2DI (reg/v:DI 92 [ p2 ]) 0) > (subreg:V2DI (reg:DI 97) 0))) "pr65105-5.c":20:8 -1 > (nil)) > (insn 46 45 23 4 (set (reg:V2DI 103) > (vec_select:V2DI (vec_concat:V4DI (reg:V2DI 103) > (reg:V2DI 103)) > (parallel [ > (const_int 0 [0]) > (const_int 2 [0x2]) > ]))) "pr65105-5.c":20:8 -1 > (nil)) > (insn 23 46 24 4 (set (reg:CC 17 flags) > (unspec:CC [ > (reg:V2DI 103) repeated x2 > ] UNSPEC_PTEST)) "pr65105-5.c":20:8 7425 {sse4_1_ptestv2di} > (expr_list:REG_DEAD (reg:DI 97) > (nil))) > > where the XOR has been introduce to implement the equality, > as P == Q is effectively implemented as (P ^ Q) == 0. At this point, > the only remaining pass that can optimize the pand followed by > the pxor is peephole2. > > The requirement to optimize this is from gcc.target/i386/pr65105-5.c > where the desired implementation is explicitly looking for pandn+ptest: > > /* { dg-do compile { target ia32 } } */ > /* { dg-options "-O2 -march=core-avx2 -mno-stackrealign" } */ > /* { dg-final { scan-assembler "pandn" } } */ > /* { dg-final { scan-assembler "pxor" } } */ > /* { dg-final { scan-assembler "ptest" } } */ > > > Confusingly, I've even more patches in the queue/backlog for this part > of the compiler (it's an air traffic control problem, fallout from stage 4). > > And of course, very many thanks for the various andn related patches > that have already been approved/committed to the backend, to avoid > potential regressions related to code size (-Os and -Oz). It's a long road > with many steps. > > Might you reconsider? Pretty please? No problem for me, but the testcase would really help. Uros.