Hi Uros, Hopefully, if I explain even more of the context, you'll better understand why this harmless (and at worse seemingly redundant) peephole2 is actually critical for addressing significant regressions in the compiler without introducing new testsuite failures. I wouldn't ask (again), if I didn't feel it's important.
Basically, I'm trying to unblock Hongtao's patch (for PR target/104610) which in your own review, explained is better handled by/during STV: https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594070.html Unfortunately, that patch of mine to STV (that I want to ping next) that solves the P2 code quality regression PR target/70321, is itself blocked by another review of yours: https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593200.html where this fix (alone) leads to a regression of the test case pr65105-5.c. This pending regression has nothing to do with TARGET_BMI's andn, but the idiom "if ((x & y) != y)" on ia32, where x and y are DImode, and stv/reload has decided to place these values in SSE registers. After combine we have an *anddi3_doubleword and *cmpdi3_doubleword: (insn 22 21 23 4 (parallel [ (set (reg:DI 97) (and:DI (reg/v:DI 92 [ p2 ]) (reg:DI 88 [ _25 ]))) (clobber (reg:CC 17 flags)) ]) "pr65105-5.c":20:18 530 {*anddi3_doubleword} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) (insn 23 22 24 4 (set (reg:CCZ 17 flags) (compare:CCZ (reg/v:DI 92 [ p2 ]) (reg:DI 97))) "pr65105-5.c":20:8 29 {*cmpdi_doubleword} (expr_list:REG_DEAD (reg:DI 97) (nil))) After STV we have: (insn 22 21 45 4 (set (subreg:V2DI (reg:DI 97) 0) (and:V2DI (subreg:V2DI (reg/v:DI 92 [ p2 ]) 0) (subreg:V2DI (reg:DI 88 [ _25 ]) 0))) "pr65105-5.c":20:18 6640 {*andv2di3} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) (insn 45 22 46 4 (set (reg:V2DI 103) (xor:V2DI (subreg:V2DI (reg/v:DI 92 [ p2 ]) 0) (subreg:V2DI (reg:DI 97) 0))) "pr65105-5.c":20:8 -1 (nil)) (insn 46 45 23 4 (set (reg:V2DI 103) (vec_select:V2DI (vec_concat:V4DI (reg:V2DI 103) (reg:V2DI 103)) (parallel [ (const_int 0 [0]) (const_int 2 [0x2]) ]))) "pr65105-5.c":20:8 -1 (nil)) (insn 23 46 24 4 (set (reg:CC 17 flags) (unspec:CC [ (reg:V2DI 103) repeated x2 ] UNSPEC_PTEST)) "pr65105-5.c":20:8 7425 {sse4_1_ptestv2di} (expr_list:REG_DEAD (reg:DI 97) (nil))) where the XOR has been introduce to implement the equality, as P == Q is effectively implemented as (P ^ Q) == 0. At this point, the only remaining pass that can optimize the pand followed by the pxor is peephole2. The requirement to optimize this is from gcc.target/i386/pr65105-5.c where the desired implementation is explicitly looking for pandn+ptest: /* { dg-do compile { target ia32 } } */ /* { dg-options "-O2 -march=core-avx2 -mno-stackrealign" } */ /* { dg-final { scan-assembler "pandn" } } */ /* { dg-final { scan-assembler "pxor" } } */ /* { dg-final { scan-assembler "ptest" } } */ Confusingly, I've even more patches in the queue/backlog for this part of the compiler (it's an air traffic control problem, fallout from stage 4). And of course, very many thanks for the various andn related patches that have already been approved/committed to the backend, to avoid potential regressions related to code size (-Os and -Oz). It's a long road with many steps. Might you reconsider? Pretty please? Roger -- > -----Original Message----- > From: Uros Bizjak <ubiz...@gmail.com> > Sent: 23 May 2022 10:11 > To: Roger Sayle <ro...@nextmovesoftware.com> > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [x86 PING] Peephole pand;pxor into pandn > > On Mon, May 23, 2022 at 10:59 AM Roger Sayle > <ro...@nextmovesoftware.com> wrote: > > > > > > Hi Uros, > > > > Thanks for the speedy review. The point of this patch is that (with > > pending changes to STV) the pand;pxor sequence isn't created until > > after combine, and hence doesn't/won't get caught by any of the > > current pre-reload/combine splitters. > > IMO this happens due to inconsistencies between integer and vector set, where > integer andn is absent without BMI. However, we don't re-run the combine after > reload, and I don't think it is worth to reimplement it via peephole2 > patterns. > Please note that AVX allows much more combinations that are not catched by > your patch, and considering that combine already does the transformation, I > don't see a compelling reason for this very specialized peephole2. > > Let's keep the patch shelved until a testcase shows the benefits of the patch. > > Uros. > > > > > > > > -----Original Message----- > > > From: Uros Bizjak <ubiz...@gmail.com> > > > Sent: 23 May 2022 09:51 > > > To: Roger Sayle <ro...@nextmovesoftware.com> > > > Cc: gcc-patches@gcc.gnu.org > > > Subject: Re: [x86 PING] Peephole pand;pxor into pandn > > > > > > On Mon, May 23, 2022 at 10:44 AM Roger Sayle > > > <ro...@nextmovesoftware.com> wrote: > > > > > > > > > > > > This is a ping of a patch from April (a dependency of another stage1 > > > > patch): > > > > https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593123.html > > > > > > > > This patch has been refreshed/retested against gcc 13 trunk on > > > > x86_64-pc-linux-gnu with make bootstrap and make -k check, both > > > > with and without --target_board=unix{-m32}, with no new failures. > > > > Ok for mainline? > > > > > > I think this should be handled in a pre-reload splitter (or perhaps > > > combine splitter). We have so many variants of SSE/AVX logic > > > instructions that the transform after reload barely makes sense > > > (please see the number of regno checks in the proposed patch). > > > > > > Uros. > > > > > > > 2022-05-23 Roger Sayle <ro...@nextmovesoftware.com> > > > > > > > > gcc/ChangeLog > > > > * config/i386/sse.md (peephole2): Convert suitable pand followed > > > > by pxor into pandn, i.e. (X&Y)^X into X & ~Y. > > > > > > > > Many thanks in advance, > > > > Roger > > > > -- > > > > > >