On Mon, May 23, 2022 at 12:49 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> Hopefully, if I explain even more of the context, you'll better understand why
> this harmless (and at worse seemingly redundant) peephole2 is actually 
> critical
> for addressing significant regressions in the compiler without introducing new
> testsuite failures.  I wouldn't ask (again), if I didn't feel it's important.
>
> Basically, I'm trying to unblock Hongtao's patch (for PR target/104610)
> which in your own review, explained is better handled by/during STV:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594070.html
>
> Unfortunately, that patch of mine to STV (that I want to ping next) that 
> solves
> the P2 code quality regression PR target/70321, is itself blocked by another
> review of yours:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593200.html
> where this fix (alone) leads to a regression of the test case pr65105-5.c.

Is it possible to start with a STV patch? If there are only a few
introduced regressions, we can afford them in this stage of
development, and fix regressions later with a follow-up patches. THis
way, it is much easier for me to see the effect of the patch, and its
benefit can be weighted appropriately. I was indeed under the
impression that we try to peephole a combination that appears once in
a blue moon, but if the situation appears regularly, this is a
completely different matter.

> This pending regression has nothing to do with TARGET_BMI's andn, but
> the idiom "if ((x & y) != y)" on ia32, where x and y are DImode, and 
> stv/reload
> has decided to place these values in SSE registers.
>
> After combine we have an *anddi3_doubleword and *cmpdi3_doubleword:
> (insn 22 21 23 4 (parallel [
>             (set (reg:DI 97)
>                 (and:DI (reg/v:DI 92 [ p2 ])
>                     (reg:DI 88 [ _25 ])))
>             (clobber (reg:CC 17 flags))
>         ]) "pr65105-5.c":20:18 530 {*anddi3_doubleword}
>      (expr_list:REG_UNUSED (reg:CC 17 flags)
>         (nil)))
> (insn 23 22 24 4 (set (reg:CCZ 17 flags)
>         (compare:CCZ (reg/v:DI 92 [ p2 ])
>             (reg:DI 97))) "pr65105-5.c":20:8 29 {*cmpdi_doubleword}
>      (expr_list:REG_DEAD (reg:DI 97)
>         (nil)))

One possible approach is to introduce intermediate compound (but
non-existent) instruction that is created by combine pass, and is
later split to real instructions. But a real testcase is needed, so
the correct strategy is used.

> After STV we have:
> (insn 22 21 45 4 (set (subreg:V2DI (reg:DI 97) 0)
>         (and:V2DI (subreg:V2DI (reg/v:DI 92 [ p2 ]) 0)
>             (subreg:V2DI (reg:DI 88 [ _25 ]) 0))) "pr65105-5.c":20:18 6640 
> {*andv2di3}
>      (expr_list:REG_UNUSED (reg:CC 17 flags)
>         (nil)))
> (insn 45 22 46 4 (set (reg:V2DI 103)
>         (xor:V2DI (subreg:V2DI (reg/v:DI 92 [ p2 ]) 0)
>             (subreg:V2DI (reg:DI 97) 0))) "pr65105-5.c":20:8 -1
>      (nil))
> (insn 46 45 23 4 (set (reg:V2DI 103)
>         (vec_select:V2DI (vec_concat:V4DI (reg:V2DI 103)
>                 (reg:V2DI 103))
>             (parallel [
>                     (const_int 0 [0])
>                     (const_int 2 [0x2])
>                 ]))) "pr65105-5.c":20:8 -1
>      (nil))
> (insn 23 46 24 4 (set (reg:CC 17 flags)
>         (unspec:CC [
>                 (reg:V2DI 103) repeated x2
>             ] UNSPEC_PTEST)) "pr65105-5.c":20:8 7425 {sse4_1_ptestv2di}
>      (expr_list:REG_DEAD (reg:DI 97)
>         (nil)))
>
> where the XOR has been introduce to implement the equality,
> as P == Q is effectively implemented as (P ^ Q) == 0.  At this point,
> the only remaining pass that can optimize the pand followed by
> the pxor is peephole2.
>
> The requirement to optimize this is from gcc.target/i386/pr65105-5.c
> where the desired implementation is explicitly looking for pandn+ptest:
>
> /* { dg-do compile { target ia32 } } */
> /* { dg-options "-O2 -march=core-avx2 -mno-stackrealign" } */
> /* { dg-final { scan-assembler "pandn" } } */
> /* { dg-final { scan-assembler "pxor" } } */
> /* { dg-final { scan-assembler "ptest" } } */
>
>
> Confusingly, I've even more patches in the queue/backlog for this part
> of the compiler (it's an air traffic control problem, fallout from stage 4).
>
> And of course, very many thanks for the various andn related patches
> that have already been approved/committed to the backend, to avoid
> potential regressions related to code size (-Os and -Oz).  It's a long road
> with many steps.
>
> Might you reconsider?  Pretty  please?

No problem for me, but the testcase would really help.

Uros.

Reply via email to