On Mon, Aug 7, 2023 at 9:37 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch is a partial solution to PR target/107671, updating Uros'
> patch from comment #4, to catch both bit set (setc) and bit not set
> (setnc) cases from the code in comment #2, when compiled on x86_64.
> Unfortunately, this is a partial solution, as the pointer variants
> in comment #1, aren't yet all optimized, and my attempts to check
> whether the 32-bit versions are optimized with -m32 revealed they
> also need further improvement.  (Some of) These remaining issues
> might best be fixed in the middle-end, in either match.pd or the
> RTL optimizers, so I thought it reasonable to submit this independent
> backend piece, and gain/bank the improvements on x86_64.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-08-07  Roger Sayle  <ro...@nextmovesoftware.com>
>             Uros Bizjak  <ubiz...@gmail.com>
>
> gcc/ChangeLog
>         PR target/107671
>         * config/i386/i386.md (*bt<mode>_setc<mode>_mask): Allow the
>         shift count to have a different mode (using SWI248) from either
>         the bit-test or the result.
>         (*bt<mode>_setnc<mode>_mask): New define_insn_and_split for the
>         setnc (bit not set) case of the above pattern.
>         (*btdi_setncsi_mask): New define_insn_and_split to handle the
>         SImode result from a DImode bit-test variant of the above patterns.
>         (*bt<mode>_setncqi_mask_2): New define_insn_and_split for the
>         setnc (bit not set) version of *bt<mode>_setcqi_mask_2.
>
> gcc/testsuite/ChangeLog
>         PR target/107671
>         * gcc.target/i386/pr107671-1.c: New test case.
>         * gcc.target/i386/pr107671-2.c: Likewise.

I am worried about the number of existing and new patterns that are
introduced to satisfy creativity of the combine pass. The following
can be handled via zero-extract RTXes:

    return ((v & (1  << (bitnum & 31)))) != 0;
    return ((v & (1L << (bitnum & 63)))) != 0;
    return (v >> (bitnum & 31)) & 1;
    return (v >> (bitnum & 63)) & 1;

but there is no canonicalization for negative forms of the above constructs.

For the above, the combine pass tries:

(set (reg:SI 95)
    (zero_extract:SI (reg:SI 97)
        (const_int 1 [0x1])
        (and:SI (reg:SI 98)
            (const_int 31 [0x1f]))))

that is necessary to handle the change of compare mode from CCZ to
CCC. However, negative forms try:

(set (reg:QI 96)
    (eq:QI (zero_extract:SI (reg:SI 97)
            (const_int 1 [0x1])
            (and:SI (reg:SI 98)
                (const_int 31 [0x1f])))
        (const_int 0 [0])))

and:

(set (reg:SI 95)
    (xor:SI (zero_extract:SI (reg:SI 97)
            (const_int 1 [0x1])
            (and:SI (reg:SI 98)
                (const_int 31 [0x1f])))
        (const_int 1 [0x1])))

and these are further different for SImode and DImode.

Ideally, we would simplify all forms to:

(set (reg:QI 96)
    (eq:QI (zero_extract:SI (reg:SI 97)
            (const_int 1 [0x1])
            (and:SI (reg:SI 98)
                (const_int 31 [0x1f])))
        (const_int 0 [0])))

where inverted/non-inverted forms would emit ne/eq:QI (....)
(const_int 0). The result would be zero-extended to DI or SImode,
depending on the target mode.

You can already see the problem with missing canonicalization in
i386.md with define_insn_and_split pattern with comments:

;; Help combine recognize bt followed by setc
;; Help combine recognize bt followed by setnc

where totally different patterns are needed to match what combine produces.

The above problem is specific to setcc patterns, where output value is
derived from the input operand. jcc and cmov look OK.

If the following pattern would be tried by combine, then we would
handle all the testcases in the PR (plus some more, where output is in
different mode than input):

(set (reg:QI 96)
    ({eq,ne}:QI (zero_extract:SI (reg:SI 97)
            (const_int 1 [0x1])
            (and:SI (reg:SI 98)
                (const_int 31 [0x1f])))
        (const_int 0 [0])))

where QIreg is later zero-extended to the target width. In this case,
one define_insn_and_split pattern would handle all testcases from
PR107671.

Please also note that we can implement this transformation via a
combine splitter. The benefit of the combine splitter is that its
results are immediately "recognized", and new RTXes can be propagated
into subsequent combinations.

I have made some measurements with my proposed patch (as posted in the
PR), and the transformation never triggered (neither for gcc build,
nor when building the linux kernel). So, I wonder if the added number
of patterns outweigh the benefits at all.

IMO, the correct way is to teach the combine pass some more about
bit-test functionality (to also pass negative forms via zero-extract
RTXes, see above). This would canonicalize the transformation and
prevent pattern explosion.

I consider bt to be quite an important instruction, where one
instruction can substitute a bunch of shift/and/move insns. Perhaps a
separate tree or RTX code would enable optimizations, other than those
that combine can offer.

[The above is excerpt of some private communication I had with Roger,
now published for archival purposes]

Uros.

>
> Roger
> --
>

Reply via email to