On Thu, Jun 2, 2022 at 10:00 AM Jakub Jelinek <ja...@redhat.com> wrote:
>
> Hi!
>
> As the following testcase shows, our x86 backend support for optimizing
> out useless masking of shift/rotate counts when using instructions
> that naturally modulo the count themselves is insufficient.
> The *_mask define_insn_and_split patterns use
> (subreg:QI (and:SI (match_operand:SI) (match_operand "const_int_operand")))
> for the masking, but that can only catch the case where the masking
> is done in SImode, i.e. typically when the count is a 32-bit type in the
> source.
> We then have another set of patterns, *_mask_1, which use
> (and:QI (match_operand:QI) (match_operand "const_int_operand"))
> If the masking is done in DImode or in theory in HImode, we don't match
> it.
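[Editorial note: a minimal, hypothetical illustration of the gap described above, not part of the patch. When the count mask is computed in a 64-bit type, the AND happens in DImode, which the old SImode-only *_mask patterns could not match, so the redundant `and` stayed in the generated code.]

```c
/* Both shifts are semantically identical, but before this patch only the
   first one had its mask optimized away, because its AND is done in SImode.  */
unsigned int
shl_simode (unsigned int x, unsigned int y)
{
  return x << (y & 31);	/* mask computed in SImode: matched by *_mask */
}

unsigned int
shl_dimode (unsigned int x, unsigned long long y)
{
  return x << (y & 31);	/* mask computed in DImode: previously kept the and */
}
```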
> The following patch does 4 different things to improve this:
> 1) drops the mode from AND and MATCH_OPERAND inside of the subreg:QI
>    and replaces that by checking that the register shift count has
>    SWI48 mode - I think doing it this way is cheaper than adding
>    another mode iterator to patterns which use already another mode
>    iterator and sometimes a code iterator as well
> 2) the doubleword shift patterns were only handling the case where
>    the shift count is masked with a constant that has the most significant
>    bit clear, i.e. where we know the shift count is less than half the
>    number of bits in double-word.  If the mask is equal to half the
>    number of bits in double-word minus 1, the masking was optimized
>    away, otherwise the AND was kept.
>    But if the most significant bit isn't clear, we use a word-sized shift
>    and the SHRD instruction, where the former performs the modulo itself
>    and the latter masks the count with 64 / 32 depending on what mode the
>    CPU is in (so 64 for a 128-bit doubleword and 32 for a 64-bit
>    doubleword).  So we can also optimize away the masking when the mask
>    has all the relevant bits set; only the masking with the most
>    significant bit will remain, for the cmove test.
> 3) as requested, this patch adds a bunch of force_reg calls before
>    gen_lowpart
> 4) 1-3 above unfortunately regressed
>    +FAIL: gcc.target/i386/bt-mask-2.c scan-assembler-not and[lq][ \\t]
>    +FAIL: gcc.target/i386/pr57819.c scan-assembler-not and[lq][ \\t]
>    where during combine we now match the new pattern we didn't match
>    before and in the end fail to match the pattern the tests were looking
>    for.  These 2 tests are fixed by the addition of the
>    *jcc_bt<mode>_mask_1 pattern and a small tweak to the target
>    rtx_costs, because even with the pattern around we'd refuse to match
>    it as it appeared to have a higher instruction cost.
>
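[Editorial note: a sketch of the doubleword case from point 2, not taken from the patch, assuming an x86-64 target. With a 128-bit shift, a count mask of 127 has all relevant bits set (2 * 64 - 1), so after this patch the explicit `and` can be dropped: the 64-bit shifts and SHRD both truncate the count mod 64, and only the bit-64 test used to select halves remains.]

```c
/* C semantics of the masked 128-bit shift; the patch lets GCC emit this
   without an explicit and of the count, since 127 covers every bit the
   hardware looks at (count mod 64 plus the half-selection bit).  */
unsigned __int128
shl128 (unsigned __int128 x, unsigned long y)
{
  return x << (y & 127);	/* mask == 2 * 64 - 1: all relevant bits set */
}
```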
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2022-06-02  Jakub Jelinek  <ja...@redhat.com>
>
>         PR target/105778
>         * config/i386/i386.md (*ashl<dwi>3_doubleword_mask): Remove :SI
>         from AND and its operands and just verify operands[2] has HImode,
>         SImode or for TARGET_64BIT DImode.  Allow operands[3] to be a mask
>         with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
>         just throw away the masking.  Use force_reg before calling
>         gen_lowpart.
>         (*ashl<dwi>3_doubleword_mask_1): Allow operands[3] to be a mask
>         with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
>         just throw away the masking.
>         (*ashl<mode>3_doubleword): Rename to ...
>         (ashl<mode>3_doubleword): ... this.
>         (*ashl<mode>3_mask): Remove :SI from AND and its operands and just
>         verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
>         Use force_reg before calling gen_lowpart.
>         (*<insn><mode>3_mask): Likewise.
>         (*<insn><dwi>3_doubleword_mask): Likewise.  Allow operands[3] to be
>         a mask with all low 6 (64-bit) or 5 (32-bit) bits set and in that
>         case just throw away the masking.  Use force_reg before calling
>         gen_lowpart.
>         (*<insn><dwi>3_doubleword_mask_1): Allow operands[3] to be a mask
>         with all low 6 (64-bit) or 5 (32-bit) bits set and in that case just
>         throw away the masking.
>         (*<insn><mode>3_doubleword): Rename to ...
>         (<insn><mode>3_doubleword): ... this.
>         (*<insn><mode>3_mask): Remove :SI from AND and its operands and just
>         verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
>         Use force_reg before calling gen_lowpart.
>         (splitter after it): Remove :SI from AND and its operands and just
>         verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
>         (*<btsc><mode>_mask, *btr<mode>_mask): Remove :SI from AND and its
>         operands and just verify operands[1] has HImode, SImode or for
>         TARGET_64BIT DImode.  Use force_reg before calling gen_lowpart.
>         (*jcc_bt<mode>_mask_1): New define_insn_and_split pattern.
>         * config/i386/i386.cc (ix86_rtx_costs): For ZERO_EXTRACT with
>         ZERO_EXTEND QI->SI in last operand ignore the cost of the ZERO_EXTEND.
>
>         * gcc.target/i386/pr105778.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386.md.jj  2022-05-31 11:33:51.457251607 +0200
> +++ gcc/config/i386/i386.md     2022-06-01 11:59:27.388631872 +0200
> @@ -11890,11 +11890,16 @@ (define_insn_and_split "*ashl<dwi>3_doub
>         (ashift:<DWI>
>           (match_operand:<DWI> 1 "register_operand")
>           (subreg:QI
> -           (and:SI
> -             (match_operand:SI 2 "register_operand" "c")
> -             (match_operand:SI 3 "const_int_operand")) 0)))
> -   (clobber (reg:CC FLAGS_REG))]
> -  "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +           (and
> +             (match_operand 2 "register_operand" "c")
> +             (match_operand 3 "const_int_operand")) 0)))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +    || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))
> +        == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)))
> +   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -11912,6 +11917,15 @@ (define_insn_and_split "*ashl<dwi>3_doub
>            (ashift:DWIH (match_dup 5) (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
>  {
> +  if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0)
> +    {
> +      operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> +      operands[2] = gen_lowpart (QImode, operands[2]);
> +      emit_insn (gen_ashl<dwi>3_doubleword (operands[0], operands[1],
> +                                           operands[2]));
> +      DONE;
> +    }
> +
>    split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]);
>
>    operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1);
> @@ -11925,6 +11939,7 @@ (define_insn_and_split "*ashl<dwi>3_doub
>        operands[2] = tem;
>      }
>
> +  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
>    operands[2] = gen_lowpart (QImode, operands[2]);
>
>    if (!rtx_equal_p (operands[6], operands[7]))
> @@ -11939,7 +11954,9 @@ (define_insn_and_split "*ashl<dwi>3_doub
>             (match_operand:QI 2 "register_operand" "c")
>             (match_operand:QI 3 "const_int_operand"))))
>     (clobber (reg:CC FLAGS_REG))]
> -  "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +  "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +    || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))
> +        == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -11957,6 +11974,13 @@ (define_insn_and_split "*ashl<dwi>3_doub
>            (ashift:DWIH (match_dup 5) (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
>  {
> +  if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0)
> +    {
> +      emit_insn (gen_ashl<dwi>3_doubleword (operands[0], operands[1],
> +                                           operands[2]));
> +      DONE;
> +    }
> +
>    split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]);
>
>    operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1);
> @@ -11974,7 +11998,7 @@ (define_insn_and_split "*ashl<dwi>3_doub
>      emit_move_insn (operands[6], operands[7]);
>  })
>
> -(define_insn "*ashl<mode>3_doubleword"
> +(define_insn "ashl<mode>3_doubleword"
>    [(set (match_operand:DWI 0 "register_operand" "=&r")
>         (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n")
>                     (match_operand:QI 2 "nonmemory_operand" "<S>c")))
> @@ -12186,13 +12210,16 @@ (define_insn_and_split "*ashl<mode>3_mas
>         (ashift:SWI48
>           (match_operand:SWI48 1 "nonimmediate_operand")
>           (subreg:QI
> -           (and:SI
> -             (match_operand:SI 2 "register_operand" "c,r")
> -             (match_operand:SI 3 "const_int_operand")) 0)))
> +           (and
> +             (match_operand 2 "register_operand" "c,r")
> +             (match_operand 3 "const_int_operand")) 0)))
>     (clobber (reg:CC FLAGS_REG))]
>    "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)
>     && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
>        == GET_MODE_BITSIZE (<MODE>mode)-1
> +   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -12201,7 +12228,10 @@ (define_insn_and_split "*ashl<mode>3_mas
>            (ashift:SWI48 (match_dup 1)
>                          (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
> -  "operands[2] = gen_lowpart (QImode, operands[2]);"
> +{
> +  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> +  operands[2] = gen_lowpart (QImode, operands[2]);
> +}
>    [(set_attr "isa" "*,bmi2")])
>
>  (define_insn_and_split "*ashl<mode>3_mask_1"
> @@ -12774,13 +12804,16 @@ (define_insn_and_split "*<insn><mode>3_m
>         (any_shiftrt:SWI48
>           (match_operand:SWI48 1 "nonimmediate_operand")
>           (subreg:QI
> -           (and:SI
> -             (match_operand:SI 2 "register_operand" "c,r")
> -             (match_operand:SI 3 "const_int_operand")) 0)))
> +           (and
> +             (match_operand 2 "register_operand" "c,r")
> +             (match_operand 3 "const_int_operand")) 0)))
>     (clobber (reg:CC FLAGS_REG))]
>    "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
>     && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
>        == GET_MODE_BITSIZE (<MODE>mode)-1
> +   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -12789,7 +12822,10 @@ (define_insn_and_split "*<insn><mode>3_m
>            (any_shiftrt:SWI48 (match_dup 1)
>                               (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
> -  "operands[2] = gen_lowpart (QImode, operands[2]);"
> +{
> +  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> +  operands[2] = gen_lowpart (QImode, operands[2]);
> +}
>    [(set_attr "isa" "*,bmi2")])
>
>  (define_insn_and_split "*<insn><mode>3_mask_1"
> @@ -12819,11 +12855,16 @@ (define_insn_and_split "*<insn><dwi>3_do
>         (any_shiftrt:<DWI>
>           (match_operand:<DWI> 1 "register_operand")
>           (subreg:QI
> -           (and:SI
> -             (match_operand:SI 2 "register_operand" "c")
> -             (match_operand:SI 3 "const_int_operand")) 0)))
> -   (clobber (reg:CC FLAGS_REG))]
> -  "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +           (and
> +             (match_operand 2 "register_operand" "c")
> +             (match_operand 3 "const_int_operand")) 0)))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +    || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))
> +        == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)))
> +   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -12841,6 +12882,15 @@ (define_insn_and_split "*<insn><dwi>3_do
>            (any_shiftrt:DWIH (match_dup 7) (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
>  {
> +  if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0)
> +    {
> +      operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> +      operands[2] = gen_lowpart (QImode, operands[2]);
> +      emit_insn (gen_<insn><dwi>3_doubleword (operands[0], operands[1],
> +                                             operands[2]));
> +      DONE;
> +    }
> +
>    split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]);
>
>    operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1);
> @@ -12854,6 +12904,7 @@ (define_insn_and_split "*<insn><dwi>3_do
>        operands[2] = tem;
>      }
>
> +  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
>    operands[2] = gen_lowpart (QImode, operands[2]);
>
>    if (!rtx_equal_p (operands[4], operands[5]))
> @@ -12868,7 +12919,9 @@ (define_insn_and_split "*<insn><dwi>3_do
>             (match_operand:QI 2 "register_operand" "c")
>             (match_operand:QI 3 "const_int_operand"))))
>     (clobber (reg:CC FLAGS_REG))]
> -  "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +  "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
> +    || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))
> +        == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -12886,6 +12939,13 @@ (define_insn_and_split "*<insn><dwi>3_do
>            (any_shiftrt:DWIH (match_dup 7) (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
>  {
> +  if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0)
> +    {
> +      emit_insn (gen_<insn><dwi>3_doubleword (operands[0], operands[1],
> +                                             operands[2]));
> +      DONE;
> +    }
> +
>    split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]);
>
>    operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1);
> @@ -12903,7 +12963,7 @@ (define_insn_and_split "*<insn><dwi>3_do
>      emit_move_insn (operands[4], operands[5]);
>  })
>
> -(define_insn_and_split "*<insn><mode>3_doubleword"
> +(define_insn_and_split "<insn><mode>3_doubleword"
>    [(set (match_operand:DWI 0 "register_operand" "=&r")
>         (any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0")
>                          (match_operand:QI 2 "nonmemory_operand" "<S>c")))
> @@ -13586,13 +13646,16 @@ (define_insn_and_split "*<insn><mode>3_m
>         (any_rotate:SWI
>           (match_operand:SWI 1 "nonimmediate_operand")
>           (subreg:QI
> -           (and:SI
> -             (match_operand:SI 2 "register_operand" "c")
> -             (match_operand:SI 3 "const_int_operand")) 0)))
> +           (and
> +             (match_operand 2 "register_operand" "c")
> +             (match_operand 3 "const_int_operand")) 0)))
>     (clobber (reg:CC FLAGS_REG))]
>    "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
>     && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
>        == GET_MODE_BITSIZE (<MODE>mode)-1
> +   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -13601,18 +13664,24 @@ (define_insn_and_split "*<insn><mode>3_m
>            (any_rotate:SWI (match_dup 1)
>                            (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
> -  "operands[2] = gen_lowpart (QImode, operands[2]);")
> +{
> +  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> +  operands[2] = gen_lowpart (QImode, operands[2]);
> +})
>
>  (define_split
>    [(set (match_operand:SWI 0 "register_operand")
>         (any_rotate:SWI
>           (match_operand:SWI 1 "const_int_operand")
>           (subreg:QI
> -           (and:SI
> -             (match_operand:SI 2 "register_operand")
> -             (match_operand:SI 3 "const_int_operand")) 0)))]
> +           (and
> +             (match_operand 2 "register_operand")
> +             (match_operand 3 "const_int_operand")) 0)))]
>   "(INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode) - 1))
> -   == GET_MODE_BITSIZE (<MODE>mode) - 1"
> +   == GET_MODE_BITSIZE (<MODE>mode) - 1
> +  && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
> +  && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
> +              4 << (TARGET_64BIT ? 1 : 0))"
>   [(set (match_dup 4) (match_dup 1))
>    (set (match_dup 0)
>         (any_rotate:SWI (match_dup 4)
> @@ -13976,14 +14045,17 @@ (define_insn_and_split "*<btsc><mode>_ma
>           (ashift:SWI48
>             (const_int 1)
>             (subreg:QI
> -             (and:SI
> -               (match_operand:SI 1 "register_operand")
> -               (match_operand:SI 2 "const_int_operand")) 0))
> +             (and
> +               (match_operand 1 "register_operand")
> +               (match_operand 2 "const_int_operand")) 0))
>           (match_operand:SWI48 3 "register_operand")))
>     (clobber (reg:CC FLAGS_REG))]
>    "TARGET_USE_BT
>     && (INTVAL (operands[2]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
>        == GET_MODE_BITSIZE (<MODE>mode)-1
> +   && GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[1])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -13994,7 +14066,10 @@ (define_insn_and_split "*<btsc><mode>_ma
>                            (match_dup 1))
>              (match_dup 3)))
>        (clobber (reg:CC FLAGS_REG))])]
> -  "operands[1] = gen_lowpart (QImode, operands[1]);")
> +{
> +  operands[1] = force_reg (GET_MODE (operands[1]), operands[1]);
> +  operands[1] = gen_lowpart (QImode, operands[1]);
> +})
>
>  (define_insn_and_split "*<btsc><mode>_mask_1"
>    [(set (match_operand:SWI48 0 "register_operand")
> @@ -14041,14 +14116,17 @@ (define_insn_and_split "*btr<mode>_mask"
>           (rotate:SWI48
>             (const_int -2)
>             (subreg:QI
> -             (and:SI
> -               (match_operand:SI 1 "register_operand")
> -               (match_operand:SI 2 "const_int_operand")) 0))
> +             (and
> +               (match_operand 1 "register_operand")
> +               (match_operand 2 "const_int_operand")) 0))
>           (match_operand:SWI48 3 "register_operand")))
>     (clobber (reg:CC FLAGS_REG))]
>    "TARGET_USE_BT
>     && (INTVAL (operands[2]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
>        == GET_MODE_BITSIZE (<MODE>mode)-1
> +   && GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[1])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> @@ -14059,7 +14137,10 @@ (define_insn_and_split "*btr<mode>_mask"
>                            (match_dup 1))
>              (match_dup 3)))
>        (clobber (reg:CC FLAGS_REG))])]
> -  "operands[1] = gen_lowpart (QImode, operands[1]);")
> +{
> +  operands[1] = force_reg (GET_MODE (operands[1]), operands[1]);
> +  operands[1] = gen_lowpart (QImode, operands[1]);
> +})
>
>  (define_insn_and_split "*btr<mode>_mask_1"
>    [(set (match_operand:SWI48 0 "register_operand")
> @@ -14409,6 +14490,47 @@ (define_insn_and_split "*jcc_bt<mode>_ma
>    operands[0] = shallow_copy_rtx (operands[0]);
>    PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
>  })
> +
> +(define_insn_and_split "*jcc_bt<mode>_mask_1"
> +  [(set (pc)
> +       (if_then_else (match_operator 0 "bt_comparison_operator"
> +                       [(zero_extract:SWI48
> +                          (match_operand:SWI48 1 "register_operand")
> +                          (const_int 1)
> +                          (zero_extend:SI
> +                            (subreg:QI
> +                              (and
> +                                (match_operand 2 "register_operand")
> +                                (match_operand 3 "const_int_operand")) 0)))])
> +                     (label_ref (match_operand 4))
> +                     (pc)))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
> +   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
> +      == GET_MODE_BITSIZE (<MODE>mode)-1
> +   && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT
> +   && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2,
> +               4 << (TARGET_64BIT ? 1 : 0))
> +   && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (reg:CCC FLAGS_REG)
> +       (compare:CCC
> +         (zero_extract:SWI48
> +           (match_dup 1)
> +           (const_int 1)
> +           (match_dup 2))
> +         (const_int 0)))
> +   (set (pc)
> +       (if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
> +                     (label_ref (match_dup 4))
> +                     (pc)))]
> +{
> +  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> +  operands[2] = gen_lowpart (SImode, operands[2]);
> +  operands[0] = shallow_copy_rtx (operands[0]);
> +  PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
> +})
>
>  ;; Help combine recognize bt followed by cmov
>  (define_split
> --- gcc/config/i386/i386.cc.jj  2022-05-31 11:33:51.452251660 +0200
> +++ gcc/config/i386/i386.cc     2022-06-01 12:40:06.189186012 +0200
> @@ -20995,6 +20995,20 @@ ix86_rtx_costs (rtx x, machine_mode mode
>          *total += 1;
>        return false;
>
> +    case ZERO_EXTRACT:
> +      if (XEXP (x, 1) == const1_rtx
> +         && GET_CODE (XEXP (x, 2)) == ZERO_EXTEND
> +         && GET_MODE (XEXP (x, 2)) == SImode
> +         && GET_MODE (XEXP (XEXP (x, 2), 0)) == QImode)
> +       {
> +         /* Ignore cost of zero extension and masking of last argument.  */
> +         *total += rtx_cost (XEXP (x, 0), mode, code, 0, speed);
> +         *total += rtx_cost (XEXP (x, 1), mode, code, 1, speed);
> +         *total += rtx_cost (XEXP (XEXP (x, 2), 0), mode, code, 2, speed);
> +         return true;
> +       }
> +      return false;
> +
>      default:
>        return false;
>      }
> --- gcc/testsuite/gcc.target/i386/pr105778.c.jj 2022-05-31 13:59:12.470814609 +0200
> +++ gcc/testsuite/gcc.target/i386/pr105778.c    2022-05-31 13:58:50.624044700 +0200
> @@ -0,0 +1,45 @@
> +/* PR target/105778 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "\tand\[^\n\r]*\(31\|63\|127\|255\)" } } */
> +
> +unsigned int f1 (unsigned int x, unsigned long y) { y &= 31; return x << y; }
> +unsigned int f2 (unsigned int x, unsigned long y) { return x << (y & 31); }
> +unsigned int f3 (unsigned int x, unsigned long y) { y &= 31; return x >> y; }
> +unsigned int f4 (unsigned int x, unsigned long y) { return x >> (y & 31); }
> +int f5 (int x, unsigned long y) { y &= 31; return x >> y; }
> +int f6 (int x, unsigned long y) { return x >> (y & 31); }
> +unsigned long long f7 (unsigned long long x, unsigned long y) { y &= 63; return x << y; }
> +unsigned long long f8 (unsigned long long x, unsigned long y) { return x << (y & 63); }
> +unsigned long long f9 (unsigned long long x, unsigned long y) { y &= 63; return x >> y; }
> +unsigned long long f10 (unsigned long long x, unsigned long y) { return x >> (y & 63); }
> +long long f11 (long long x, unsigned long y) { y &= 63; return x >> y; }
> +long long f12 (long long x, unsigned long y) { return x >> (y & 63); }
> +#ifdef __SIZEOF_INT128__
> +unsigned __int128 f13 (unsigned __int128 x, unsigned long y) { y &= 127; return x << y; }
> +unsigned __int128 f14 (unsigned __int128 x, unsigned long y) { return x << (y & 127); }
> +unsigned __int128 f15 (unsigned __int128 x, unsigned long y) { y &= 127; return x >> y; }
> +unsigned __int128 f16 (unsigned __int128 x, unsigned long y) { return x >> (y & 127); }
> +__int128 f17 (__int128 x, unsigned long y) { y &= 127; return x >> y; }
> +__int128 f18 (__int128 x, unsigned long y) { return x >> (y & 127); }
> +#endif
> +unsigned int f19 (unsigned int x, unsigned long y) { y &= 63; return x << y; }
> +unsigned int f20 (unsigned int x, unsigned long y) { return x << (y & 63); }
> +unsigned int f21 (unsigned int x, unsigned long y) { y &= 63; return x >> y; }
> +unsigned int f22 (unsigned int x, unsigned long y) { return x >> (y & 63); }
> +int f23 (int x, unsigned long y) { y &= 63; return x >> y; }
> +int f24 (int x, unsigned long y) { return x >> (y & 63); }
> +unsigned long long f25 (unsigned long long x, unsigned long y) { y &= 127; return x << y; }
> +unsigned long long f26 (unsigned long long x, unsigned long y) { return x << (y & 127); }
> +unsigned long long f27 (unsigned long long x, unsigned long y) { y &= 127; return x >> y; }
> +unsigned long long f28 (unsigned long long x, unsigned long y) { return x >> (y & 127); }
> +long long f29 (long long x, unsigned long y) { y &= 127; return x >> y; }
> +long long f30 (long long x, unsigned long y) { return x >> (y & 127); }
> +#ifdef __SIZEOF_INT128__
> +unsigned __int128 f31 (unsigned __int128 x, unsigned long y) { y &= 255; return x << y; }
> +unsigned __int128 f32 (unsigned __int128 x, unsigned long y) { return x << (y & 255); }
> +unsigned __int128 f33 (unsigned __int128 x, unsigned long y) { y &= 255; return x >> y; }
> +unsigned __int128 f34 (unsigned __int128 x, unsigned long y) { return x >> (y & 255); }
> +__int128 f35 (__int128 x, unsigned long y) { y &= 255; return x >> y; }
> +__int128 f36 (__int128 x, unsigned long y) { return x >> (y & 255); }
> +#endif
>
>         Jakub
>
