On Thu, Jun 2, 2022 at 10:00 AM Jakub Jelinek <ja...@redhat.com> wrote:
>
> Hi!
>
> As the following testcase shows, our x86 backend support for optimizing
> out useless masking of shift/rotate counts when using instructions
> that naturally modulo the count themselves is insufficient.
> The *_mask define_insn_and_split patterns use
> (subreg:QI (and:SI (match_operand:SI) (match_operand "const_int_operand")))
> for the masking, but that can catch only the case where the masking
> is done in SImode, so typically in SImode in the source.
> We then have another set of patterns, *_mask_1, which use
> (and:QI (match_operand:QI) (match_operand "const_int_operand"))
> If the masking is done in DImode or in theory in HImode, we don't match
> it.
> The following patch does 4 different things to improve this:
> 1) drops the mode from AND and MATCH_OPERAND inside of the subreg:QI
>    and replaces that by checking that the register shift count has
>    SWI48 mode - I think doing it this way is cheaper than adding
>    another mode iterator to patterns which use already another mode
>    iterator and sometimes a code iterator as well
> 2) the doubleword shift patterns were only handling the case where
>    the shift count is masked with a constant that has the most significant
>    bit clear, i.e. where we know the shift count is less than half the
>    number of bits in double-word.  If the mask is equal to half the
>    number of bits in double-word minus 1, the masking was optimized
>    away, otherwise the AND was kept.
>    But if the most significant bit isn't clear, we use a word-sized shift
>    and SHRD instruction, where the former does the modulo and the latter
>    modulo with 64 / 32 depending on what mode the CPU is in (so 64 for
>    128-bit doubleword and 32 for 64-bit doubleword).  So we can also
>    optimize away the masking when the mask has all the relevant bits set,
>    masking with the most significant bit will remain for the cmove
>    test.
> 3) as requested, this patch adds a bunch of force_reg calls before
>    gen_lowpart
> 4) 1-3 above unfortunately regressed
>    +FAIL: gcc.target/i386/bt-mask-2.c scan-assembler-not and[lq][ \\t]
>    +FAIL: gcc.target/i386/pr57819.c scan-assembler-not and[lq][ \\t]
>    where we during combine match the new pattern we didn't match
>    before and in the end don't match the pattern we were testing for.
>    These 2 tests are fixed by the *jcc_bt<mode>_mask_1 pattern
>    addition and small tweak to target rtx_costs, because even with
>    the pattern around we'd refuse to match it because it appeared to
>    have higher instruction cost
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2022-06-02  Jakub Jelinek  <ja...@redhat.com>
>
>         PR target/105778
>         * config/i386/i386.md (*ashl<dwi>3_doubleword_mask): Remove :SI
>         from AND and its operands and just verify operands[2] has HImode,
>         SImode or for TARGET_64BIT DImode.  Allow operands[3] to be a mask
>         with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
>         just throw away the masking.  Use force_reg before calling
>         gen_lowpart.
>         (*ashl<dwi>3_doubleword_mask_1): Allow operands[3] to be a mask
>         with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
>         just throw away the masking.
>         (*ashl<mode>3_doubleword): Rename to ...
>         (ashl<mode>3_doubleword): ... this.
>         (*ashl<mode>3_mask): Remove :SI from AND and its operands and just
>         verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
>         Use force_reg before calling gen_lowpart.
>         (*<insn><mode>3_mask): Likewise.
>         (*<insn><dwi>3_doubleword_mask): Likewise.  Allow operands[3] to be
>         a mask with all low 6 (64-bit) or 5 (32-bit) bits set and in that
>         case just throw away the masking.  Use force_reg before calling
>         gen_lowpart.
>         (*<insn><dwi>3_doubleword_mask_1): Allow operands[3] to be a mask
>         with all low 6 (64-bit) or 5 (32-bit) bits set and in that case just
>         throw away the masking.
>         (*<insn><mode>3_doubleword): Rename to ...
>         (<insn><mode>3_doubleword): ... this.
>         (*<insn><mode>3_mask): Remove :SI from AND and its operands and just
>         verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
>         Use force_reg before calling gen_lowpart.
>         (splitter after it): Remove :SI from AND and its operands and just
>         verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
>         (*<btsc><mode>_mask, *btr<mode>_mask): Remove :SI from AND and its
>         operands and just verify operands[1] has HImode, SImode or for
>         TARGET_64BIT DImode.  Use force_reg before calling gen_lowpart.
>         (*jcc_bt<mode>_mask_1): New define_insn_and_split pattern.
>         * config/i386/i386.cc (ix86_rtx_costs): For ZERO_EXTRACT with
>         ZERO_EXTEND QI->SI in last operand ignore the cost of the ZERO_EXTEND.
>
>         * gcc.target/i386/pr105778.c: New test.
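
For readers following the ChangeLog, here is a minimal sketch (not part of the patch) of the mask test that the doubleword *_mask patterns apply to operands[3], written as plain C. The names mask_kind and classify_doubleword_mask are hypothetical, and half_bits stands for <MODE_SIZE> * BITS_PER_UNIT, i.e. 64 for a 128-bit doubleword and 32 for a 64-bit one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; none of this is GCC code.  */
enum mask_kind
{
  MASK_COUNT_BELOW_HALF, /* Bit half_bits is clear: count < half_bits
                            (the case the patterns already handled).  */
  MASK_REDUNDANT,        /* All low bits set: the AND can be dropped.  */
  MASK_NO_MATCH          /* Neither holds: the pattern does not match.  */
};

/* Mirror of the insn condition
     (mask & half_bits) == 0
     || (mask & (2 * half_bits - 1)) == 2 * half_bits - 1
   with the two accepted cases reported separately.  */
enum mask_kind
classify_doubleword_mask (int64_t mask, unsigned half_bits)
{
  int64_t half = (int64_t) half_bits;
  int64_t full = 2 * half - 1;

  if ((mask & half) == 0)
    return MASK_COUNT_BELOW_HALF;
  if ((mask & full) == full)
    return MASK_REDUNDANT;
  return MASK_NO_MATCH;
}

/* Small self-check using masks that appear in the testcase.  */
void
check_doubleword_masks (void)
{
  assert (classify_doubleword_mask (127, 64) == MASK_REDUNDANT);
  assert (classify_doubleword_mask (255, 64) == MASK_REDUNDANT);
  assert (classify_doubleword_mask (63, 32) == MASK_REDUNDANT);
}
```

In the second case the split body simply emits the plain doubleword shift and is done; a mask such as 64 alone (bit set, low bits missing) falls into the third case and the AND stays.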
OK. Thanks, Uros. > > --- gcc/config/i386/i386.md.jj 2022-05-31 11:33:51.457251607 +0200 > +++ gcc/config/i386/i386.md 2022-06-01 11:59:27.388631872 +0200 > @@ -11890,11 +11890,16 @@ (define_insn_and_split "*ashl<dwi>3_doub > (ashift:<DWI> > (match_operand:<DWI> 1 "register_operand") > (subreg:QI > - (and:SI > - (match_operand:SI 2 "register_operand" "c") > - (match_operand:SI 3 "const_int_operand")) 0))) > - (clobber (reg:CC FLAGS_REG))] > - "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0 > + (and > + (match_operand 2 "register_operand" "c") > + (match_operand 3 "const_int_operand")) 0))) > + (clobber (reg:CC FLAGS_REG))] > + "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0 > + || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)) > + == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))) > + && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT > + && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2, > + 4 << (TARGET_64BIT ? 1 : 0)) > && ix86_pre_reload_split ()" > "#" > "&& 1" > @@ -11912,6 +11917,15 @@ (define_insn_and_split "*ashl<dwi>3_doub > (ashift:DWIH (match_dup 5) (match_dup 2))) > (clobber (reg:CC FLAGS_REG))])] > { > + if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0) > + { > + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); > + operands[2] = gen_lowpart (QImode, operands[2]); > + emit_insn (gen_ashl<dwi>3_doubleword (operands[0], operands[1], > + operands[2])); > + DONE; > + } > + > split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]); > > operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1); > @@ -11925,6 +11939,7 @@ (define_insn_and_split "*ashl<dwi>3_doub > operands[2] = tem; > } > > + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); > operands[2] = gen_lowpart (QImode, operands[2]); > > if (!rtx_equal_p (operands[6], operands[7])) > @@ -11939,7 +11954,9 @@ (define_insn_and_split "*ashl<dwi>3_doub > (match_operand:QI 2 "register_operand" "c") > 
(match_operand:QI 3 "const_int_operand")))) > (clobber (reg:CC FLAGS_REG))] > - "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0 > + "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0 > + || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)) > + == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))) > && ix86_pre_reload_split ()" > "#" > "&& 1" > @@ -11957,6 +11974,13 @@ (define_insn_and_split "*ashl<dwi>3_doub > (ashift:DWIH (match_dup 5) (match_dup 2))) > (clobber (reg:CC FLAGS_REG))])] > { > + if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0) > + { > + emit_insn (gen_ashl<dwi>3_doubleword (operands[0], operands[1], > + operands[2])); > + DONE; > + } > + > split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]); > > operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1); > @@ -11974,7 +11998,7 @@ (define_insn_and_split "*ashl<dwi>3_doub > emit_move_insn (operands[6], operands[7]); > }) > > -(define_insn "*ashl<mode>3_doubleword" > +(define_insn "ashl<mode>3_doubleword" > [(set (match_operand:DWI 0 "register_operand" "=&r") > (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n") > (match_operand:QI 2 "nonmemory_operand" "<S>c"))) > @@ -12186,13 +12210,16 @@ (define_insn_and_split "*ashl<mode>3_mas > (ashift:SWI48 > (match_operand:SWI48 1 "nonimmediate_operand") > (subreg:QI > - (and:SI > - (match_operand:SI 2 "register_operand" "c,r") > - (match_operand:SI 3 "const_int_operand")) 0))) > + (and > + (match_operand 2 "register_operand" "c,r") > + (match_operand 3 "const_int_operand")) 0))) > (clobber (reg:CC FLAGS_REG))] > "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands) > && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) > == GET_MODE_BITSIZE (<MODE>mode)-1 > + && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT > + && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2, > + 4 << (TARGET_64BIT ? 
1 : 0)) > && ix86_pre_reload_split ()" > "#" > "&& 1" > @@ -12201,7 +12228,10 @@ (define_insn_and_split "*ashl<mode>3_mas > (ashift:SWI48 (match_dup 1) > (match_dup 2))) > (clobber (reg:CC FLAGS_REG))])] > - "operands[2] = gen_lowpart (QImode, operands[2]);" > +{ > + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); > + operands[2] = gen_lowpart (QImode, operands[2]); > +} > [(set_attr "isa" "*,bmi2")]) > > (define_insn_and_split "*ashl<mode>3_mask_1" > @@ -12774,13 +12804,16 @@ (define_insn_and_split "*<insn><mode>3_m > (any_shiftrt:SWI48 > (match_operand:SWI48 1 "nonimmediate_operand") > (subreg:QI > - (and:SI > - (match_operand:SI 2 "register_operand" "c,r") > - (match_operand:SI 3 "const_int_operand")) 0))) > + (and > + (match_operand 2 "register_operand" "c,r") > + (match_operand 3 "const_int_operand")) 0))) > (clobber (reg:CC FLAGS_REG))] > "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) > && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) > == GET_MODE_BITSIZE (<MODE>mode)-1 > + && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT > + && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2, > + 4 << (TARGET_64BIT ? 
1 : 0)) > && ix86_pre_reload_split ()" > "#" > "&& 1" > @@ -12789,7 +12822,10 @@ (define_insn_and_split "*<insn><mode>3_m > (any_shiftrt:SWI48 (match_dup 1) > (match_dup 2))) > (clobber (reg:CC FLAGS_REG))])] > - "operands[2] = gen_lowpart (QImode, operands[2]);" > +{ > + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); > + operands[2] = gen_lowpart (QImode, operands[2]); > +} > [(set_attr "isa" "*,bmi2")]) > > (define_insn_and_split "*<insn><mode>3_mask_1" > @@ -12819,11 +12855,16 @@ (define_insn_and_split "*<insn><dwi>3_do > (any_shiftrt:<DWI> > (match_operand:<DWI> 1 "register_operand") > (subreg:QI > - (and:SI > - (match_operand:SI 2 "register_operand" "c") > - (match_operand:SI 3 "const_int_operand")) 0))) > - (clobber (reg:CC FLAGS_REG))] > - "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0 > + (and > + (match_operand 2 "register_operand" "c") > + (match_operand 3 "const_int_operand")) 0))) > + (clobber (reg:CC FLAGS_REG))] > + "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0 > + || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)) > + == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))) > + && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT > + && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2, > + 4 << (TARGET_64BIT ? 
1 : 0)) > && ix86_pre_reload_split ()" > "#" > "&& 1" > @@ -12841,6 +12882,15 @@ (define_insn_and_split "*<insn><dwi>3_do > (any_shiftrt:DWIH (match_dup 7) (match_dup 2))) > (clobber (reg:CC FLAGS_REG))])] > { > + if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0) > + { > + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); > + operands[2] = gen_lowpart (QImode, operands[2]); > + emit_insn (gen_<insn><dwi>3_doubleword (operands[0], operands[1], > + operands[2])); > + DONE; > + } > + > split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]); > > operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1); > @@ -12854,6 +12904,7 @@ (define_insn_and_split "*<insn><dwi>3_do > operands[2] = tem; > } > > + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); > operands[2] = gen_lowpart (QImode, operands[2]); > > if (!rtx_equal_p (operands[4], operands[5])) > @@ -12868,7 +12919,9 @@ (define_insn_and_split "*<insn><dwi>3_do > (match_operand:QI 2 "register_operand" "c") > (match_operand:QI 3 "const_int_operand")))) > (clobber (reg:CC FLAGS_REG))] > - "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0 > + "((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0 > + || ((INTVAL (operands[3]) & (2 * <MODE_SIZE> * BITS_PER_UNIT - 1)) > + == (2 * <MODE_SIZE> * BITS_PER_UNIT - 1))) > && ix86_pre_reload_split ()" > "#" > "&& 1" > @@ -12886,6 +12939,13 @@ (define_insn_and_split "*<insn><dwi>3_do > (any_shiftrt:DWIH (match_dup 7) (match_dup 2))) > (clobber (reg:CC FLAGS_REG))])] > { > + if ((INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) != 0) > + { > + emit_insn (gen_<insn><dwi>3_doubleword (operands[0], operands[1], > + operands[2])); > + DONE; > + } > + > split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]); > > operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1); > @@ -12903,7 +12963,7 @@ (define_insn_and_split "*<insn><dwi>3_do > emit_move_insn (operands[4], operands[5]); > }) > > 
-(define_insn_and_split "*<insn><mode>3_doubleword" > +(define_insn_and_split "<insn><mode>3_doubleword" > [(set (match_operand:DWI 0 "register_operand" "=&r") > (any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0") > (match_operand:QI 2 "nonmemory_operand" "<S>c"))) > @@ -13586,13 +13646,16 @@ (define_insn_and_split "*<insn><mode>3_m > (any_rotate:SWI > (match_operand:SWI 1 "nonimmediate_operand") > (subreg:QI > - (and:SI > - (match_operand:SI 2 "register_operand" "c") > - (match_operand:SI 3 "const_int_operand")) 0))) > + (and > + (match_operand 2 "register_operand" "c") > + (match_operand 3 "const_int_operand")) 0))) > (clobber (reg:CC FLAGS_REG))] > "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) > && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) > == GET_MODE_BITSIZE (<MODE>mode)-1 > + && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT > + && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2, > + 4 << (TARGET_64BIT ? 1 : 0)) > && ix86_pre_reload_split ()" > "#" > "&& 1" > @@ -13601,18 +13664,24 @@ (define_insn_and_split "*<insn><mode>3_m > (any_rotate:SWI (match_dup 1) > (match_dup 2))) > (clobber (reg:CC FLAGS_REG))])] > - "operands[2] = gen_lowpart (QImode, operands[2]);") > +{ > + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); > + operands[2] = gen_lowpart (QImode, operands[2]); > +}) > > (define_split > [(set (match_operand:SWI 0 "register_operand") > (any_rotate:SWI > (match_operand:SWI 1 "const_int_operand") > (subreg:QI > - (and:SI > - (match_operand:SI 2 "register_operand") > - (match_operand:SI 3 "const_int_operand")) 0)))] > + (and > + (match_operand 2 "register_operand") > + (match_operand 3 "const_int_operand")) 0)))] > "(INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode) - 1)) > - == GET_MODE_BITSIZE (<MODE>mode) - 1" > + == GET_MODE_BITSIZE (<MODE>mode) - 1 > + && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT > + && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2, > + 4 << 
(TARGET_64BIT ? 1 : 0))" > [(set (match_dup 4) (match_dup 1)) > (set (match_dup 0) > (any_rotate:SWI (match_dup 4) > @@ -13976,14 +14045,17 @@ (define_insn_and_split "*<btsc><mode>_ma > (ashift:SWI48 > (const_int 1) > (subreg:QI > - (and:SI > - (match_operand:SI 1 "register_operand") > - (match_operand:SI 2 "const_int_operand")) 0)) > + (and > + (match_operand 1 "register_operand") > + (match_operand 2 "const_int_operand")) 0)) > (match_operand:SWI48 3 "register_operand"))) > (clobber (reg:CC FLAGS_REG))] > "TARGET_USE_BT > && (INTVAL (operands[2]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) > == GET_MODE_BITSIZE (<MODE>mode)-1 > + && GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT > + && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[1])), 2, > + 4 << (TARGET_64BIT ? 1 : 0)) > && ix86_pre_reload_split ()" > "#" > "&& 1" > @@ -13994,7 +14066,10 @@ (define_insn_and_split "*<btsc><mode>_ma > (match_dup 1)) > (match_dup 3))) > (clobber (reg:CC FLAGS_REG))])] > - "operands[1] = gen_lowpart (QImode, operands[1]);") > +{ > + operands[1] = force_reg (GET_MODE (operands[1]), operands[1]); > + operands[1] = gen_lowpart (QImode, operands[1]); > +}) > > (define_insn_and_split "*<btsc><mode>_mask_1" > [(set (match_operand:SWI48 0 "register_operand") > @@ -14041,14 +14116,17 @@ (define_insn_and_split "*btr<mode>_mask" > (rotate:SWI48 > (const_int -2) > (subreg:QI > - (and:SI > - (match_operand:SI 1 "register_operand") > - (match_operand:SI 2 "const_int_operand")) 0)) > + (and > + (match_operand 1 "register_operand") > + (match_operand 2 "const_int_operand")) 0)) > (match_operand:SWI48 3 "register_operand"))) > (clobber (reg:CC FLAGS_REG))] > "TARGET_USE_BT > && (INTVAL (operands[2]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) > == GET_MODE_BITSIZE (<MODE>mode)-1 > + && GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT > + && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[1])), 2, > + 4 << (TARGET_64BIT ? 
1 : 0)) > && ix86_pre_reload_split ()" > "#" > "&& 1" > @@ -14059,7 +14137,10 @@ (define_insn_and_split "*btr<mode>_mask" > (match_dup 1)) > (match_dup 3))) > (clobber (reg:CC FLAGS_REG))])] > - "operands[1] = gen_lowpart (QImode, operands[1]);") > +{ > + operands[1] = force_reg (GET_MODE (operands[1]), operands[1]); > + operands[1] = gen_lowpart (QImode, operands[1]); > +}) > > (define_insn_and_split "*btr<mode>_mask_1" > [(set (match_operand:SWI48 0 "register_operand") > @@ -14409,6 +14490,47 @@ (define_insn_and_split "*jcc_bt<mode>_ma > operands[0] = shallow_copy_rtx (operands[0]); > PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0]))); > }) > + > +(define_insn_and_split "*jcc_bt<mode>_mask_1" > + [(set (pc) > + (if_then_else (match_operator 0 "bt_comparison_operator" > + [(zero_extract:SWI48 > + (match_operand:SWI48 1 "register_operand") > + (const_int 1) > + (zero_extend:SI > + (subreg:QI > + (and > + (match_operand 2 "register_operand") > + (match_operand 3 "const_int_operand")) 0)))]) > + (label_ref (match_operand 4)) > + (pc))) > + (clobber (reg:CC FLAGS_REG))] > + "(TARGET_USE_BT || optimize_function_for_size_p (cfun)) > + && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) > + == GET_MODE_BITSIZE (<MODE>mode)-1 > + && GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT > + && IN_RANGE (GET_MODE_SIZE (GET_MODE (operands[2])), 2, > + 4 << (TARGET_64BIT ? 
1 : 0)) > + && ix86_pre_reload_split ()" > + "#" > + "&& 1" > + [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (zero_extract:SWI48 > + (match_dup 1) > + (const_int 1) > + (match_dup 2)) > + (const_int 0))) > + (set (pc) > + (if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)]) > + (label_ref (match_dup 4)) > + (pc)))] > +{ > + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); > + operands[2] = gen_lowpart (SImode, operands[2]); > + operands[0] = shallow_copy_rtx (operands[0]); > + PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0]))); > +}) > > ;; Help combine recognize bt followed by cmov > (define_split > --- gcc/config/i386/i386.cc.jj 2022-05-31 11:33:51.452251660 +0200 > +++ gcc/config/i386/i386.cc 2022-06-01 12:40:06.189186012 +0200 > @@ -20995,6 +20995,20 @@ ix86_rtx_costs (rtx x, machine_mode mode > *total += 1; > return false; > > + case ZERO_EXTRACT: > + if (XEXP (x, 1) == const1_rtx > + && GET_CODE (XEXP (x, 2)) == ZERO_EXTEND > + && GET_MODE (XEXP (x, 2)) == SImode > + && GET_MODE (XEXP (XEXP (x, 2), 0)) == QImode) > + { > + /* Ignore cost of zero extension and masking of last argument. 
*/ > + *total += rtx_cost (XEXP (x, 0), mode, code, 0, speed); > + *total += rtx_cost (XEXP (x, 1), mode, code, 1, speed); > + *total += rtx_cost (XEXP (XEXP (x, 2), 0), mode, code, 2, speed); > + return true; > + } > + return false; > + > default: > return false; > } > --- gcc/testsuite/gcc.target/i386/pr105778.c.jj 2022-05-31 13:59:12.470814609 > +0200 > +++ gcc/testsuite/gcc.target/i386/pr105778.c 2022-05-31 13:58:50.624044700 > +0200 > @@ -0,0 +1,45 @@ > +/* PR target/105778 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { scan-assembler-not "\tand\[^\n\r]*\(31\|63\|127\|255\)" } } > */ > + > +unsigned int f1 (unsigned int x, unsigned long y) { y &= 31; return x << y; } > +unsigned int f2 (unsigned int x, unsigned long y) { return x << (y & 31); } > +unsigned int f3 (unsigned int x, unsigned long y) { y &= 31; return x >> y; } > +unsigned int f4 (unsigned int x, unsigned long y) { return x >> (y & 31); } > +int f5 (int x, unsigned long y) { y &= 31; return x >> y; } > +int f6 (int x, unsigned long y) { return x >> (y & 31); } > +unsigned long long f7 (unsigned long long x, unsigned long y) { y &= 63; > return x << y; } > +unsigned long long f8 (unsigned long long x, unsigned long y) { return x << > (y & 63); } > +unsigned long long f9 (unsigned long long x, unsigned long y) { y &= 63; > return x >> y; } > +unsigned long long f10 (unsigned long long x, unsigned long y) { return x >> > (y & 63); } > +long long f11 (long long x, unsigned long y) { y &= 63; return x >> y; } > +long long f12 (long long x, unsigned long y) { return x >> (y & 63); } > +#ifdef __SIZEOF_INT128__ > +unsigned __int128 f13 (unsigned __int128 x, unsigned long y) { y &= 127; > return x << y; } > +unsigned __int128 f14 (unsigned __int128 x, unsigned long y) { return x << > (y & 127); } > +unsigned __int128 f15 (unsigned __int128 x, unsigned long y) { y &= 127; > return x >> y; } > +unsigned __int128 f16 (unsigned __int128 x, unsigned long y) { return x >> > (y & 
127); } > +__int128 f17 (__int128 x, unsigned long y) { y &= 127; return x >> y; } > +__int128 f18 (__int128 x, unsigned long y) { return x >> (y & 127); } > +#endif > +unsigned int f19 (unsigned int x, unsigned long y) { y &= 63; return x << y; > } > +unsigned int f20 (unsigned int x, unsigned long y) { return x << (y & 63); } > +unsigned int f21 (unsigned int x, unsigned long y) { y &= 63; return x >> y; > } > +unsigned int f22 (unsigned int x, unsigned long y) { return x >> (y & 63); } > +int f23 (int x, unsigned long y) { y &= 63; return x >> y; } > +int f24 (int x, unsigned long y) { return x >> (y & 63); } > +unsigned long long f25 (unsigned long long x, unsigned long y) { y &= 127; > return x << y; } > +unsigned long long f26 (unsigned long long x, unsigned long y) { return x << > (y & 127); } > +unsigned long long f27 (unsigned long long x, unsigned long y) { y &= 127; > return x >> y; } > +unsigned long long f28 (unsigned long long x, unsigned long y) { return x >> > (y & 127); } > +long long f29 (long long x, unsigned long y) { y &= 127; return x >> y; } > +long long f30 (long long x, unsigned long y) { return x >> (y & 127); } > +#ifdef __SIZEOF_INT128__ > +unsigned __int128 f31 (unsigned __int128 x, unsigned long y) { y &= 255; > return x << y; } > +unsigned __int128 f32 (unsigned __int128 x, unsigned long y) { return x << > (y & 255); } > +unsigned __int128 f33 (unsigned __int128 x, unsigned long y) { y &= 255; > return x >> y; } > +unsigned __int128 f34 (unsigned __int128 x, unsigned long y) { return x >> > (y & 255); } > +__int128 f35 (__int128 x, unsigned long y) { y &= 255; return x >> y; } > +__int128 f36 (__int128 x, unsigned long y) { return x >> (y & 255); } > +#endif > > Jakub >
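
As an aside on why the testcase expects no AND at all: a 32-bit x86 shift only looks at the low 5 bits of the count, so any mask that keeps those 5 bits intact changes nothing. The following is a rough model in C, assuming only that hardware behaviour; model_shl32 and mask_is_redundant32 are made-up names and the code is not taken from GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a 32-bit x86 SHL: the instruction itself uses only the low
   5 bits of the count.  Hypothetical helper, for illustration only.  */
uint32_t
model_shl32 (uint32_t x, uint64_t count)
{
  return x << (count & 31);
}

/* Return true if ANDing the count with MASK before the shift never
   changes the modelled result, checked for counts 0..255.  Shifting
   the value 1 gives a distinct result for each effective count, so
   equal results imply equal effective counts.  */
bool
mask_is_redundant32 (uint64_t mask)
{
  for (uint64_t count = 0; count < 256; count++)
    if (model_shl32 (1u, count & mask) != model_shl32 (1u, count))
      return false;
  return true;
}

/* Masks 31 and 63 correspond to f1/f2 and f19/f20 in the testcase.  */
void
check_shift_masks (void)
{
  assert (mask_is_redundant32 (31));
  assert (mask_is_redundant32 (63));
  assert (!mask_is_redundant32 (15));
}
```

A mask of 15 is not redundant because it clears bit 4 of the count, which the instruction does use; that is the kind of AND the patterns still have to keep.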