On Tue, Mar 25, 2025 at 7:55 AM Jakub Jelinek <ja...@redhat.com> wrote: > > Hi! > > The following patch is miscompiled from r15-8478 but latently already > since my r11-5756 and r11-6631 changes. > The r11-5756 change was > https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561164.html > which changed the splitters to immediately throw away the masking. > And the r11-6631 change was an optimization to recognize > (set (zero_extract:HI (...) (const_int 1) (...)) (const_int 1) > as btr. > > The problem is their interaction. x86 is not a SHIFT_COUNT_TRUNCATED > target, so the masking needs to be explicit in the IL. > And combine.cc (make_field_assignment) has since 1992 optimizations > which try to optimize x &= (-2 r<< y) into zero_extract (x) = 0. > Now, such an optimization is fine if y has not been masked or if the > chosen zero_extract has the same mode as the rotate (or it recognizes > something with a left shift too). IMHO such optimization is invalid > for SHIFT_COUNT_TRUNCATED targets because we explicitly say that > the masking of the shift/rotate counts are redundant there and don't > need to be part of the IL (I have a patch for that, but because it > is just latent, I'm not sure it needs to be posted for gcc 15 (and > also am not sure if it should punt or add operand masking just in case)). > x86 is not SHIFT_COUNT_TRUNCATED though and so even fixing combine > not to do that for SHIFT_COUNT_TRUNCATED targets doesn't help, and we don't > have QImode insv, so it is optimized into HImode insertions. Now, > if the y in x &= (-2 r<< y) wasn't masked in any way, turning it into > HImode btr is just fine, but if it was x &= (-2 r<< (y & 7)) and we just > decided to throw away the masking, using btr changes the behavior on it > and causes e2fsprogs and sqlite miscompilations. > > So IMHO on !SHIFT_COUNT_TRUNCATED targets, we need to keep the maskings > explicit in the IL, either at least for the duration of the combine pass > as does the following patch (where combine is the only known pass to have > such transformation), or even keep it until final pass in case there are > some later optimizations that would also need to know whether there was > explicit masking or not and with what mask. The latter change would be > much larger. > > The following patch just reverts the r11-5756 change and adds a testcase. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2025-03-25 Jakub Jelinek <ja...@redhat.com> > > PR target/96226 > PR target/119428 > * config/i386/i386.md (splitter after *<rotate_insn><mode>3_mask, > splitter after *<rotate_insn><mode>3_mask_1): Revert 2020-12-05 > changes. > > * gcc.c-torture/execute/pr119428.c: New test.
OK. BTW: According to [1], the revert is at author's discretion, no outside approval is needed. [1] https://gcc.gnu.org/gitwrite.html#all Thanks, Uros. > > --- gcc/config/i386/i386.md.jj 2025-03-24 11:29:12.271793423 +0100 > +++ gcc/config/i386/i386.md 2025-03-24 11:32:14.139305881 +0100 > @@ -18168,7 +18168,8 @@ (define_split > [(set (match_dup 4) (match_dup 1)) > (set (match_dup 0) > (any_rotate:SWI (match_dup 4) > - (subreg:QI (match_dup 2) 0)))] > + (subreg:QI > + (and:SI (match_dup 2) (match_dup 3)) 0)))] > "operands[4] = gen_reg_rtx (<MODE>mode);") > > (define_insn_and_split "*<insn><mode>3_mask_1" > @@ -18202,7 +18203,8 @@ (define_split > == GET_MODE_BITSIZE (<MODE>mode) - 1" > [(set (match_dup 4) (match_dup 1)) > (set (match_dup 0) > - (any_rotate:SWI (match_dup 4) (match_dup 2)))] > + (any_rotate:SWI (match_dup 4) > + (and:QI (match_dup 2) (match_dup 3))))] > "operands[4] = gen_reg_rtx (<MODE>mode);") > > (define_insn_and_split "*<insn><mode>3_add" > --- gcc/testsuite/gcc.c-torture/execute/pr119428.c.jj 2025-03-24 > 11:41:31.583658619 +0100 > +++ gcc/testsuite/gcc.c-torture/execute/pr119428.c 2025-03-24 > 11:40:37.884395211 +0100 > @@ -0,0 +1,18 @@ > +/* PR target/119428 */ > + > +__attribute__((noipa)) void > +foo (unsigned int x, unsigned char *y) > +{ > + y += x >> 3; > + *y &= (unsigned char) ~(1 << (x & 0x07)); > +} > + > +int > +main () > +{ > + unsigned char buf[8]; > + __builtin_memset (buf, 0xff, 8); > + foo (8, buf); > + if (buf[1] != 0xfe) > + __builtin_abort (); > +} > > Jakub >