On Tue, Mar 25, 2025 at 7:55 AM Jakub Jelinek <ja...@redhat.com> wrote:
>
> Hi!
>
> The following patch is miscompiled from r15-8478 but latently already
> since my r11-5756 and r11-6631 changes.
> The r11-5756 change was
> https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561164.html
> which changed the splitters to immediately throw away the masking.
> And the r11-6631 change was an optimization to recognize
> (set (zero_extract:HI (...) (const_int 1) (...)) (const_int 1)
> as btr.
>
> The problem is their interaction.  x86 is not a SHIFT_COUNT_TRUNCATED
> target, so the masking needs to be explicit in the IL.
> And combine.cc (make_field_assignment) has since 1992 optimizations
> which try to optimize x &= (-2 r<< y) into zero_extract (x) = 0.
> Now, such an optimization is fine if y has not been masked or if the
> chosen zero_extract has the same mode as the rotate (or it recognizes
> something with a left shift too).  IMHO such optimization is invalid
> for SHIFT_COUNT_TRUNCATED targets because we explicitly say that
> the masking of the shift/rotate counts are redundant there and don't
> need to be part of the IL (I have a patch for that, but because it
> is just latent, I'm not sure it needs to be posted for gcc 15 (and
> also am not sure if it should punt or add operand masking just in case)).
> x86 is not SHIFT_COUNT_TRUNCATED though and so even fixing combine
> not to do that for SHIFT_COUNT_TRUNCATED targets doesn't help, and we don't
> have QImode insv, so it is optimized into HImode insertions.  Now,
> if the y in x &= (-2 r<< y) wasn't masked in any way, turning it into
> HImode btr is just fine, but if it was x &= (-2 r<< (y & 7)) and we just
> decided to throw away the masking, using btr changes the behavior on it
> and causes e2fsprogs and sqlite miscompilations.
>
> So IMHO on !SHIFT_COUNT_TRUNCATED targets, we need to keep the maskings
> explicit in the IL, either at least for the duration of the combine pass
> as does the following patch (where combine is the only known pass to have
> such transformation), or even keep it until final pass in case there are
> some later optimizations that would also need to know whether there was
> explicit masking or not and with what mask.  The latter change would be
> much larger.
>
> The following patch just reverts the r11-5756 change and adds a testcase.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2025-03-25  Jakub Jelinek  <ja...@redhat.com>
>
>         PR target/96226
>         PR target/119428
>         * config/i386/i386.md (splitter after *<rotate_insn><mode>3_mask,
>         splitter after *<rotate_insn><mode>3_mask_1): Revert 2020-12-05
>         changes.
>
>         * gcc.c-torture/execute/pr119428.c: New test.

OK.

BTW: According to [1], the revert is at author's discretion, no
outside approval is needed.

[1] https://gcc.gnu.org/gitwrite.html#all

Thanks,
Uros.

>
> --- gcc/config/i386/i386.md.jj  2025-03-24 11:29:12.271793423 +0100
> +++ gcc/config/i386/i386.md     2025-03-24 11:32:14.139305881 +0100
> @@ -18168,7 +18168,8 @@ (define_split
>   [(set (match_dup 4) (match_dup 1))
>    (set (match_dup 0)
>         (any_rotate:SWI (match_dup 4)
> -                      (subreg:QI (match_dup 2) 0)))]
> +                      (subreg:QI
> +                        (and:SI (match_dup 2) (match_dup 3)) 0)))]
>   "operands[4] = gen_reg_rtx (<MODE>mode);")
>
>  (define_insn_and_split "*<insn><mode>3_mask_1"
> @@ -18202,7 +18203,8 @@ (define_split
>    == GET_MODE_BITSIZE (<MODE>mode) - 1"
>   [(set (match_dup 4) (match_dup 1))
>    (set (match_dup 0)
> -       (any_rotate:SWI (match_dup 4) (match_dup 2)))]
> +       (any_rotate:SWI (match_dup 4)
> +                      (and:QI (match_dup 2) (match_dup 3))))]
>   "operands[4] = gen_reg_rtx (<MODE>mode);")
>
>  (define_insn_and_split "*<insn><mode>3_add"
> --- gcc/testsuite/gcc.c-torture/execute/pr119428.c.jj   2025-03-24 
> 11:41:31.583658619 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr119428.c      2025-03-24 
> 11:40:37.884395211 +0100
> @@ -0,0 +1,18 @@
> +/* PR target/119428 */
> +
> +__attribute__((noipa)) void
> +foo (unsigned int x, unsigned char *y)
> +{
> +  y += x >> 3;
> +  *y &= (unsigned char) ~(1 << (x & 0x07));
> +}
> +
> +int
> +main ()
> +{
> +  unsigned char buf[8];
> +  __builtin_memset (buf, 0xff, 8);
> +  foo (8, buf);
> +  if (buf[1] != 0xfe)
> +    __builtin_abort ();
> +}
>
>         Jakub
>

Reply via email to