Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

Richard Biener via Gcc-patches Wed, 19 Jul 2023 02:23:22 -0700

On Wed, 19 Jul 2023, YunQiang Su wrote:

> Richard Biener <[email protected]> wrote on Wed, 19 Jul 2023 at 15:22:
> >
> > On Wed, 19 Jul 2023, YunQiang Su wrote:
> >
> > > Richard Biener via Gcc-patches <[email protected]> wrote on Wed, 19 Jul 2023 at 14:27:
> > > >
> > > > On Wed, 19 Jul 2023, YunQiang Su wrote:
> > > >
> > > > > PR #104914
> > > > >
> > > > > When working with
> > > > >   int val;
> > > > >   ((unsigned char*)&val)[3] = *buf;
> > > > >   if (val > 0) ...
> > > > > the RTX mode is taken from the REG instead of the SUBREG, so
> > > > > D<INS> is used instead of <INS>.  This produces wrong code on
> > > > > architectures that sign-extend by default, such as MIPS64.
> > > > >
> > > > > Let's use str_rtx and mode of str_rtx as the parameters for
> > > > > store_integral_bit_field if:
> > > > >   modes of op0 and str_rtx are INT;
> > > > >   length of op0 is greater than str_rtx.
> > > > >
> > > > > This patch has been tested on aarch64-linux-gnu, x86_64-linux-gnu,
> > > > > mips64el-linux-gnuabi64 without regression.
> > > >
> > > > I still think you are "fixing" this in the wrong place.  The bugzilla
> > > > audit trail points to combine and later notes an eventual expansion
> > > > issue (but for another testcase/target).
> > > >
> > > > You have to explain in more detail on what is wrong with the initial
> > > > RTL on mips.
> > > >
> > >
> > > In the first RTL file, aka xx.c.256r.expand, the zero_extract RTX is like
> > >
> > > (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
> > >             (const_int 8 [0x8])
> > >             (const_int 0 [0]))
> > >         (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
> > >      (nil))
> > >
> > > Note that all of the REGs are in DImode.  On MIPS64, this expands to a
> > > `DINS` instruction.
> > > In fact we expect an SImode operation here, because `val` is an `int`
> > > in the C code.
> > >
> > > With my patch, the RTX will be like:
> > >
> > > (insn 10 9 11 2 (set (zero_extract:SI (subreg:SI (reg/v:DI 200 [ val ]) 0)
> > >             (const_int 8 [0x8])
> > >             (const_int 0 [0]))
> > >         (subreg:SI (reg:QI 202) 0)) "xx.c":4:29 -1
> > >      (nil))
> >
> > But if this RTL is correct then the above with DImode is correct as
> > well and the issue is in the backend definition of the instruction
> > defining 'DINS'?
> >
> 
> I don't think so.
> 
> (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
>                                                      ^^
>              (const_int 8 [0x8])
>              (const_int 0 [0]))
>          (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
>       (nil))
> 
> This RTL only has info about DI.  It doesn't have any info about the
> real length of `val`.  The backend has no choice other than `DINS`.
> 
> > > So the operation will be SImode, aka `INS` instruction for MIPS64.
> > >
> > > The problem has two root causes:
> > > 1. MIPS's `INS` instruction always sign-extends its result, while
> > > `DINS` doesn't:
> > >     li $7, 0xff
> > >     li $8, 0
> > >     ins $8,$7,24,8  # set bits 24-31 of $8 to 0xff.
> > > The value of $8 will be 0xff ff ff ff ff 00 00 00.
> >
> > But that's wrong.  (set (zero_extract:SI ...) should not affect
> > bits outside of the indicated range.
> >
> 
> In fact, that is how a sign-extension arch works.
> Whether wrong or right, the ISA was/is defined like this.
> 
> In fact, on the MIPS 32-bit ABI, the same C code generates RTL like this,
> and the 32-bit object still works on a 64-bit CPU.
> That's a smart (or brain-damaged) design.
> 
> > @findex zero_extract
> > @item (zero_extract:@var{m} @var{loc} @var{size} @var{pos})
> > Like @code{sign_extract} but refers to an unsigned or zero-extended
> > bit-field.  The same sequence of bits are extracted, but they
> > are filled to an entire word with zeros instead of by sign-extension.
> >
> 
> That depends on the definition of `word` here.
> For `(zero_extract:SI`, I think that the word is limited to the low 32 bits
> of the hardware register.
> Anyway, it won't break ISAs that don't sign-extend by default.
> 
> Due to the nature of a sign-extension ISA, if we don't sign-extend the
> `int` variable, something will go wrong.
> 
> To make it clear: `sign extension` here means:
>        the value of bit 31 will be copied to bits [32-63], and
>        the values of bits [0-30] won't be copied.
> Here are the examples:
>     li $7, 0xff
>     li $8, 0x00 00 ff 00
>     ins $8,$7,16,8
>                     ^^
> The value of $8 will be: 0x 00 00 00 00 00 ff ff 00
> 
>     li $7, 0xff
>     li $8, 0x00 00 ff 00
>     ins $8,$7,24,8
>                     ^^
> The value of $8 will be: 0x ff ff ff ff ff 00 ff 00


But that's INS.
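
To make the quoted examples easier to check, here is a rough C model of
the two behaviours being discussed.  It is only a sketch: model_ins and
model_dins are hypothetical helper names, and the sign-extending
behaviour of INS is taken from the description quoted above, not from an
ISA manual.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit INS on a 64-bit register, as described
   in the quoted text: insert the low SIZE bits of RS into RT at bit POS,
   then copy bit 31 of the 32-bit result into bits 32-63.  */
static uint64_t
model_ins (uint64_t rt, uint64_t rs, unsigned pos, unsigned size)
{
  assert (size >= 1 && pos + size <= 32);
  uint32_t field = size == 32 ? 0xffffffffu : (1u << size) - 1u;
  uint32_t mask = field << pos;
  uint32_t lo = ((uint32_t) rt & ~mask) | (((uint32_t) rs << pos) & mask);
  /* Replicate bit 31 into the upper word explicitly, so the model does
     not depend on a narrowing signed conversion.  */
  return (lo & 0x80000000u) ? (0xffffffff00000000ull | lo) : (uint64_t) lo;
}

/* Hypothetical model of a 64-bit insert that touches only the field
   itself and leaves every other bit of RT unchanged.  */
static uint64_t
model_dins (uint64_t rt, uint64_t rs, unsigned pos, unsigned size)
{
  assert (size >= 1 && pos + size <= 64);
  uint64_t field = size == 64 ? ~0ull : (1ull << size) - 1ull;
  uint64_t mask = field << pos;
  return (rt & ~mask) | ((rs << pos) & mask);
}
```

Under these assumptions, model_ins (0, 0xff, 24, 8) yields
0xffffffffff000000 while model_dins (0, 0xff, 24, 8) yields
0x00000000ff000000, matching the register values given in the quoted
examples.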

> > Unlike @code{sign_extract}, this type of expressions can be lvalues
> > in RTL; they may appear on the left side of an assignment, indicating
> > insertion of a value into the specified bit-field.
> > @end table

Note ^^^ applies for (zero_extract ..) as a SET destination.  The
issue is probably that MIPS is WORD_REGISTER_OPERATIONS which has
proved "interesting" (and under-documented).  I suppose word_mode
is DImode which is likely why the bitfield operations work on it.

I see there's DINS, DINSM and DINSU in some MIPS ISA document
I found on the internet, but I didn't see where that would
cause sign or zero extension off bit 31.

> >
> > >     li $7, 0xff
> > >     li $8, 0
> > >     dins $8,$7,24,8  # set bits 24-31 of $8 to 0xff.
> > > The value of $8 will be 0x 00 00 00 00 ff 00 00 00.
> >
> > which isn't correct either.
> >
>
> It is not a matter of correct or incorrect: the ISA manual just states this,
> and the hardware works like this.

I don't see that.  That's definitely not what GCC expects here,
the left-most word of the doubleword should be unchanged.

Your testcase should be a dg-do-run and probably more like

NOMIPS16 int __attribute__((noipa)) test (const unsigned char *buf)
{
  int val;
  ((unsigned char*)&val)[0] = *buf++;
  ((unsigned char*)&val)[1] = *buf++;
  ((unsigned char*)&val)[2] = *buf++;
  ((unsigned char*)&val)[3] = *buf++;
  return val;
}
int main()
{
  int val = 0x01020304;
  val = test ((const unsigned char *) &val);
  if (val != 0x01020304)
    __builtin_abort ();
}

not sure if I got endianness correct.  Now, the question is what
WORD_REGISTER_OPERATIONS implies for a bitfield insert and what
the MIPS ABI says for returning SImode.

Others might be able to answer this.

> > If you look a few dumps further you'll see which instruction was
> > recognized, I suspect the machine description is simply wrong here?
> >
> 
> The design of the initial RTL may have expected that the backend would expand
> 
> (insn 14 13 15 2 (set (reg/v:DI 201 [ val ])
>         (sign_extend:DI (subreg:SI (reg/v:DI 201 [ val ]) 0))) "xx.c":5:29 -1
>      (nil))
> 
> to an `SLL` instruction, which can fix what `DINS` does, aka
>      0x 00 00 00 00 ff 00 00 00 ---> 0x ff ff ff ff ff 00 00 00
> 
> I guess this is what you mean by a mistake in the machine description.
> However, the MIPS md assumes that values are sign-extended by default, so
> it considers this not needed at all.
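
As a rough illustration of the fix-up described in the quoted text, the
effect of (sign_extend:DI (subreg:SI x 0)) on a 64-bit register value
can be modelled as below.  This is a sketch only, and
sign_extend_si_to_di is a hypothetical helper name, not something from
GCC or the MIPS backend.

```c
#include <stdint.h>

/* Hypothetical model of (sign_extend:DI (subreg:SI x 0)): keep the low
   32 bits of X and copy bit 31 into bits 32-63.  */
static uint64_t
sign_extend_si_to_di (uint64_t x)
{
  uint64_t lo = x & 0xffffffffull;
  return (lo & 0x80000000ull) ? (lo | 0xffffffff00000000ull) : lo;
}
```

Applied to the DINS result 0x00000000ff000000, this model gives
0xffffffffff000000, which is the transformation shown in the quoted
text.  If the backend drops this instruction, the upper word keeps
whatever the 64-bit insert left there.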
> 
> > > 2. Because most MIPS instructions that work on 32-bit values, aka
> > > instructions without `d` as their first character (with a few
> > > exceptions), sign-extend their result, the MIPS backend simply
> > > ignores `extendsidi2`, aka the RTX
> > >
> > > (insn 14 13 15 2 (set (reg/v:DI 200 [ val ])
> > >         (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val ]) 0))) "xx.c":5:29 -1
> > >      (nil))
> > >
> > >
> 
> This is just background info about MIPS:
> 
> On MIPS32 hardware, the value -1 is  0x ff ff ff ff, which is the same
> as on other archs.
> On MIPS64 hardware, the value of (int32_t)-1 is
>      0x ff ff ff ff ff ff ff ff
> which is the same as (int64_t)-1.
> So a single compare-and-branch instruction can work with both
> int32_t and int64_t.
> 
> On a non-sign-extension arch, like ARM64, (int32_t)-1 is
>    0x 00 00 00 00 ff ff ff ff
> and (int64_t)-1 is
>    0x ff ff ff ff ff ff ff ff
> That's why the `CMP` instructions for X and W have different encodings:
> bit 31 of the encoding, the `sf` bit.
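
The background above also explains why the `val > 0` test in the
original testcase goes wrong.  The sketch below models a full-width
signed comparison on the register contents; both function names are
hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* Nonzero if REG holds a 32-bit value in the sign-extended form the
   quoted text describes, i.e. bits 32-63 are all copies of bit 31.  */
static int
is_sign_extended_si (uint64_t reg)
{
  uint64_t upper = reg >> 31;	/* Bit 31 together with bits 32-63.  */
  return upper == 0 || upper == 0x1ffffffffull;
}

/* Hypothetical model of a 64-bit signed "greater than zero" test on the
   whole register: true iff bit 63 is clear and the value is nonzero.  */
static int
reg_gt_zero_64 (uint64_t reg)
{
  return (reg >> 63) == 0 && reg != 0;
}
```

With the sign-extended value 0xffffffffff000000 the 64-bit test is
false, which is right for a negative int.  With 0x00000000ff000000, the
value left behind by a 64-bit insert, is_sign_extended_si is false and
the same test reports a positive value.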
> 
> 
> ================================
> For this problem, we have 2 choices for a fix:
> 1. This patch
> 2.

2. disallow INS affecting bit 31 of a double-word - there's
mips_use_ins_ext_p that could be used (maybe also pass in whether it's
an INS).
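
One way to read this second option is as a range check on the inserted
field.  The predicate below is a standalone sketch under that reading,
not the real mips_use_ins_ext_p; the name di_insert_keeps_si_canonical
and its parameters are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical guard: when a 32-bit value lives sign-extended in a
   64-bit register, a 64-bit insert of SIZE bits at bit POS cannot
   disturb the sign extension as long as the field stays below bit 31.
   Return false for any field that would touch bit 31 or higher.  */
static bool
di_insert_keeps_si_canonical (unsigned pos, unsigned size)
{
  return size >= 1 && pos + size <= 31;
}
```

Under this reading, the insert at position 24 with size 8 from the
examples would be rejected, while the one at position 16 with size 8
would still be allowed.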

Richard.

> 
> > >
> > > > Richard.
> > > >
> > > > > gcc/ChangeLog:
> > > > >         PR: 104914.
> > > > >         * expmed.cc(store_bit_field_1): Pass str_rtx and its mode
> > > > >       to store_integral_bit_field if the length of op0 is greater
> > > > >       than str_rtx.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >         PR: 104914.
> > > > >       * gcc.target/mips/pr104914.c: New testcase.
> > > > > ---
> > > > >  gcc/expmed.cc                            | 20 +++++++++++++++++---
> > > > >  gcc/testsuite/gcc.target/mips/pr104914.c | 17 +++++++++++++++++
> > > > >  2 files changed, 34 insertions(+), 3 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/mips/pr104914.c
> > > > >
> > > > > diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> > > > > index fbd4ce2d42f..5531c19e891 100644
> > > > > --- a/gcc/expmed.cc
> > > > > +++ b/gcc/expmed.cc
> > > > > @@ -850,6 +850,7 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> > > > >       since that case is valid for any mode.  The following cases are only
> > > > >       valid for integral modes.  */
> > > > >    opt_scalar_int_mode op0_mode = int_mode_for_mode (GET_MODE (op0));
> > > > > +  opt_scalar_int_mode str_mode = int_mode_for_mode (GET_MODE (str_rtx));
> > > > >    scalar_int_mode imode;
> > > > >    if (!op0_mode.exists (&imode) || imode != GET_MODE (op0))
> > > > >      {
> > > > > @@ -881,9 +882,22 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> > > > >       op0 = gen_lowpart (op0_mode.require (), op0);
> > > > >      }
> > > > >
> > > > > -  return store_integral_bit_field (op0, op0_mode, ibitsize, ibitnum,
> > > > > -                                bitregion_start, bitregion_end,
> > > > > -                                fieldmode, value, reverse, fallback_p);
> > > > > +  /* If MODEs of str_rtx and op0 are INT, and the length of op0 is greater than
> > > > > +     str_rtx, it means that str_rtx has a shorter SUBREG: int32 on 64 mach/ABI
> > > > > +     is an example.  For this case, we should use the mode of SUBREG, otherwise
> > > > > +     bad code will generate for sign-extension ports, like MIPS.  */
> > > > > +  bool use_str_mode = false;
> > > > > +  if (GET_MODE_CLASS (GET_MODE (str_rtx)) == MODE_INT
> > > > > +      && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
> > > > > +      && known_gt (GET_MODE_SIZE (GET_MODE (op0)),
> > > > > +                GET_MODE_SIZE (GET_MODE (str_rtx))))
> > > > > +    use_str_mode = true;
> > > > > +
> > > > > +  return store_integral_bit_field (use_str_mode ? str_rtx : op0,
> > > > > +                                use_str_mode ? str_mode : op0_mode,
> > > > > +                                ibitsize, ibitnum, bitregion_start,
> > > > > +                                bitregion_end, fieldmode, value,
> > > > > +                                reverse, fallback_p);
> > > > >  }
> > > > >
> > > > >  /* Subroutine of store_bit_field_1, with the same arguments, except
> > > > > diff --git a/gcc/testsuite/gcc.target/mips/pr104914.c b/gcc/testsuite/gcc.target/mips/pr104914.c
> > > > > new file mode 100644
> > > > > index 00000000000..fd6ef6af446
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/mips/pr104914.c
> > > > > @@ -0,0 +1,17 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-march=mips64r2 -mabi=64" } */
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "\tdins\t" } } */
> > > > > +
> > > > > +NOMIPS16 int test (const unsigned char *buf)
> > > > > +{
> > > > > +  int val;
> > > > > +  ((unsigned char*)&val)[0] = *buf++;
> > > > > +  ((unsigned char*)&val)[1] = *buf++;
> > > > > +  ((unsigned char*)&val)[2] = *buf++;
> > > > > +  ((unsigned char*)&val)[3] = *buf++;
> > > > > +  if(val > 0)
> > > > > +    return 1;
> > > > > +  else
> > > > > +    return 0;
> > > > > +}
> > > > >
> > > >
> > > > --
> > > > Richard Biener <[email protected]>
> > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 
> > > > Nuernberg,
> > > > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> > > > HRB 36809 (AG Nuernberg)

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
