Hi Richard, thanks for the response.

On Mon, May 26, 2025 at 11:55 AM Richard Biener <rguent...@suse.de> wrote:
>
> On Mon, 26 May 2025, Konstantinos Eleftheriou wrote:
>
> > In `store_bit_field_1`, when the value to be written in the bitfield
> > and/or the bitfield itself have vector modes, non-canonical subregs
> > are generated, like `(subreg:V4SI (reg:V8SI x) 0)`. If one of them is
> > a scalar, this happens only when the scalar mode is different than the
> > vector's inner mode.
> >
> > This patch tries to prevent this, using vec_set patterns when
> > possible.
>
> I know almost nothing about this code, but why does the patch
> fixup things after the fact rather than avoid generating the
> SUBREG in the first place?

That's what we're doing: we try to avoid generating the non-canonical
subregs in the first place (though that isn't always possible).
However, there are cases where such subregs are passed into
`store_bit_field` by its caller, and in those cases we choose not to
touch them.

> ISTR it also (unfortunately) depends on the target which forms
> are considered canonical.

Our reading of the documentation is that the canonicalizations are
machine-independent. Is that not the case? Or, specifically for the
subregs that operate on vectors, is there any target that considers
them canonical?

> I'm also not sure you got endianness right for all possible
> values of SUBREG_BYTE.  One more reason to not generate such
> subreg in the first place but stick to vec_select/concat.

The only way we would generate subregs is through the calls to
`extract_bit_field` and `store_bit_field_1`, and those should handle
the endianness. Also, these subregs wouldn't operate on vectors. Do
you mean that something could go wrong with these calls?

Konstantinos


> Richard.
>
> > Bootstrapped/regtested on AArch64 and x86_64.
> >
> >       PR rtl-optimization/118873
> >
> > gcc/ChangeLog:
> >
> >       * expmed.cc (generate_vec_concat): New function.
> >       (store_bit_field_1): Check for cases where the value
> >       to be written and/or the bitfield have vector modes
> >       and try to generate the corresponding vec_set patterns
> >       instead of subregs.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/i386/pr118873.c: New test.
> > ---
> >  gcc/expmed.cc                            | 174 ++++++++++++++++++++++-
> >  gcc/testsuite/gcc.target/i386/pr118873.c |  33 +++++
> >  2 files changed, 200 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr118873.c
> >
> > diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> > index 8cf10d9c73bf..8c641f55b9c6 100644
> > --- a/gcc/expmed.cc
> > +++ b/gcc/expmed.cc
> > @@ -740,6 +740,42 @@ store_bit_field_using_insv (const extraction_insn *insv, rtx op0,
> >    return false;
> >  }
> >
> > +/* Helper function for store_bit_field_1, used in the case that the bitfield
> > +   and the destination are both vectors.  It extracts the elements of OP from
> > +   LOWER_BOUND to UPPER_BOUND using a vec_select and uses a vec_concat to
> > +   concatenate the extracted elements with the VALUE.  */
> > +
> > +rtx
> > +generate_vec_concat (machine_mode fieldmode, rtx op, rtx value,
> > +                  HOST_WIDE_INT lower_bound,
> > +                  HOST_WIDE_INT upper_bound)
> > +{
> > +  if (!VECTOR_MODE_P (fieldmode))
> > +    return NULL_RTX;
> > +
> > +  rtvec vec = rtvec_alloc (GET_MODE_NUNITS (fieldmode).to_constant ());
> > +  machine_mode outermode = GET_MODE (op);
> > +
> > +  for (HOST_WIDE_INT i = lower_bound; i < upper_bound; ++i)
> > +    RTVEC_ELT (vec, i) = GEN_INT (i);
> > +  rtx par = gen_rtx_PARALLEL (VOIDmode, vec);
> > +  rtx select = gen_rtx_VEC_SELECT (fieldmode, op, par);
> > +  if (BYTES_BIG_ENDIAN)
> > +    {
> > +      if (lower_bound > 0)
> > +     return gen_rtx_VEC_CONCAT (outermode, select, value);
> > +      else
> > +     return gen_rtx_VEC_CONCAT (outermode, value, select);
> > +    }
> > +  else
> > +    {
> > +      if (lower_bound > 0)
> > +     return gen_rtx_VEC_CONCAT (outermode, value, select);
> > +      else
> > +     return gen_rtx_VEC_CONCAT (outermode, select, value);
> > +    }
> > +}
> > +
> >  /* A subroutine of store_bit_field, with the same arguments.  Return true
> >     if the operation could be implemented.
> >
> > @@ -778,18 +814,142 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> >    if (VECTOR_MODE_P (outermode)
> >        && !MEM_P (op0)
> >        && optab_handler (vec_set_optab, outermode) != CODE_FOR_nothing
> > -      && fieldmode == innermode
> > -      && known_eq (bitsize, GET_MODE_PRECISION (innermode))
> >        && multiple_p (bitnum, GET_MODE_PRECISION (innermode), &pos))
> >      {
> > +      /* Cases where the destination's inner mode is not equal to the
> > +      value's mode need special treatment.  */
> > +
> >        class expand_operand ops[3];
> >        enum insn_code icode = optab_handler (vec_set_optab, outermode);
> >
> > -      create_fixed_operand (&ops[0], op0);
> > -      create_input_operand (&ops[1], value, innermode);
> > -      create_integer_operand (&ops[2], pos);
> > -      if (maybe_expand_insn (icode, 3, ops))
> > -     return true;
> > +      /* Subreg expressions should operate on scalars only.  Subregs on
> > +      vectors are not canonical.  Extractions from vectors should use
> > +      vector operations instead.  */
> > +      bool is_non_canon_subreg = GET_CODE (value) == SUBREG
> > +                              && VECTOR_MODE_P (fieldmode)
> > +                              && !VECTOR_MODE_P (
> > +                                 GET_MODE (SUBREG_REG (value)));
> > +
> > +      /* If the value to be written is a memory expression or a non-canonical
> > +      scalar to vector subreg, don't try to generate a vec_set pattern.
> > +      Instead, fall back and try to generate an instruction without
> > +      touching the operands.  */
> > +      if (!MEM_P (value) && !is_non_canon_subreg)
> > +      {
> > +     if (VECTOR_MODE_P (fieldmode))
> > +       {
> > +         /* Handle the case where both the value to be written and the
> > +            destination are vectors.  */
> > +
> > +         HOST_WIDE_INT op_elem_num
> > +           = GET_MODE_NUNITS (outermode).to_constant ();
> > +         rtx concat_rtx = value;
> > +         rtx_insn *last_insn = get_last_insn ();
> > +         HOST_WIDE_INT index = 0;
> > +         /* If the store position is not at the start of the bitfield,
> > +            store the value by selecting the first pos elements of the
> > +            vector and then placing the value after them, using
> > +            a vec_concat.  */
> > +         if (pos.to_constant () > 0)
> > +           {
> > +             concat_rtx = generate_vec_concat (fieldmode, op0, value, 0,
> > +                                               pos.to_constant ());
> > +
> > +             index = pos.to_constant () +  bitsize.to_constant ()
> > +                     / GET_MODE_UNIT_BITSIZE (outermode);
> > +           }
> > +
> > +         /* Reconstruct the rest of the vector, after the value.  */
> > +         if (index < op_elem_num)
> > +           concat_rtx = generate_vec_concat (fieldmode, op0, concat_rtx,
> > +                                             index, op_elem_num);
> > +
> > +         rtx_insn *set_insn = emit_insn (gen_rtx_SET (op0, concat_rtx));
> > +
> > +         if (recog_memoized (set_insn) >= 0)
> > +           return true;
> > +         else
> > +           delete_insns_since (last_insn);
> > +       }
> > +     else if (fieldmode != innermode)
> > +       {
> > +         /* Handle the case where the destination is a vector and
> > +            the value's mode is different than the vector's inner
> > +            mode.  We have to treat the bitfield insertion differently
> > +            depending on which of those modes is wider than the other.  */
> > +
> > +         if (known_gt (GET_MODE_SIZE (fieldmode),
> > +                       GET_MODE_SIZE (innermode)))
> > +           {
> > +             /* If the value's mode is wider than the vector's inner
> > +                mode, extract a part from the value with size equal
> > +                to the vector's inner mode size and write it in the
> > +                appropriate position inside the vector, using a vec_set
> > +                pattern.  Repeat, until the whole value is written.  */
> > +
> > +             unsigned int curr_pos = 0;
> > +             bool failed = false;
> > +             rtx_insn *last_insn = get_last_insn ();
> > +             while (curr_pos < bitsize.to_constant ())
> > +               {
> > +
> > +                 rtx subreg = gen_reg_rtx (innermode);
> > +                 unsigned int innermode_size = GET_MODE_BITSIZE (innermode);
> > +                 rtx bitfield
> > +                   = extract_bit_field (value, innermode_size,
> > +                                        curr_pos, 0, NULL_RTX, innermode,
> > +                                        innermode, false, 0);
> > +
> > +                 store_bit_field_1 (subreg, innermode_size, 0, 0,
> > +                                    0, innermode, bitfield, false, false,
> > +                                    false);
> > +
> > +                 HOST_WIDE_INT index
> > +                     = pos.to_constant () + curr_pos / innermode_size;
> > +                 create_fixed_operand (&ops[0], op0);
> > +                 create_input_operand (&ops[1], subreg, innermode);
> > +                 create_integer_operand (&ops[2], index);
> > +                 if (!maybe_expand_insn (icode, 3, ops))
> > +                   {
> > +                     failed = true;
> > +                     break;
> > +                   }
> > +
> > +                 curr_pos += innermode_size;
> > +               }
> > +
> > +             if (!failed)
> > +               return true;
> > +             else
> > +               delete_insns_since (last_insn);
> > +           }
> > +         else if (known_lt (GET_MODE_SIZE (fieldmode),
> > +                            GET_MODE_SIZE (innermode)))
> > +           {
> > +             /* If the value's mode is narrower than the vector's inner
> > +                mode, extend the value's mode to the vector's inner
> > +                mode and use a vec_set pattern for the insertion.  */
> > +
> > +             rtx ext_value = gen_rtx_ZERO_EXTEND (innermode, value);
> > +
> > +             create_fixed_operand (&ops[0], op0);
> > +             create_input_operand (&ops[1], ext_value, innermode);
> > +             create_integer_operand (&ops[2], pos);
> > +             if (maybe_expand_insn (icode, 3, ops))
> > +               return true;
> > +           }
> > +       }
> > +     }
> > +
> > +      if (fieldmode == innermode
> > +       && known_eq (bitsize, GET_MODE_PRECISION (innermode)))
> > +     {
> > +       create_fixed_operand (&ops[0], op0);
> > +       create_input_operand (&ops[1], value, innermode);
> > +       create_integer_operand (&ops[2], pos);
> > +       if (maybe_expand_insn (icode, 3, ops))
> > +         return true;
> > +     }
> >      }
> >
> >    /* If the target is a register, overwriting the entire object, or storing
> > diff --git a/gcc/testsuite/gcc.target/i386/pr118873.c b/gcc/testsuite/gcc.target/i386/pr118873.c
> > new file mode 100644
> > index 000000000000..3a07c7cc87f9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr118873.c
> > @@ -0,0 +1,33 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx2 -favoid-store-forwarding" } */
> > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > +
> > +typedef int v4si __attribute__((vector_size(16)));
> > +typedef int v8si __attribute__((vector_size(32)));
> > +
> > +v8si a;
> > +v4si b;
> > +
> > +/*
> > +** foo:
> > +**   ...
> > +**   vmovdqa a\(%rip\), %ymm0
> > +**   vmovdqa b\(%rip\), %xmm1
> > +**   vmovdqa %ymm0, \(%rdi\)
> > +**   vmovdqa 16\(%rdi\), %ymm0
> > +**   vmovdqa %xmm1, 32\(%rdi\)
> > +**   vinserti128     \$0x1, %xmm1, %ymm0, %ymm0
> > +**   vmovdqa %ymm0, a\(%rip\)
> > +**   vzeroupper
> > +**   ret
> > +**   ...
> > +*/
> > +void foo (int *p)
> > +{
> > +  v8si aa = a;
> > +  v4si bb = b;
> > +  *(v8si *)p = a;
> > +  *(v4si *)(p + 8) = b;
> > +  a = *(v8si *)(p + 4);
> > +}
> > +
> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)