On Tue, Feb 4, 2020 at 10:39 AM Jakub Jelinek <ja...@redhat.com> wrote:
>
> Hi!
>
> As mentioned in the PR, the CLOBBERs in vzeroupper are added even for
> registers that are never live in the function, and that breaks the
> prologue/epilogue expansion with the ms ABI (normal ABIs are fine, as they
> consider all [xyz]mm registers call clobbered, but the ms ABI considers
> xmm0-15 call used, with only the bits above the low 128 call clobbered).
> The following patch fixes it by not adding the clobbers during the
> vzeroupper pass (before pro_and_epilogue), but adding them, for -fipa-ra
> purposes, only during the final output.  Perhaps we could add some
> CLOBBERs early (say for
> df_regs_ever_live_p regs that aren't live in the live_regs bitmap, or
> depending on the ABI either add all of them immediately, or for ms ABI add
> CLOBBERs for xmm0-xmm5 if they don't have a SET) and add the rest later.
> And the addition could perhaps be done at other spots, e.g. in an
> epilogue_completed guarded splitter.

If it works OK, I'd rather see this functionality implemented as an
epilogue_completed guarded splitter. In the .md files, there are
already cases where we split at this point, and where it is assumed
that allocated registers won't change anymore. Also, please don't make
the functionality conditional on flag_ipa_ra. This way, we would
always get new patterns in the debug dumps, so in case something goes
wrong, one could at least clearly see the full pattern.
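
For concreteness, a sketch of what I have in mind (untested; the helper
name is hypothetical, and its body would be the same CLOBBER-adding loop
as in your *avx_vzeroupper output code below -- note there is no
flag_ipa_ra test, per the above):

  (define_split
    [(match_parallel 0 "vzeroupper_pattern"
       [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
    "TARGET_AVX && epilogue_completed
     && XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1"
    [(match_dup 0)]
  {
    /* Rebuild the PARALLEL in place so that every SSE register is
       mentioned: keep the existing (set (reg) (reg)) elements and add
       (clobber (reg)) for the registers that lack one.  */
    ix86_add_vzeroupper_clobbers (&operands[0]);  /* hypothetical helper */
  })

The XVECLEN test also keeps the splitter from matching its own output,
since after the rewrite the vector has the full (TARGET_64BIT ? 16 : 8) + 1
elements.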

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> Or any of the above mentioned variants?
>
> A separate issue is CALL_USED_REGISTERS for msabi on xmm16-xmm31.

I guess that the Comment #9 patch from the PR should be trivially correct,
but although it looks obvious, I don't want to propose the patch since
I have no means of testing it.

Uros.

> And yet another issue is that the presence of a vzeroupper instruction
> still prevents (and the patch doesn't change this) callers from trying to
> preserve something in the xmm0-xmm15 registers even if the callee doesn't
> use them.
> Would be nice to be able to say that the vzeroupper will preserve the low
> 128 bits, so it is ok to hold a 128-bit value across the call, but not
> 256-bit or larger.  Not sure if the RA has a way to express that (and IPA-RA
> certainly ATM doesn't).
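
A hypothetical example of this missed optimization, for reference:

  typedef float V4 __attribute__ ((vector_size (16)));
  V4 g;
  void callee (void);  /* assume its body ends with vzeroupper */

  V4
  caller (V4 x)
  {
    callee ();  /* x gets spilled around this call even with -fipa-ra,
                   although vzeroupper keeps the low 128 bits intact */
    return x + g;
  }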
>
> 2020-02-04  Jakub Jelinek  <ja...@redhat.com>
>
>         PR target/92190
>         * config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
>         include sets and not clobbers in the vzeroupper pattern.
>         * config/i386/sse.md (*avx_vzeroupper): Add the clobbers during
>         pattern output.
>
>         * gcc.target/i386/pr92190.c: New test.
>
> --- gcc/config/i386/i386-features.c.jj  2020-02-03 13:17:15.919723150 +0100
> +++ gcc/config/i386/i386-features.c     2020-02-03 18:30:08.274865983 +0100
> @@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
>
>       (set (reg:V2DF R) (reg:V2DF R))
>
> -   which preserves the low 128 bits but clobbers the upper bits.
> -   For a dead register we just use:
> -
> -     (clobber (reg:V2DF R))
> -
> -   which invalidates any previous contents of R and stops R from becoming
> -   live across the vzeroupper in future.  */
> +   which preserves the low 128 bits but clobbers the upper bits.  */
>
>  static void
>  ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
>  {
>    rtx pattern = PATTERN (insn);
>    unsigned int nregs = TARGET_64BIT ? 16 : 8;
> -  rtvec vec = rtvec_alloc (nregs + 1);
> -  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> +  unsigned int npats = nregs;
>    for (unsigned int i = 0; i < nregs; ++i)
>      {
>        unsigned int regno = GET_SSE_REGNO (i);
> +      if (!bitmap_bit_p (live_regs, regno))
> +       npats--;
> +    }
> +  if (npats == 0)
> +    return;
> +  rtvec vec = rtvec_alloc (npats + 1);
> +  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> +  for (unsigned int i = 0, j = 0; i < nregs; ++i)
> +    {
> +      unsigned int regno = GET_SSE_REGNO (i);
> +      if (!bitmap_bit_p (live_regs, regno))
> +       continue;
>        rtx reg = gen_rtx_REG (V2DImode, regno);
> -      if (bitmap_bit_p (live_regs, regno))
> -       RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
> -      else
> -       RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> +      ++j;
> +      RTVEC_ELT (vec, j) = gen_rtx_SET (reg, reg);
>      }
>    XVEC (pattern, 0) = vec;
>    df_insn_rescan (insn);
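
For illustration, assuming only xmm0 and xmm2 are live across the call,
the vzeroupper insn rewritten by this function would now look roughly like

  (parallel [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)
             (set (reg:V2DI 20 xmm0) (reg:V2DI 20 xmm0))
             (set (reg:V2DI 22 xmm2) (reg:V2DI 22 xmm2))])

i.e. with no CLOBBERs left for the prologue/epilogue expansion to stumble
over.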
> --- gcc/config/i386/sse.md.jj   2020-02-03 13:17:15.957722589 +0100
> +++ gcc/config/i386/sse.md      2020-02-03 18:30:08.280865894 +0100
> @@ -19819,7 +19819,34 @@ (define_insn "*avx_vzeroupper"
>    [(match_parallel 0 "vzeroupper_pattern"
>       [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
>    "TARGET_AVX"
> -  "vzeroupper"
> +{
> +  if (flag_ipa_ra && XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1)
> +    {
> +      /* For IPA-RA purposes, make it clear the instruction clobbers
> +        even XMM registers not mentioned explicitly in the pattern.  */
> +      unsigned int nregs = TARGET_64BIT ? 16 : 8;
> +      unsigned int npats = XVECLEN (operands[0], 0);
> +      rtvec vec = rtvec_alloc (nregs + 1);
> +      RTVEC_ELT (vec, 0) = XVECEXP (operands[0], 0, 0);
> +      for (unsigned int i = 0, j = 1; i < nregs; ++i)
> +       {
> +         unsigned int regno = GET_SSE_REGNO (i);
> +         if (j < npats
> +             && REGNO (SET_DEST (XVECEXP (operands[0], 0, j))) == regno)
> +           {
> +             RTVEC_ELT (vec, i + 1) = XVECEXP (operands[0], 0, j);
> +             j++;
> +           }
> +         else
> +           {
> +             rtx reg = gen_rtx_REG (V2DImode, regno);
> +             RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> +           }
> +       }
> +      XVEC (operands[0], 0) = vec;
> +    }
> +  return "vzeroupper";
> +}
>    [(set_attr "type" "sse")
>     (set_attr "modrm" "0")
>     (set_attr "memory" "none")
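
And with the same two live registers in 32-bit mode, the pattern rebuilt
at final output time would be roughly (the remaining clobbers elided
as ...):

  (parallel [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)
             (set (reg:V2DI 20 xmm0) (reg:V2DI 20 xmm0))
             (clobber (reg:V2DI 21 xmm1))
             (set (reg:V2DI 22 xmm2) (reg:V2DI 22 xmm2))
             (clobber (reg:V2DI 23 xmm3))
             ...])

so -fipa-ra again treats every XMM register as written by the call.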
> --- gcc/testsuite/gcc.target/i386/pr92190.c.jj  2020-02-03 18:29:30.924415644 +0100
> +++ gcc/testsuite/gcc.target/i386/pr92190.c     2020-02-03 18:29:15.992635389 +0100
> @@ -0,0 +1,19 @@
> +/* PR target/92190 */
> +/* { dg-do compile { target { *-*-linux* && lp64 } } } */
> +/* { dg-options "-mabi=ms -O2 -mavx512f" } */
> +
> +typedef char VC __attribute__((vector_size (16)));
> +typedef int VI __attribute__((vector_size (16 * sizeof 0)));
> +VC a;
> +VI b;
> +void bar (VI);
> +void baz (VC);
> +
> +void
> +foo (void)
> +{
> +  VC k = a;
> +  VI n = b;
> +  bar (n);
> +  baz (k);
> +}
>
>         Jakub
>
