On Wed, Oct 21, 2020 at 9:18 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Tue, Oct 20, 2020 at 10:04 PM Qing Zhao <qing.z...@oracle.com> wrote:
>
> > +/* Check whether the register REGNO should be zeroed on X86.
> > +   When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
> > +   together, no need to zero it again.
> > +   Stack registers (st0-st7) and mm0-mm7 are aliased with each other
> > +   and are very hard to zero individually, so don't zero individual
> > +   st or mm registers at this time.  */
> > +
> > +static bool
> > +zero_call_used_regno_p (const unsigned int regno,
> > +                        bool all_sse_zeroed)
> > +{
> > +  return GENERAL_REGNO_P (regno)
> > +         || (!all_sse_zeroed && SSE_REGNO_P (regno))
> > +         || MASK_REGNO_P (regno);
> > +}
> > +
> > +/* Return the machine_mode that is used to zero register REGNO.  */
> > +
> > +static machine_mode
> > +zero_call_used_regno_mode (const unsigned int regno)
> > +{
> > +  /* NB: We only need to zero the lower 32 bits for integer registers
> > +     and the lower 128 bits for vector registers since the destinations
> > +     are zero-extended to the full register width.  */
> > +  if (GENERAL_REGNO_P (regno))
> > +    return SImode;
> > +  else if (SSE_REGNO_P (regno))
> > +    return V4SFmode;
> > +  else
> > +    return HImode;
> > +}
> > +
> > +/* Generate an rtx to zero all vector registers together if possible,
> > +   otherwise, return NULL.  */
> > +
> > +static rtx
> > +zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
> > +{
> > +  if (!TARGET_AVX)
> > +    return NULL;
> > +
> > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> > +         || (TARGET_64BIT
> > +             && (REX_SSE_REGNO_P (regno)
> > +                 || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> > +        && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> > +      return NULL;
> > +
> > +  return gen_avx_vzeroall ();
> > +}
> > +
> > +/* Generate an rtx to zero all st and mm registers together if possible,
> > +   otherwise, return NULL.  */
> > +
> > +static rtx
> > +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
> > +{
> > +  if (!TARGET_MMX)
> > +    return NULL;
> > +
> > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
> > +        && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> > +      return NULL;
> > +
> > +  return gen_mmx_emms ();
> >
> >
> > emms is not clearing any register, it only loads x87FPUTagWord with
> > FFFFH. So I think, the above is useless, as far as register clearing
> > is concerned.
> >
> >
> > Thanks for the info.
> >
> > So, for mm and st registers, should we clear them, and how?
> >
> >
> > I don't know.
> >
> > Please note that %mm and %st share the same register file, and
> > touching %mm registers will block access to %st until emms is emitted.
> > You can't just blindly load 0 to %st registers, because the register
> > file can be in MMX mode and vice versa. For 32-bit targets, a function
> > can also return a value in %mm0.
> >
> >
> > If data flow determines that %mm0 does not hold a return value at the
> > return, can we clear all the %st registers as follows:
> >
> > emms
> > mov %st0, 0
> > mov %st1, 0
> > mov %st2, 0
> > mov %st3, 0
> > mov %st4, 0
> > mov %st5, 0
> > mov %st6, 0
> > mov %st7, 0
>
> The i386 ABI says:
>
> -- q --
> The CPU shall be in x87 mode upon entry to a function. Therefore,
> every function that uses the MMX registers is required to issue an
> emms or femms instruction after using MMX registers, before returning
> or calling another function.
> -- /q --
>
> (The above requirement slightly contradicts its own ABI, since we have
> 3 MMX argument registers and MMX return register, so the CPU obviously
> can't be in x87 mode at all function boundaries).
>
> So, assuming that the first sentence is not deliberately vague w.r.t.
> function exit, emms should not be needed. However, we are dealing with
> x87 stack registers that have their own set of peculiarities. It is
> not possible to load an arbitrary register in the way you show.  Also,
> the stack should be either empty or one (two in case of complex value
> return) levels deep at the function return. I think you want a series
> of 8 or 7(6) fldz insns, followed by a series of fstp insns to clear
> the stack and mark stack slots empty.

Something like this:

--cut here--
#include <stdio.h>

long double
__attribute__ ((noinline))
test (long double a, long double b)
{
  long double r = a + b;

  asm volatile ("fldz;                \
        fldz;                \
        fldz;                \
        fldz;                \
        fldz;                \
        fldz;                \
        fldz;                \
        fstp %%st(0);            \
        fstp %%st(0);            \
        fstp %%st(0);            \
        fstp %%st(0);            \
        fstp %%st(0);            \
        fstp %%st(0);            \
        fstp %%st(0)" : : "X"(r));
  return r;
}

int
main ()
{
  long double a = 1.1, b = 1.2;

  long double c = test (a, b);

  printf ("%Lf\n", c);

  return 0;
}
--cut here--
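
To make the reasoning behind the fldz/fstp sequence easier to follow, here
is a rough software model of the x87 register file. It is only a sketch:
the struct and every model_* name are hypothetical, nothing here touches
the real FPU, and overflow handling is simplified (the real hardware sets
exception flags instead of refusing the push). It illustrates two points
from this thread: emms only marks slots empty and leaves stale bits in
place, and N pushes of zero followed by N pops overwrite N free slots while
leaving a live return value in st(0) alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the x87 register file: eight physical slots, a TOP
   pointer and a per-slot "empty" tag.  All names are hypothetical.  */
struct x87_model
{
  long double data[8];
  bool empty[8];
  unsigned top;
};

/* Start with every slot tagged empty but still holding stale bits.  */
static void
model_init_stale (struct x87_model *f)
{
  for (int i = 0; i < 8; i++)
    {
      f->data[i] = 1.0L;
      f->empty[i] = true;
    }
  f->top = 0;
}

/* emms: tag every slot empty; the stored bits are left untouched.  */
static void
model_emms (struct x87_model *f)
{
  for (int i = 0; i < 8; i++)
    f->empty[i] = true;
}

/* Push VALUE (fldz is a push of 0.0).  Simplification: on overflow
   the model just returns false and leaves its state unchanged.  */
static bool
model_push (struct x87_model *f, long double value)
{
  unsigned slot = (f->top - 1) & 7;
  if (!f->empty[slot])
    return false;
  f->top = slot;
  f->data[slot] = value;
  f->empty[slot] = false;
  return true;
}

/* fstp %st(0): pop.  The old top slot is tagged empty; its bits stay.  */
static void
model_fstp_st0 (struct x87_model *f)
{
  f->empty[f->top] = true;
  f->top = (f->top + 1) & 7;
}

/* Count physical slots that still hold nonzero bits.  */
static int
model_count_nonzero (const struct x87_model *f)
{
  int n = 0;
  for (int i = 0; i < 8; i++)
    if (f->data[i] != 0.0L)
      n++;
  return n;
}

/* Push up to N zeros, then pop whatever was pushed.  Returns true
   only if all N pushes fit, i.e. N slots were free.  */
static bool
model_zero_free_slots (struct x87_model *f, int n)
{
  assert (n >= 0 && n <= 8);
  int pushed = 0;
  while (pushed < n && model_push (f, 0.0L))
    pushed++;
  for (int i = 0; i < pushed; i++)
    model_fstp_st0 (f);
  return pushed == n;
}
```

In this model, with one live value on the stack,
model_zero_free_slots (&f, 7) succeeds and leaves that value as the only
nonzero slot, while asking for 8 fails. That is why the count of fldz
insns has to depend on how many values the function returns.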

Uros.
