On Wed, Oct 21, 2020 at 9:18 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Tue, Oct 20, 2020 at 10:04 PM Qing Zhao <qing.z...@oracle.com> wrote:
>
> > +/* Check whether the register REGNO should be zeroed on X86.
> > +   When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
> > +   together, so there is no need to zero it again.
> > +   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
> > +   They are very hard to zero individually, so don't zero individual
> > +   st or mm registers at this time.  */
> > +
> > +static bool
> > +zero_call_used_regno_p (const unsigned int regno,
> > +                        bool all_sse_zeroed)
> > +{
> > +  return GENERAL_REGNO_P (regno)
> > +         || (!all_sse_zeroed && SSE_REGNO_P (regno))
> > +         || MASK_REGNO_P (regno);
> > +}
> > +
> > +/* Return the machine_mode that is used to zero register REGNO.  */
> > +
> > +static machine_mode
> > +zero_call_used_regno_mode (const unsigned int regno)
> > +{
> > +  /* NB: We only need to zero the lower 32 bits for integer registers
> > +     and the lower 128 bits for vector registers since the destinations
> > +     are zero-extended to the full register width.  */
> > +  if (GENERAL_REGNO_P (regno))
> > +    return SImode;
> > +  else if (SSE_REGNO_P (regno))
> > +    return V4SFmode;
> > +  else
> > +    return HImode;
> > +}
> > +
> > +/* Generate an rtx to zero all vector registers together if possible,
> > +   otherwise, return NULL.  */
> > +
> > +static rtx
> > +zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
> > +{
> > +  if (!TARGET_AVX)
> > +    return NULL;
> > +
> > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> > +         || (TARGET_64BIT
> > +             && (REX_SSE_REGNO_P (regno)
> > +                 || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> > +        && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> > +      return NULL;
> > +
> > +  return gen_avx_vzeroall ();
> > +}
> > +
> > +/* Generate an rtx to zero all st and mm registers together if possible,
> > +   otherwise, return NULL.  */
> > +
> > +static rtx
> > +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
> > +{
> > +  if (!TARGET_MMX)
> > +    return NULL;
> > +
> > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
> > +        && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> > +      return NULL;
> > +
> > +  return gen_mmx_emms ();
> >
> > emms does not clear any register; it only loads x87FPUTagWord with
> > FFFFH. So I think the above is useless as far as register clearing
> > is concerned.
> >
> > Thanks for the info.
> >
> > So, for mm and st registers, should we clear them, and how?
> >
> > I don't know.
> >
> > Please note that %mm and %st share the same register file, and
> > touching %mm registers will block access to %st until emms is emitted.
> > You can't just blindly load 0 to %st registers, because the register
> > file can be in MMX mode and vice versa. For 32-bit targets, a function
> > can also return a value in %mm0.
> >
> > If data flow determines that %mm0 does not hold a return value at the
> > return, can we clear all the %st registers as follows:
> >
> > emms
> > mov %st0, 0
> > mov %st1, 0
> > mov %st2, 0
> > mov %st3, 0
> > mov %st4, 0
> > mov %st5, 0
> > mov %st6, 0
> > mov %st7, 0
>
> The i386 ABI says:
>
> -- q --
> The CPU shall be in x87 mode upon entry to a function. Therefore,
> every function that uses the MMX registers is required to issue an
> emms or femms instruction after using MMX registers, before returning
> or calling another function.
> -- /q --
>
> (The above requirement slightly contradicts its own ABI, since we have
> 3 MMX argument registers and an MMX return register, so the CPU
> obviously can't be in x87 mode at all function boundaries).
>
> So, assuming that the first sentence is not deliberately vague w.r.t.
> function exit, emms should not be needed. However, we are dealing with
> x87 stack registers that have their own set of peculiarities. It is
> not possible to load a random register in the way you show. Also, the
> stack should be either empty or one (two in case of complex value
> return) levels deep at the function return. I think you want a series
> of 8 or 7 (6) fldz insns, followed by a series of fstp insns to clear
> the stack and mark stack slots empty.
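Something like this:

--cut here--
#include <stdio.h>

long double __attribute__ ((noinline))
test (long double a, long double b)
{
  long double r = a + b;

  /* Push seven zeros and pop them again.  With the return value
     occupying one slot, this overwrites the other seven x87 stack
     slots with zero and leaves them marked empty.  The "X" operand
     keeps r live across the asm.  */
  asm volatile ("fldz\n\t"
                "fldz\n\t"
                "fldz\n\t"
                "fldz\n\t"
                "fldz\n\t"
                "fldz\n\t"
                "fldz\n\t"
                "fstp %%st(0)\n\t"
                "fstp %%st(0)\n\t"
                "fstp %%st(0)\n\t"
                "fstp %%st(0)\n\t"
                "fstp %%st(0)\n\t"
                "fstp %%st(0)\n\t"
                "fstp %%st(0)"
                : : "X" (r));
  return r;
}

int
main ()
{
  long double a = 1.1, b = 1.2;
  long double c = test (a, b);

  printf ("%Lf\n", c);
  return 0;
}
--cut here--

Uros.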