On Mon, 11 Apr 2022 at 15:56, Alex Bennée <[email protected]> wrote:
>
> When change b7711471f5 was made to alias XMMReg to ZMMReg for the
> purposes of easing the handling of AVX512 registers we unwittingly
> broke the SSE helpers which construct a temporary value on the stack
> before copying them out. To avoid this, let's encode REG_WIDTH based on
> shift and convert the pointer indirection with an explicit memcpy.
>
> An incomplete sampling of the affected instructions seems to indicate
> the default behaviour for legacy SSE is "the upper bits (MAXVL-1:128)
> of the corresponding YMM register destination are unmodified."
>
> Fixes: b7711471f5 ("target-i386: make xmm_regs 512-bit wide")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/420
> Suggested-by: Peter Maydell <[email protected]>
> Signed-off-by: Alex Bennée <[email protected]>
> ---
>  target/i386/ops_sse.h | 71 ++++++++++++++++++++++++-------------------
>  1 file changed, 40 insertions(+), 31 deletions(-)
>
> diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
> index 6f1fc174b3..adfb498a71 100644
> --- a/target/i386/ops_sse.h
> +++ b/target/i386/ops_sse.h
> @@ -28,6 +28,7 @@
>  #define L(n) MMX_L(n)
>  #define Q(n) MMX_Q(n)
>  #define SUFFIX _mmx
> +#define REG_WIDTH 8
>  #else
>  #define Reg ZMMReg
>  #define XMM_ONLY(...) __VA_ARGS__
> @@ -36,6 +37,7 @@
>  #define L(n) ZMM_L(n)
>  #define Q(n) ZMM_Q(n)
>  #define SUFFIX _xmm
> +#define REG_WIDTH 16
>  #endif
>
>  void glue(helper_psrlw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
> @@ -516,7 +518,7 @@ void glue(helper_pshufw, SUFFIX)(Reg *d, Reg *s, int order)
>      r.W(1) = s->W((order >> 2) & 3);
>      r.W(2) = s->W((order >> 4) & 3);
>      r.W(3) = s->W((order >> 6) & 3);
> -    *d = r;
> +    memcpy(d, &r, REG_WIDTH);
>  }

Looking a bit more closely, this won't work on big-endian
hosts, because there we want to copy across the last 16
bytes of the struct, not the first 16. So I think we need
some more macro magic:

/*
 * Copy the relevant parts of a Reg value around. For the
 * SHIFT == 1 case these helpers operate only on the lower
 * 16 bytes of a 64 byte ZMMReg, so we must copy only those
 * so the guest-visible destination register has the top
 * bytes left untouched. For the SHIFT == 0 case we are
 * working with an MMXReg struct which is the correct size.
 * Note that we can't simply memcpy() the first 16 bytes here:
 * on big-endian hosts the low 16 bytes of the register are the
 * last 16 bytes of the struct, not the first.
 */
#if SHIFT == 0
#define COPY_REG(DEST, SRC) do { \
    (DEST) = (SRC);              \
} while (0)
#else
#define COPY_REG(DEST, SRC) do { \
    (DEST).Q(0) = (SRC).Q(0);    \
    (DEST).Q(1) = (SRC).Q(1);    \
} while (0)
#endif

and then use COPY_REG(*d, r);

(adjust syntax to taste, not compile tested).
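
To make the intent concrete, here is a small standalone sketch of the
same idea outside the QEMU tree. Everything in it is a hypothetical
stand-in: DemoReg plays the role of ZMMReg, DEMO_Q() plays the role of
the Q() accessor (assumed to index the 64-bit lanes from the other end
of the struct on big-endian hosts), and the host byte order check
assumes the GCC/Clang predefined __BYTE_ORDER__ macros. It is only
meant to show that copying lanes 0 and 1 through the accessor leaves
the other six lanes alone whichever way round the host is.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 64-byte vector register struct. */
typedef union DemoReg {
    uint8_t  b[64];
    uint64_t q[8];
} DemoReg;

/*
 * Assumption: logical lane n is stored at index n on little-endian
 * hosts and at index 7 - n on big-endian hosts, so lane 0 is always
 * the least significant 64 bits of the register.
 */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define DEMO_Q(n) q[7 - (n)]
#else
#define DEMO_Q(n) q[(n)]
#endif

/* Copy only the low 128 bits (lanes 0 and 1) via the lane accessor. */
#define DEMO_COPY_REG(DEST, SRC) do {       \
    (DEST).DEMO_Q(0) = (SRC).DEMO_Q(0);     \
    (DEST).DEMO_Q(1) = (SRC).DEMO_Q(1);     \
} while (0)

/* Returns 1 if lanes 0-1 were copied and lanes 2-7 were left untouched. */
int demo_upper_lanes_preserved(void)
{
    DemoReg dst, tmp;
    int n;

    assert(sizeof(DemoReg) == 64);
    memset(&dst, 0xAA, sizeof(dst));   /* pre-existing register contents */
    memset(&tmp, 0x55, sizeof(tmp));   /* temporary built on the stack */

    DEMO_COPY_REG(dst, tmp);

    if (dst.DEMO_Q(0) != tmp.DEMO_Q(0) || dst.DEMO_Q(1) != tmp.DEMO_Q(1)) {
        return 0;
    }
    for (n = 2; n < 8; n++) {
        if (dst.DEMO_Q(n) != UINT64_C(0xAAAAAAAAAAAAAAAA)) {
            return 0;
        }
    }
    return 1;
}
```

Because the copy goes through the accessor, no part of it needs to
know where in the struct the low lanes actually live.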

We could probably try to write endian-specific flavours of
memcpy() invocation, but "do two 64-bit word copies" is what
the compiler would hopefully turn the memcpy into anyway :-)
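
For comparison, an endian-specific memcpy() flavour might look roughly
like the sketch below. Again this is standalone and not compile tested
against QEMU: WideReg, WIDE_Q(), LOW128_OFFSET and copy_low128_memcpy()
are made-up names, the layout assumption is that the low 16 bytes sit
at offset 0 on little-endian hosts and at the end of the struct on
big-endian hosts, and the byte order check assumes the GCC/Clang
predefined __BYTE_ORDER__ macros.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 64-byte vector register struct. */
typedef union WideReg {
    uint8_t  b[64];
    uint64_t q[8];
} WideReg;

/*
 * Assumed layout: logical lane 0 is q[0] on little-endian hosts and
 * q[7] on big-endian hosts, so the low 128 bits start at byte 0 or
 * at byte 48 respectively.
 */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define WIDE_Q(n) q[7 - (n)]
#define LOW128_OFFSET (sizeof(WideReg) - 16)
#else
#define WIDE_Q(n) q[(n)]
#define LOW128_OFFSET 0
#endif

/* Copy only the low 128 bits, picking the byte offset per host order. */
static inline void copy_low128_memcpy(WideReg *d, const WideReg *s)
{
    memcpy(d->b + LOW128_OFFSET, s->b + LOW128_OFFSET, 16);
}

/* Returns 1 if the memcpy flavour touched exactly logical lanes 0 and 1. */
int check_copy_low128_memcpy(void)
{
    WideReg d, s;
    int n;

    assert(sizeof(WideReg) == 64);
    memset(&d, 0xAA, sizeof(d));
    memset(&s, 0x11, sizeof(s));

    copy_low128_memcpy(&d, &s);

    if (d.WIDE_Q(0) != UINT64_C(0x1111111111111111) ||
        d.WIDE_Q(1) != UINT64_C(0x1111111111111111)) {
        return 0;
    }
    for (n = 2; n < 8; n++) {
        if (d.WIDE_Q(n) != UINT64_C(0xAAAAAAAAAAAAAAAA)) {
            return 0;
        }
    }
    return 1;
}
```

This gets the same result as the two lane assignments, but it
duplicates the layout knowledge that the Q() accessor already
encapsulates, which is why the lane-copy form seems preferable.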

thanks
-- PMM
