On Mon, 11 Apr 2022 at 15:56, Alex Bennée <[email protected]> wrote:
>
> When change b7711471f5 was made to alias XMMReg to ZMMReg for the
> purposes of easing the handling of AVX512 registers we unwittingly
> broke the SSE helpers which construct a temporary value on the stack
> before copying them out. To avoid this lets encode REG_WIDTH based on
> shift and convert the pointer indirection with an explicit memcpy.
>
> An incomplete sampling of the affected instructions seems to indicate
> the default behaviour for legacy SSE is "the upper bits (MAXVL-1:128)
> of the corresponding YMM register destination are unmodified."
>
> Fixes: b7711471f5 ("target-i386: make xmm_regs 512-bit wide")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/420
> Suggested-by: Peter Maydell <[email protected]>
> Signed-off-by: Alex Bennée <[email protected]>
> ---
> target/i386/ops_sse.h | 71 ++++++++++++++++++++++++-------------------
> 1 file changed, 40 insertions(+), 31 deletions(-)
>
> diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
> index 6f1fc174b3..adfb498a71 100644
> --- a/target/i386/ops_sse.h
> +++ b/target/i386/ops_sse.h
> @@ -28,6 +28,7 @@
> #define L(n) MMX_L(n)
> #define Q(n) MMX_Q(n)
> #define SUFFIX _mmx
> +#define REG_WIDTH 8
> #else
> #define Reg ZMMReg
> #define XMM_ONLY(...) __VA_ARGS__
> @@ -36,6 +37,7 @@
> #define L(n) ZMM_L(n)
> #define Q(n) ZMM_Q(n)
> #define SUFFIX _xmm
> +#define REG_WIDTH 16
> #endif
>
> void glue(helper_psrlw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
> @@ -516,7 +518,7 @@ void glue(helper_pshufw, SUFFIX)(Reg *d, Reg *s, int order)
> r.W(1) = s->W((order >> 2) & 3);
> r.W(2) = s->W((order >> 4) & 3);
> r.W(3) = s->W((order >> 6) & 3);
> - *d = r;
> + memcpy(d, &r, REG_WIDTH);
> }
Looking a bit more closely, this won't work on big-endian
hosts, because there we want to copy across the last 16
bytes of the struct, not the first 16. So I think we need
some more macro magic:
/*
 * Copy the relevant parts of a Reg value around. For the
 * SHIFT == 1 case these helpers operate only on the lower
 * 16 bytes of a 64 byte ZMMReg, so we must copy only those
 * so the guest-visible destination register has the top
 * bytes left untouched. For the SHIFT == 0 case we are
 * working with an MMXReg struct which is the correct size.
 * Note that we can't memcpy() here because that will do
 * the wrong thing on big-endian hosts.
 */
#if SHIFT == 0
#define COPY_REG(DEST, SRC) (DEST) = (SRC)
#else
#define COPY_REG(DEST, SRC) do {        \
        (DEST).Q(0) = (SRC).Q(0);       \
        (DEST).Q(1) = (SRC).Q(1);       \
    } while (0)
#endif
and then use COPY_REG(*d, r); in place of the memcpy() call
(adjust syntax to taste; not compile tested).
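To make the intended use concrete, here is a minimal standalone sketch of a helper that builds its result in a temporary and then copies out only the low 128 bits. DemoReg, demo_pshuflw and the local W()/Q() definitions are hypothetical stand-ins for illustration, not the real ops_sse.h code, and the accessors assume a little-endian host layout:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for QEMU's 64-byte ZMMReg. The accessors below
 * assume a little-endian host layout; the real ZMM_W()/ZMM_Q() macros
 * pick the index according to host endianness.
 */
typedef union {
    uint16_t w[32];
    uint64_t q[8];
} DemoReg;

#define W(n) w[n]
#define Q(n) q[n]

/* Copy only the low 128 bits, as in the non-MMX variant of the macro. */
#define COPY_REG(DEST, SRC) do {        \
        (DEST).Q(0) = (SRC).Q(0);       \
        (DEST).Q(1) = (SRC).Q(1);       \
    } while (0)

/* Illustrative helper: shuffle the four low words, keep words 4..7. */
static void demo_pshuflw(DemoReg *d, const DemoReg *s, int order)
{
    DemoReg r;

    assert(order >= 0 && order <= 0xff);
    r.W(0) = s->W(order & 3);
    r.W(1) = s->W((order >> 2) & 3);
    r.W(2) = s->W((order >> 4) & 3);
    r.W(3) = s->W((order >> 6) & 3);
    r.Q(1) = s->Q(1);
    /* Only quadwords 0 and 1 are written; bytes 16..63 of *d are untouched. */
    COPY_REG(*d, r);
}
```

Because the temporary is only ever read through Q(0) and Q(1), its upper 48 bytes can stay uninitialised without leaking into the destination.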
We could probably try to write endian-specific flavours of
memcpy() invocation, but "do two 64-bit word copies" is what
the compiler would hopefully turn the memcpy into anyway :-)
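For anyone who wants to see the failure mode without a big-endian box, the following is a rough model of it. It assumes (as the ZMM_Q() accessor does on big-endian hosts) that guest quadword n lives at host index 7 - n; EndianReg and all function names here are made up for the illustration, and the host_big_endian flag only simulates the layout rather than detecting it:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte register viewed as eight host quadwords. */
typedef union {
    uint8_t  b[64];
    uint64_t q[8];
} EndianReg;

/* Map guest quadword n to a host index for the simulated layout. */
static int demo_q_index(int n, int host_big_endian)
{
    assert(n >= 0 && n < 8);
    return host_big_endian ? 7 - n : n;
}

/* Element-wise copy of guest quadwords 0 and 1 (the COPY_REG approach). */
static void copy_low128(EndianReg *d, const EndianReg *s, int host_big_endian)
{
    int i0 = demo_q_index(0, host_big_endian);
    int i1 = demo_q_index(1, host_big_endian);

    d->q[i0] = s->q[i0];
    d->q[i1] = s->q[i1];
}

/* The memcpy() approach: always the first 16 bytes of the struct. */
static void copy_first16(EndianReg *d, const EndianReg *s)
{
    memcpy(d, s, 16);
}

/*
 * Returns 1 if, after the copy, guest quadwords 0..1 come from the
 * source and guest quadwords 2..7 still hold the old destination value.
 */
static int demo_low128_copied_correctly(int host_big_endian, int use_memcpy)
{
    EndianReg d, s;

    for (int i = 0; i < 8; i++) {
        d.q[i] = 0xD0 + i;   /* old destination contents */
        s.q[i] = 0x50 + i;   /* freshly computed temporary */
    }
    if (use_memcpy) {
        copy_first16(&d, &s);
    } else {
        copy_low128(&d, &s, host_big_endian);
    }
    for (int n = 0; n < 8; n++) {
        int i = demo_q_index(n, host_big_endian);
        uint64_t want = (n < 2) ? 0x50 + i : 0xD0 + i;

        if (d.q[i] != want) {
            return 0;
        }
    }
    return 1;
}
```

In this model demo_low128_copied_correctly() succeeds for the element-wise copy under both layouts and for memcpy() under the little-endian layout, but fails for memcpy() under the simulated big-endian layout, where the first 16 bytes of the struct hold guest quadwords 7 and 6.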
thanks
-- PMM