On 9/12/22 00:03, Paolo Bonzini wrote:
@@ -102,6 +107,25 @@ static void gen_load_sse(DisasContext *s, TCGv temp, MemOp ot, int dest_ofs)
+static inline bool sse_needs_alignment(DisasContext *s, X86DecodedInsn *decode, X86DecodedOp *op)
+{
Drop the inline. You may need to add G_GNUC_UNUSED temporarily, because the function isn't used in
this patch...
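
Something like this, purely for illustration (and dropped again once a later patch starts using the function):

-static inline bool sse_needs_alignment(DisasContext *s, X86DecodedInsn *decode, X86DecodedOp *op)
+static bool G_GNUC_UNUSED sse_needs_alignment(DisasContext *s, X86DecodedInsn *decode, X86DecodedOp *op)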
@@ -175,7 +199,13 @@ static void gen_writeback(DisasContext *s, X86DecodedOp *op)
         }
         break;
     case X86_OP_MMX:
+        break;
     case X86_OP_SSE:
+        if ((s->prefix & PREFIX_VEX) && op->ot == MO_128) {
+            tcg_gen_gvec_dup_imm(MO_64,
+                                 offsetof(CPUX86State, xmm_regs[op->n].ZMM_X(1)),
+                                 16, 16, 0);
+        }
So... gvec supports doing this zeroing within the operation. E.g.
static void gen_PADDB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
{
    tcg_gen_gvec_add(MO_8, decode->op[0].offset,
                     decode->op[1].offset, decode->op[2].offset,
                     sse_vec_len(s, decode), sse_vec_len_max(s, decode));
}
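
(The trick is that when maxsz is larger than oprsz, the gvec expander itself zeroes the bytes
from oprsz up to maxsz, so no separate writeback fixup is needed.)  The two length helpers are
hypothetical; I am imagining something along these lines:

static int sse_vec_len(DisasContext *s, X86DecodedInsn *decode)
{
    /* Bytes actually computed: 32 with VEX.L=1, else 16.  */
    return s->vex_l ? 32 : 16;
}

static int sse_vec_len_max(DisasContext *s, X86DecodedInsn *decode)
{
    /*
     * Bytes written: VEX encodings zero the vector register up to its
     * maximum width; legacy SSE must not touch bytes past oprsz.
     */
    return (s->prefix & PREFIX_VEX) ? 32 : 16;
}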
The only catch is that gvec expects the zeroed bytes to be at the end of the range, so this
requires reorganizing ZMMReg for big-endian hosts. Instead of reversing the entire ZMM register,
we would reverse elements only within each 16-byte lane. Like so:
#if HOST_BIG_ENDIAN
- #define ZMM_B(n) _b_ZMMReg[63 - (n)]
+ #define ZMM_B(n) _b_ZMMReg[(n) ^ 15]
etc.
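
Spelled out for the other element sizes (my sketch; the XOR folds the reversal into each 16-byte lane):

#if HOST_BIG_ENDIAN
#define ZMM_B(n) _b_ZMMReg[(n) ^ 15]  /* 16 x uint8_t per lane */
#define ZMM_W(n) _w_ZMMReg[(n) ^ 7]   /* 8 x uint16_t per lane */
#define ZMM_L(n) _l_ZMMReg[(n) ^ 3]   /* 4 x uint32_t per lane */
#define ZMM_S(n) _s_ZMMReg[(n) ^ 3]   /* 4 x float32 per lane */
#define ZMM_Q(n) _q_ZMMReg[(n) ^ 1]   /* 2 x uint64_t per lane */
#define ZMM_D(n) _d_ZMMReg[(n) ^ 1]   /* 2 x float64 per lane */
#endif

The 16-byte ZMM_X lanes themselves then stay in architectural order on both endiannesses.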
Ideally, the zeroing above would move into each operation. For our current set of
helpers, it should be easy enough to do in gen_binary_int_sse and friends.
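
For instance (hypothetical helper name, reusing the test from the hunk above), each expander
could finish with:

static void gen_clear_high_xmm(DisasContext *s, X86DecodedInsn *decode)
{
    /*
     * VEX.128 encodings zero bits 255:128 of the destination;
     * legacy SSE encodings leave them untouched.
     */
    if ((s->prefix & PREFIX_VEX) && decode->op[0].ot == MO_128) {
        tcg_gen_gvec_dup_imm(MO_64,
                             offsetof(CPUX86State,
                                      xmm_regs[decode->op[0].n].ZMM_X(1)),
                             16, 16, 0);
    }
}

with the X86_OP_SSE case in gen_writeback reduced to a plain break.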
r~