On 9/12/22 00:03, Paolo Bonzini wrote:
@@ -102,6 +107,25 @@ static void gen_load_sse(DisasContext *s, TCGv temp, MemOp ot, int dest_ofs)
+static inline bool sse_needs_alignment(DisasContext *s, X86DecodedInsn *decode, X86DecodedOp *op)
+{

Drop the inline. You may need to add G_GNUC_UNUSED temporarily, because the function isn't used yet within this patch...
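
For example (a sketch only; the body stays as in your patch):

static bool G_GNUC_UNUSED sse_needs_alignment(DisasContext *s,
                                              X86DecodedInsn *decode,
                                              X86DecodedOp *op)
{
    /* ... body unchanged from the patch ... */
}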

@@ -175,7 +199,13 @@ static void gen_writeback(DisasContext *s, X86DecodedOp *op)
          }
          break;
      case X86_OP_MMX:
+        break;
      case X86_OP_SSE:
+        if ((s->prefix & PREFIX_VEX) && op->ot == MO_128) {
+            tcg_gen_gvec_dup_imm(MO_64,
+                                 offsetof(CPUX86State, xmm_regs[op->n].ZMM_X(1)),
+                                 16, 16, 0);
+        }

So... gvec supports doing this zeroing within the operation.  E.g.

static void gen_PADDB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
{
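    /* gvec computes oprsz (sse_vec_len) bytes, then zeroes the rest of
       the destination up to maxsz (sse_vec_len_max), so the VEX.128
       high-lane clearing comes along for free. */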
    tcg_gen_gvec_add(MO_8, decode->op[0].offset,
                     decode->op[1].offset, decode->op[2].offset,
                     sse_vec_len(s, decode), sse_vec_len_max(s, decode));
}

The only catch is that gvec expects the zeroing to be at the end of the range, so this requires reorganizing ZMM for big-endian. Instead of reversing element order across the entire ZMM register, we would reverse it only within each 16-byte lane, so the lanes themselves stay in order. Like so:

  #if HOST_BIG_ENDIAN
- #define ZMM_B(n) _b_ZMMReg[63 - (n)]
+ #define ZMM_B(n) _b_ZMMReg[(n) ^ 15]

etc.
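
To spell out the "etc.", the other accessors would follow the same
per-lane pattern; a sketch, assuming the existing _w/_l/_q field names:

  #if HOST_BIG_ENDIAN
  #define ZMM_B(n) _b_ZMMReg[(n) ^ 15]  /* 16 x  8-bit units per 128-bit lane */
  #define ZMM_W(n) _w_ZMMReg[(n) ^ 7]   /*  8 x 16-bit units */
  #define ZMM_L(n) _l_ZMMReg[(n) ^ 3]   /*  4 x 32-bit units */
  #define ZMM_Q(n) _q_ZMMReg[(n) ^ 1]   /*  2 x 64-bit units */
  #endif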

Ideally, the zeroing above would move into each operation. For our current set of helpers, it should be easy enough to do in gen_binary_int_sse and friends.
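
Something along these lines, perhaps (a sketch only; the operand-pointer
plumbing is a guess, not the actual helper from this series):

static void gen_binary_int_sse(DisasContext *s, CPUX86State *env,
                               X86DecodedInsn *decode, SSEFunc_0_eppp fn)
{
    TCGv_ptr ptr0 = tcg_temp_new_ptr();
    TCGv_ptr ptr1 = tcg_temp_new_ptr();
    TCGv_ptr ptr2 = tcg_temp_new_ptr();

    /* Point the helper arguments at the three vector operands. */
    tcg_gen_addi_ptr(ptr0, cpu_env, decode->op[0].offset);
    tcg_gen_addi_ptr(ptr1, cpu_env, decode->op[1].offset);
    tcg_gen_addi_ptr(ptr2, cpu_env, decode->op[2].offset);
    fn(cpu_env, ptr0, ptr1, ptr2);

    if ((s->prefix & PREFIX_VEX) && decode->op[0].ot == MO_128) {
        /* VEX.128: clear the high 16 bytes of the destination,
           mirroring the writeback hunk quoted above. */
        tcg_gen_gvec_dup_imm(MO_64, decode->op[0].offset + 16, 16, 16, 0);
    }

    tcg_temp_free_ptr(ptr0);
    tcg_temp_free_ptr(ptr1);
    tcg_temp_free_ptr(ptr2);
}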


r~
