On 6/7/21 9:57 AM, Peter Maydell wrote:
+#define DO_VDUP(OP, ESIZE, TYPE, H)                                     \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t val)     \
+    {                                                                   \
+        TYPE *d = vd;                                                   \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \
+            d[H(e)] &= ~bytemask;                                       \
+            d[H(e)] |= (val & bytemask);                                \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+DO_VDUP(vdupb, 1, uint8_t, H1)
+DO_VDUP(vduph, 2, uint16_t, H2)
+DO_VDUP(vdupw, 4, uint32_t, H4)

Hmm. I think the masking should be done at either uint32_t or uint64_t granularity. Doing it byte-by-byte is wasteful.
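
Concretely, something like the following hypothetical expand_byte_mask() (not in the tree, just a sketch) would let each loop iteration merge 8 bytes at once:

static inline uint64_t expand_byte_mask(uint8_t pred)
{
    /* Expand 8 one-bit-per-byte predicate bits into a 64-bit
     * mask of 0x00/0xff bytes.
     */
    uint64_t m = 0;
    int i;

    for (i = 0; i < 8; i++) {
        if (pred & (1 << i)) {
            m |= 0xffull << (i * 8);
        }
    }
    return m;
}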

It's up to you whether you want to do the replication in tcg (I can export gen_dup_i32 from tcg-op-gvec.c) and have one helper, or do the replication here, e.g.

static void do_vdup(CPUARMState *env, void *vd, uint64_t val);
void HELPER(mve_vdupb)(CPUARMState *env, void *vd, uint32_t val)
{
    /* dup_const(MO_8, val) replicates the low byte across 64 bits */
    do_vdup(env, vd, dup_const(MO_8, val));
}
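
For illustration, do_vdup() applying the mask 64 bits at a time might look roughly like this (only a sketch, reusing the hypothetical expand_byte_mask() above and glossing over the H-macro byte ordering on big-endian hosts):

static void do_vdup(CPUARMState *env, void *vd, uint64_t val)
{
    uint64_t *d = vd;
    uint16_t mask = mve_element_mask(env);
    unsigned e;

    for (e = 0; e < 16 / 8; e++, mask >>= 8) {
        /* Merge 8 bytes' worth of predicated lanes in one go */
        uint64_t bytemask = expand_byte_mask(mask);
        d[e] = (d[e] & ~bytemask) | (val & bytemask);
    }
    mve_advance_vpt(env);
}

vduph and vdupw then differ only in passing dup_const(MO_16, val) or dup_const(MO_32, val), so the per-size helpers collapse into one code path.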


r~
