On 6/7/21 9:57 AM, Peter Maydell wrote:
> +#define DO_VDUP(OP, ESIZE, TYPE, H)                             \
> +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t val) \
> +    {                                                           \
> +        TYPE *d = vd;                                           \
> +        uint16_t mask = mve_element_mask(env);                  \
> +        unsigned e;                                             \
> +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
> +            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);  \
> +            d[H(e)] &= ~bytemask;                               \
> +            d[H(e)] |= (val & bytemask);                        \
> +        }                                                       \
> +        mve_advance_vpt(env);                                   \
> +    }
> +
> +DO_VDUP(vdupb, 1, uint8_t, H1)
> +DO_VDUP(vduph, 2, uint16_t, H2)
> +DO_VDUP(vdupw, 4, uint32_t, H4)
Hmm. I think the masking should be done at either uint32_t or uint64_t. Doing it byte-by-byte is wasteful.
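To illustrate the idea, expanding the per-byte predicate bits into a full 64-bit byte mask up front lets the store happen one 64-bit lane at a time. A standalone sketch (the function name and the loop are mine, not QEMU's actual code):

```c
#include <stdint.h>

/* Hypothetical sketch: expand 8 predicate bits (one bit per byte,
 * i.e. half of a 16-bit MVE element mask) into a 64-bit byte mask,
 * so masking can be done per 64-bit lane rather than per byte. */
static uint64_t expand_bytemask64(uint8_t bits)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < 8; i++) {
        if (bits & (1u << i)) {
            r |= 0xffull << (i * 8);
        }
    }
    return r;
}
```

With a mask like this, the inner loop collapses to two masked 64-bit stores per 16-byte vector instead of sixteen byte-sized ones.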
You could either do the replication in tcg (I can export gen_dup_i32 from tcg-op-gvec.c) and have one helper, or do the replication here, e.g.
static void do_vdup(CPUARMState *env, void *vd, uint64_t val);

void HELPER(mve_vdupb)(CPUARMState *env, void *vd, uint32_t val)
{
    do_vdup(env, vd, dup_const(MO_8, val));
}

r~
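For reference, dup_const(MO_8, val) replicates the low byte of val across all 64 bits. A minimal standalone sketch of that behaviour (a simplified stand-in, not tcg's actual implementation):

```c
#include <stdint.h>

/* Simplified stand-in for tcg's dup_const(): replicate an element of
 * 'esize' bytes (1, 2, or 4) across a 64-bit value.  Multiplying by a
 * repeating-ones constant copies the element into every lane. */
static uint64_t dup_const_sketch(unsigned esize, uint64_t val)
{
    switch (esize) {
    case 1:
        return (uint8_t)val * 0x0101010101010101ull;
    case 2:
        return (uint16_t)val * 0x0001000100010001ull;
    case 4:
        return (uint32_t)val * 0x0000000100000001ull;
    default:
        return val;
    }
}
```

With the replication done once, a single do_vdup can handle all three element sizes with plain 64-bit stores.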