On 6/27/19 12:56 PM, Stefan Brankovic wrote:
> +void HELPER(gvec_vmrgh8)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < (oprsz / 2); i += sizeof(uint8_t)) {
> +        uint8_t aa = *(uint8_t *)(a + 8 * sizeof(uint8_t) + i);
> +        uint8_t bb = *(uint8_t *)(b + 8 * sizeof(uint8_t) + i);
> +        *(uint8_t *)(d + 2 * i) = bb;
> +        *(uint8_t *)(d + 2 * i + sizeof(uint8_t)) = aa;
> +    }
> +    clear_high(d, oprsz, desc);
> +}

I tried this while developing the ARM SVE code. The problem is that the
vector element numbering differs from host to host. So while you may be
able to get the correct results out of x86, you'll get the wrong answers
when you run this same code on a big-endian host.

The same goes for the INDEX_OP_vmrgh_vec opcode you introduced.

r~