On 6/27/19 12:56 PM, Stefan Brankovic wrote:
> +void HELPER(gvec_vmrgh8)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < (oprsz / 2); i += sizeof(uint8_t)) {
> +        uint8_t aa = *(uint8_t *)(a + 8 * sizeof(uint8_t) + i);
> +        uint8_t bb = *(uint8_t *)(b + 8 * sizeof(uint8_t) + i);
> +        *(uint8_t *)(d + 2 * i) = bb;
> +        *(uint8_t *)(d + 2 * i + sizeof(uint8_t)) = aa;
> +    }
> +    clear_high(d, oprsz, desc);
> +}

I tried this while developing the ARM SVE code.

The problem is that the vector element numbering differs for each host.  So
while you may be able to get the correct results out of x86, you'll get the
wrong answers when you run this same code on a big-endian host.

The same goes for the INDEX_OP_vmrgh_vec opcode you introduced.
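
To illustrate the numbering problem, here is a minimal standalone sketch.  It
assumes the vector is stored as host-endian 64-bit lanes, so the byte offset
of element i within a lane is i on a little-endian host and i ^ 7 on a
big-endian one.  The function names are hypothetical and are not QEMU APIs;
the big_endian flag only simulates the host layout so that both cases can be
compared on one machine.

```c
#include <stdint.h>

/* Lay out one 64-bit lane in memory as a host of the given endianness would. */
static void store_lane(uint8_t mem[8], uint64_t lane, int big_endian)
{
    for (int i = 0; i < 8; i++) {
        int pos = big_endian ? 7 - i : i;
        mem[pos] = (uint8_t)(lane >> (8 * i));
    }
}

/* Naive access, as in the quoted helper: element i is read at byte offset i. */
static uint8_t elem_naive(const uint8_t *mem, int i)
{
    return mem[i];
}

/* Adjusted access: flip the offset within each 8-byte lane when big-endian. */
static uint8_t elem_adjusted(const uint8_t *mem, int i, int big_endian)
{
    return mem[big_endian ? (i ^ 7) : i];
}
```

With the lane 0x0706050403020100, element 0 should be 0x00.  The naive access
returns that for the little-endian layout but returns 0x07 for the big-endian
layout, while the adjusted access returns 0x00 for both.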


r~
