>   I am looking into QEMU's implementation for ARM NEON instructions
> (target-arm/neon_helper.c). Some helper functions will do mask
> operation, neon_add_u8, for example. I thought simply adding a and b
> is enough and can't figure out why the mask operation is needed.

These are SIMD instructions acting on independent data 'lanes' packed into a 
bigger data item.
Lane operations must not interfere with each other.
 
> ---
> uint32_t HELPER(neon_add_u8)(uint32_t a, uint32_t b)
> {
>     uint32_t mask;
>1:     mask = (a ^ b) & 0x80808080u;
>2:     a &= ~0x80808080u;
>3:     b &= ~0x80808080u;
>4:     return (a + b) ^ mask;
> }
> ---

In your example there are four 8-bit lanes packed into a 32-bit word.
If we add whole 32-bit words, care must be taken to prevent carry 
propagation between the lanes.
This is done by putting zero at the top bit of each 8-bit operand (steps 2 and 
3), so a carry out of bit 6 lands in bit 7 of its own lane and goes no further.
These top bits are summed modulo 2 separately (step 1) and then XORed back in 
(step 4).
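
To see the difference on real numbers, here is a small sketch that puts the 
masked add next to a plain lane-by-lane reference. The names add_u8_lanes, 
add_u8_reference and check_lanes are made up for this example; they are not 
QEMU functions, and the HELPER() macro is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Masked add over four 8-bit lanes packed into one 32-bit word.
   Same four steps as the quoted helper; the name is hypothetical. */
static uint32_t add_u8_lanes(uint32_t a, uint32_t b)
{
    uint32_t mask = (a ^ b) & 0x80808080u; /* top bit of each lane, summed mod 2 */
    a &= ~0x80808080u;                     /* clear the top bits so a carry out  */
    b &= ~0x80808080u;                     /* of bit 6 stays inside its own lane */
    return (a + b) ^ mask;                 /* fold the top-bit sums back in      */
}

/* Reference: add each lane on its own, wrapping modulo 256. */
static uint32_t add_u8_reference(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t la = (uint8_t)(a >> (8 * i));
        uint8_t lb = (uint8_t)(b >> (8 * i));
        r |= (uint32_t)(uint8_t)(la + lb) << (8 * i);
    }
    return r;
}

/* Spot check: both versions must agree for the given inputs. */
static void check_lanes(uint32_t a, uint32_t b)
{
    assert(add_u8_lanes(a, b) == add_u8_reference(a, b));
}
```

With a = 0x01FF01FF and b = 0x01010101, a plain a + b gives 0x03000300: the 
0xFF + 0x01 lanes overflow and bump their neighbours from 0x02 to 0x03. 
add_u8_lanes(a, b) gives 0x02000200, which is what four separate 8-bit adds 
would produce.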

Thanks.
-- Max
