Two things I'm wondering about:

1. Why do __builtin_ia32_paddusb and similar functions take signed vector arguments, when the hardware primitive is defined to operate on unsigned vectors?
2. Why are there no SSE equivalents of those functions, ones that operate on 128-bit values (i.e., paddusb for v16qi vectors)?

paul