Ross Ridge writes:
>tbp is correct. Using casts gets you the integer bitwise instructions,
>not the single-precision bitwise instructions that are more optimal for
>flipping bits in single-precision vectors. If you want GCC to generate
>better code using single-precision bitwise instructions you're now forced
>to use the intrinsics.
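To make the distinction concrete, here's a minimal sketch of the two ways of flipping the sign bits of a single-precision vector. The type names v4sf and v4si and all three function names are made up for this example, and it assumes an x86 target with SSE enabled. Which instructions you actually get depends on your GCC version and flags, so compare the output of gcc -O2 -S for the two functions rather than taking my word for it.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_xor_ps, _mm_set1_ps, ... */

/* GCC vector-extension types (names are hypothetical, for this sketch). */
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

/* Cast version: reinterpret the floats as integers, XOR the sign
   bits, and reinterpret back. The XOR is an integer vector operation. */
static v4sf negate_with_casts(v4sf x)
{
    const v4si sign = { INT_MIN, INT_MIN, INT_MIN, INT_MIN };
    return (v4sf)((v4si)x ^ sign);
}

/* Intrinsic version: asks for the single-precision XOR directly.
   -0.0f has only the sign bit set. */
static __m128 negate_with_intrinsics(__m128 x)
{
    return _mm_xor_ps(x, _mm_set1_ps(-0.0f));
}

/* Returns 1 if both versions produce bit-identical results. */
int negate_variants_agree(void)
{
    const float in[4] = { 1.0f, -2.5f, 0.0f, 8.0f };
    float a[4], b[4];
    v4sf va, ra;

    assert(sizeof(v4sf) == sizeof(__m128));

    memcpy(&va, in, sizeof va);
    ra = negate_with_casts(va);
    memcpy(a, &ra, sizeof a);

    _mm_storeu_ps(b, negate_with_intrinsics(_mm_loadu_ps(in)));

    return memcmp(a, b, sizeof a) == 0 && a[0] == -1.0f && a[1] == 2.5f;
}
```

Both functions compute the same result. The only difference is which form of the operation the compiler is asked for.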
GCC makes the problem even worse if only SSE and not SSE2 instructions are enabled. Since the integer bitwise instructions are only available with SSE2, using casts instead of intrinsics causes GCC to expand the operation into a long series of instructions.

If I were tbp, I'd just code all my vector operations using intrinsics. The other responses in this thread have made it clear that GCC's vector arithmetic operations are really only designed to be used with the Cell Broadband Engine and other PowerPC processors.

Ross Ridge