------- Comment #4 from timday at bottlenose dot demon dot co dot uk 2006-11-08 22:18 -------
Created an attachment (id=12573)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12573&action=view)
More concise demonstration of the v4sf->float->v4sf issue.
The attached code (no classes or unions, just a few inline functions), obtained from

  gcc -v -save-temps -S -O3 -march=pentium3 -mfpmath=sse -msse -fomit-frame-pointer v4sf.cpp

compiles transform_good to 18 instructions and transform_bad to 33.

However, it's not really surprising that a round trip through stack temporaries is required when pointer arithmetic is used to extract a float from a __v4sf. I've no idea whether it's realistic to hope this could ever be optimised away.

Alternatively, it would be very nice if the builtin vector types simply provided a [] operator, or if there were some intrinsics for extracting floats from a __v4sf.

(In the meantime, in the original vector4f class, remaining in the __v4sf domain looks like a promising direction: the const operator[] would return a suitably type-wrapped __v4sf "filled" with the specified component.)

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756
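
For readers without the attachment, here is a rough sketch of the two approaches contrasted above. It is not the attached v4sf.cpp. It assumes an x86 target with SSE enabled, and it uses the `__m128` type and intrinsics from `<xmmintrin.h>` rather than the raw `__v4sf` typedef. The function names (`extract_via_memory`, `broadcast`, `scale_by_component`, `first_lane`) are hypothetical and chosen only for illustration.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (assumes an x86 target with SSE enabled)

// Memory-based extraction: the vector is stored to a temporary and a single
// float is read back. This is the kind of round trip the comment describes.
inline float extract_via_memory(__m128 v, int i) {
    alignas(16) float tmp[4];
    _mm_store_ps(tmp, v);
    return tmp[i];
}

// Staying in the vector domain: return a vector "filled" with component I,
// so no scalar float ever has to leave the SSE register.
template <int I>
inline __m128 broadcast(__m128 v) {
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(I, I, I, I));
}

// Hypothetical use: scale every lane of a by component I of b without
// converting that component to a scalar float first.
template <int I>
inline __m128 scale_by_component(__m128 a, __m128 b) {
    return _mm_mul_ps(a, broadcast<I>(b));
}

// Read lane 0 as a scalar.
inline float first_lane(__m128 v) {
    return _mm_cvtss_f32(v);
}
```

With something like `broadcast<I>`, an expression such as "vector times component I of another vector" can be written entirely in terms of vector operations. Whether the compiler then generates shorter code for a case like transform_bad would need to be checked against the actual assembly output.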