[Bug middle-end/29756] SSE intrinsics hard to use without redundant temporaries appearing

timday at bottlenose dot demon dot co dot uk Wed, 08 Nov 2006 14:18:40 -0800


------- Comment #4 from timday at bottlenose dot demon dot co dot uk  2006-11-08 22:18 -------
Created an attachment (id=12573)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12573&action=view)
More concise demonstration of the v4sf->float->v4sf issue.


Compiling the attached code (no classes or unions, just a few inline functions)
with
  gcc -v -save-temps -S -O3 -march=pentium3 -mfpmath=sse -msse -fomit-frame-pointer v4sf.cpp
yields 18 instructions for transform_good and 33 for transform_bad. However,
it's not really surprising that a round trip through stack temporaries is
required when pointer arithmetic is used to extract a float from a __v4sf.
I've no idea whether it's realistic to hope this could ever be optimised away.
Alternatively, it would be very nice if the builtin vector types simply
provided a [] operator, or if there were some intrinsics for extracting floats
from a __v4sf.
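The attachment itself is not reproduced in this message, so the following is only a hypothetical sketch of the two extraction styles under discussion. It uses the public __m128 type from <xmmintrin.h>; GCC's __v4sf and __m128 are both 16-byte vectors of four floats. get_via_pointer reads a lane through a float pointer, the pattern described above as needing a stack round trip, while get_via_shuffle broadcasts the lane with _mm_shuffle_ps and reads lane 0 with _mm_cvtss_f32, which current GCC headers provide. Both function names are invented for illustration, and no instruction counts are claimed for either.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_shuffle_ps, _mm_cvtss_f32, _MM_SHUFFLE

// Hypothetical helper: reads lane i by treating the vector's storage as a
// float array. This pointer-arithmetic style can force GCC to store the
// register to a stack temporary and reload the scalar from memory.
inline float get_via_pointer(__m128 v, int i) {
  assert(i >= 0 && i < 4);
  return reinterpret_cast<const float*>(&v)[i];
}

// Hypothetical helper: broadcasts lane I into every lane with a shuffle,
// then reads lane 0 as a scalar. I must be a compile-time constant because
// _mm_shuffle_ps encodes the lane selection as an immediate operand.
template <int I>
inline float get_via_shuffle(__m128 v) {
  static_assert(I >= 0 && I < 4, "lane index out of range");
  return _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(I, I, I, I)));
}
```

Because the shuffle mask is an immediate, the intrinsic version takes the lane index as a template parameter rather than a runtime argument; that restriction is the price of staying in registers.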

(In the meantime, for the original vector4f class, a promising direction seems
to be staying in the __v4sf domain: the const operator[] returns a suitably
type-wrapped __v4sf "filled" with the specified component.)
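A minimal sketch of that direction, assuming a vector4f that simply wraps one __m128. The class layout and the names splat4f, raw, operator* and to_float are all hypothetical and not taken from the original class. operator[] broadcasts the requested component with _mm_shuffle_ps, so scaling a vector by one of its own components can remain a packed _mm_mul_ps, and the value is converted to a scalar float only where one is actually needed.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_set_ps, _mm_shuffle_ps, _mm_mul_ps, _mm_cvtss_f32

// Hypothetical wrapper: a __m128 whose four lanes all hold the same value,
// so a "scalar" component can stay in an SSE register.
struct splat4f {
  __m128 v;
};

// Hypothetical minimal vector4f wrapping a single __m128.
class vector4f {
 public:
  vector4f(float x, float y, float z, float w) : v_(_mm_set_ps(w, z, y, x)) {}
  explicit vector4f(__m128 v) : v_(v) {}

  // Const element access that stays in the vector domain: component i is
  // broadcast into all four lanes. The switch exists only because the
  // shuffle mask must be a compile-time immediate.
  splat4f operator[](int i) const {
    assert(i >= 0 && i < 4);
    switch (i) {
      case 0:  return {_mm_shuffle_ps(v_, v_, _MM_SHUFFLE(0, 0, 0, 0))};
      case 1:  return {_mm_shuffle_ps(v_, v_, _MM_SHUFFLE(1, 1, 1, 1))};
      case 2:  return {_mm_shuffle_ps(v_, v_, _MM_SHUFFLE(2, 2, 2, 2))};
      default: return {_mm_shuffle_ps(v_, v_, _MM_SHUFFLE(3, 3, 3, 3))};
    }
  }

  __m128 raw() const { return v_; }

 private:
  __m128 v_;
};

// Scaling by a splatted component is a packed multiply; no scalar round trip.
inline vector4f operator*(const vector4f& a, splat4f s) {
  return vector4f(_mm_mul_ps(a.raw(), s.v));
}

// Leave the vector domain only at the boundary where a float is required.
inline float to_float(splat4f s) { return _mm_cvtss_f32(s.v); }
```

When operator[] is inlined with a constant index, GCC at -O2 or above should be able to fold the switch down to a single shuffle, but that is worth confirming in the -S output rather than assuming.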


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756
