------- Comment #3 from timday at bottlenose dot demon dot co dot uk 2006-11-08 10:01 -------

I've just tried an alternative version (will upload later) replacing the union with a single __v4sf _rep, and implementing the [] operators using e.g. (reinterpret_cast<const float*>(&_rep))[i]; However, the code generated by the two transform implementations remains the same (20 and 32 instructions anyway; haven't checked the details yet). Maybe not surprising, as it's just moving the problem around.
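
For concreteness, here is a minimal sketch of what such a single-member wrapper might look like, with a per-element multiply next to a whole-vector multiply. It is not the code from the attachment: the names vec4, make_vec4, mul_via_floats and mul_via_vector are hypothetical, the v4sf typedef is a GCC/Clang vector extension standing in for __v4sf, and element access goes through std::memcpy rather than the reinterpret_cast mentioned above, to stay clear of any aliasing questions.

```cpp
#include <cstring>

// Assumption: a GCC/Clang vector-extension type standing in for __v4sf.
typedef float v4sf __attribute__((vector_size(16)));

// Hypothetical wrapper holding a single vector member instead of a union.
struct vec4 {
    v4sf _rep;

    // Read lane i by copying the vector out to a float array.
    // (memcpy is used here in place of the reinterpret_cast in the comment.)
    float get(int i) const {
        float lanes[4];
        std::memcpy(lanes, &_rep, sizeof lanes);
        return lanes[i];
    }
};

// Hypothetical helper to build a vec4 from four floats.
inline vec4 make_vec4(float x, float y, float z, float w) {
    v4sf r = {x, y, z, w};
    vec4 v;
    v._rep = r;
    return v;
}

// Per-element path: every lane goes vector -> float -> vector.
inline v4sf mul_via_floats(const vec4& a, const vec4& b) {
    v4sf r = {a.get(0) * b.get(0), a.get(1) * b.get(1),
              a.get(2) * b.get(2), a.get(3) * b.get(3)};
    return r;
}

// Whole-vector path: the operands never leave the vector type.
inline v4sf mul_via_vector(const vec4& a, const vec4& b) {
    return a._rep * b._rep;
}
```

Both functions compute the same values; the only difference is whether the lanes are pulled out as individual floats along the way, so comparing the assembly for the two (e.g. with -O2 -S) is one way to look at the conversion cost in isolation.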

The big difference between the two methods is perhaps primarily that the bad one involves a __v4sf->float->__v4sf conversion, while the good one uses __v4sf throughout by using the mul_compN methods. I'll try to prepare a more concise test case based on the premise that bad handling of __v4sf <-> float is the real issue.

-- 

timday at bottlenose dot demon dot co dot uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |timday at bottlenose dot
                   |                            |demon dot co dot uk


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756