------- Comment #3 from timday at bottlenose dot demon dot co dot uk  2006-11-08 10:01 -------
I've just tried an alternative version (will upload later) replacing the union
with a single
  __v4sf _rep,
and implementing the [] operators using, e.g.,
  (reinterpret_cast<const float*>(&_rep))[i];
However, the code generated for the two transform implementations is unchanged
(still 20 and 32 instructions respectively, at least by count; I haven't
checked the details yet). Perhaps not surprising, as this just moves the
problem around.
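A minimal sketch of the single-vector layout described above, assuming GCC or Clang vector extensions. `Vec4` and `v4sf` are hypothetical names: `v4sf` is declared the same way GCC's `<xmmintrin.h>` declares `__v4sf`, and the class only illustrates replacing the union with one vector member plus reinterpret_cast-based element access, not the uploaded test case itself.

```cpp
#include <cstddef>

// Hypothetical alias, declared like GCC's __v4sf: a 16-byte vector of 4 floats.
typedef float v4sf __attribute__((vector_size(16)));

// Hypothetical class: stores only the vector, no union.
class Vec4 {
public:
  Vec4(float x, float y, float z, float w) {
    v4sf r = {x, y, z, w};
    _rep = r;
  }
  explicit Vec4(v4sf r) : _rep(r) {}

  // Element access by viewing the vector's storage as a float array.
  float operator[](std::size_t i) const {
    return (reinterpret_cast<const float*>(&_rep))[i];
  }
  float& operator[](std::size_t i) {
    return (reinterpret_cast<float*>(&_rep))[i];
  }

  v4sf rep() const { return _rep; }

private:
  v4sf _rep;
};
```

The element accessors still go through memory as floats, which is consistent with the observation that this layout only moves the problem around.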

The key difference between the two methods may be that the bad one involves
a __v4sf->float->__v4sf conversion, while the good one stays in __v4sf
throughout by using the mul_compN methods. I'll try to prepare a more concise
test case based on the premise that poor handling of __v4sf <-> float
conversion is the real issue.
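A sketch of the kind of concise test case proposed above, under stated assumptions: the 4x4 matrix is stored as four column vectors, `mul_comp<N>` is a hypothetical stand-in for the mul_compN methods (their real definitions are not shown here), and `__builtin_shuffle` makes the vector-only path GCC-specific. Both functions compute the same matrix-vector product; one way to test the premise is to compile with e.g. `g++ -O2 -S` and compare the assembly for the two.

```cpp
// Hypothetical aliases, declared like GCC's __v4sf / __v4si.
typedef float v4sf __attribute__((vector_size(16)));
typedef int v4si __attribute__((vector_size(16)));

// "Bad" style: extract each lane as a float, then splat it back to a vector
// (__v4sf -> float -> __v4sf round trip).
v4sf transform_via_float(const v4sf cols[4], v4sf v) {
  const float* f = reinterpret_cast<const float*>(&v);
  v4sf r = {0.0f, 0.0f, 0.0f, 0.0f};
  for (int n = 0; n < 4; ++n) {
    v4sf s = {f[n], f[n], f[n], f[n]};  // float -> vector splat
    r += cols[n] * s;
  }
  return r;
}

// Hypothetical mul_compN analogue: multiply a column by lane N of v,
// broadcast with a GCC vector shuffle so the data never leaves v4sf.
template <int N>
v4sf mul_comp(v4sf col, v4sf v) {
  const v4si mask = {N, N, N, N};
  return col * __builtin_shuffle(v, mask);
}

// "Good" style: stays in v4sf throughout.
v4sf transform_in_vector(const v4sf cols[4], v4sf v) {
  return mul_comp<0>(cols[0], v) + mul_comp<1>(cols[1], v) +
         mul_comp<2>(cols[2], v) + mul_comp<3>(cols[3], v);
}
```

Keeping the two variants in one small file with no class wrapper isolates the conversion question from the operator[] and union/reinterpret_cast details.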


timday at bottlenose dot demon dot co dot uk changed:

           What    |Removed                     |Added
                 CC|                            |timday at bottlenose dot
                   |                            |demon dot co dot uk
