http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363
--- Comment #2 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2011-06-11 05:58:46 UTC ---
These new developments sound interesting. I hope somebody is working on them and will publish a testable version soon.

On the other hand, I was thinking more of exploiting auto-vectorization, for which having multiple copies of the very same code looks to me unnecessary and error-prone. Something, for instance, along these lines:

    float __attribute__ ((__target__ ("sse2","sse3","avx","fma")))
    sum0(float const * __restrict__ x,
         float const * __restrict__ y,
         float const * __restrict__ z) {
      float sum=0;
      for (int i=0; i!=1024; ++i)
        sum += z[i]+x[i]*y[i];
      return sum;
    }

If my understanding of the proposal is correct, I will have to copy-paste this function four times, once for each target.
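
To make the copy-paste concern concrete, here is a rough sketch of what per-target duplication with manual dispatch might look like. It is not the syntax of the proposal under discussion. The names `sum0_generic` and `sum0_avx` and the dispatcher are hypothetical and used only for illustration, and only one extra target is shown. It assumes a GCC-compatible compiler that provides `__builtin_cpu_supports`; on non-x86 targets only the baseline copy is built.

```c
/* Sketch only: hand-written per-target copies of the same loop.
   sum0_generic and sum0_avx are hypothetical names, not part of any proposal. */

/* Baseline copy, compiled for whatever ISA the command line selects. */
float sum0_generic(float const * __restrict__ x,
                   float const * __restrict__ y,
                   float const * __restrict__ z) {
  float sum = 0;
  for (int i = 0; i != 1024; ++i)
    sum += z[i] + x[i] * y[i];
  return sum;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Second copy: the body is identical, only the target attribute differs.
   Each further target (sse3, fma, ...) would need yet another copy. */
__attribute__ ((__target__ ("avx")))
float sum0_avx(float const * __restrict__ x,
               float const * __restrict__ y,
               float const * __restrict__ z) {
  float sum = 0;
  for (int i = 0; i != 1024; ++i)
    sum += z[i] + x[i] * y[i];
  return sum;
}
#endif

/* Manual dispatcher: picks a copy at run time based on CPU features. */
float sum0(float const * __restrict__ x,
           float const * __restrict__ y,
           float const * __restrict__ z) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx"))
    return sum0_avx(x, y, z);
#endif
  return sum0_generic(x, y, z);
}
```

Every fix to the loop body has to be repeated in each copy, which is the error-prone part. A single definition carrying a list of targets, as in the snippet above, would avoid that.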