http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55016
Bug #: 55016
Summary: request for specific builtins for rcp and rsqrt
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vincenzo.innoce...@cern.ch

There are cases where approximate rcp and rsqrt suffice. I wonder whether it would be possible to introduce specific "generic" builtins for "rcp" and "rsqrt" that produce the proper instruction depending on the target architecture (SSE, AVX, etc.) and eventually generate vector instructions in a loop.

At the moment anything like this is target-specific, inefficient, and does not vectorize:

#include <x86intrin.h>

float v0[1024];
float v1[1024];

// Scalar approximate 1/sqrt(x) via the SSE rsqrtss instruction.
inline float rsqrtf( float x ) {
  return _mm_cvtss_f32( _mm_rsqrt_ss( _mm_set_ss( x ) ) );
}

// This loop is not vectorized.
void v() {
  for(int i=0; i!=1024; ++i)
    v0[i] = rsqrtf(v1[i]);
}