http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55016



             Bug #: 55016
           Summary: request for specific builtins for rcp and rsqrt
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: vincenzo.innoce...@cern.ch





There are cases where the use of approximate rcp and rsqrt suffices.



I wonder if it would be possible to introduce specific "generic" builtins for
"rcp" and "rsqrt" that produce the proper instruction depending on the target
architecture (SSE, AVX, etc.) and eventually generate vector instructions in a
loop.

At the moment anything like this is target-specific, inefficient, and does not
vectorize!



#include <x86intrin.h>

float v0[1024];
float v1[1024];

inline
float rsqrtf( float x ) {
  return _mm_cvtss_f32( _mm_rsqrt_ss( _mm_set_ss( x ) ) );
}

void v() {
  for(int i=0; i!=1024; ++i)
    v0[i] = rsqrtf(v1[i]);
}
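
For comparison, here is a rough sketch of what the loop above would ideally
turn into if a generic builtin were vectorized on an SSE target. This is only
an illustration, not a proposed implementation: the name v_manual and the
macro N are made up for this example, and the non-SSE branch is an exact
fallback so the snippet still builds elsewhere. It processes four floats per
iteration with the packed instruction (_mm_rsqrt_ps) instead of the scalar one.

```c
#include <math.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

#define N 1024  /* hypothetical name for the array length used above */

static float v0[N];
static float v1[N];

/* Hypothetical hand-vectorized equivalent of v(): approximate 1/sqrt(x). */
static void v_manual(void) {
#if defined(__SSE__)
  /* N is a multiple of 4, so no scalar tail loop is needed. */
  for (size_t i = 0; i < N; i += 4) {
    __m128 x = _mm_loadu_ps(&v1[i]);         /* load 4 floats */
    _mm_storeu_ps(&v0[i], _mm_rsqrt_ps(x));  /* packed approximate rsqrt */
  }
#else
  /* Assumption: on non-SSE targets fall back to the exact computation. */
  for (size_t i = 0; i < N; ++i)
    v0[i] = 1.0f / sqrtf(v1[i]);
#endif
}
```

Having to write this by hand for each instruction set (SSE, AVX, and so on) is
exactly the target-specific work that the requested builtins would avoid.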
