http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57952

            Bug ID: 57952
           Summary: AVX/AVX2 no ymm registers used in a trivial reduction
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vincenzo.innocente at cern dot ch

In this quite trivial benchmark, gcc does not generate AVX/AVX2 instructions
using ymm registers:
c++ -Ofast -S polyAVX.cpp -march=core-avx2 ; grep -c "ymm" polyAVX.s
0
clang++ -Ofast -S polyAVX.cpp -march=core-avx2 ; grep -c "ymm" polyAVX.s
73

The same happens with -march=corei7-avx.
gcc version 4.9.0 20130718 (experimental) [trunk revision 201034] (GCC) 


This has an obvious effect on speed (roughly a factor of two in user time):
c++ -Ofast polyAVX.cpp -march=core-avx2 ; time ./a.out
0.508u 0.000s 0:00.50 100.0%    0+0k 0+0io 1pf+0w
clang++ -Ofast polyAVX.cpp -march=core-avx2 ; time ./a.out
0.257u 0.000s 0:00.25 100.0%    0+0k 0+0io 1pf+0w


cat polyAVX.cpp
//template<typename T>
typedef float T;
inline T polyHorner(T y) {
  return  T(0x2.p0) + y * (T(0x2.p0) + y * (T(0x1.p0) + y * (T(0x5.55523p-4) +
y * (T(0x1.5554dcp-4) + y * (T(0x4.48f41p-8) + y * T(0xb.6ad4p-12)))))) ;
}

int main() {
  bool ret=true;
  float s =0;
  for (int k=0; k!=100; ++k) {
    float c = 1.f/1000000.f;
    for (int i=1; i<10000001; ++i) s+= polyHorner((float(i)+1.f)*c);
  }
  ret &= s!=0;

  return ret ? 0 : -1;
}
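
For reference, here is a minimal sketch of the reassociation that a 256-bit vectorized float reduction amounts to: eight independent partial sums (one per float lane of a ymm register) that are combined once at the end. This is only an illustration of the transformation that -Ofast (which implies -ffast-math) permits, not a claim about the code either compiler actually emits. The names kLanes, sumScalar and sumLanes are hypothetical and do not appear in polyAVX.cpp.

```cpp
#include <cassert>
#include <cmath>

typedef float T;

// Same polynomial as in polyAVX.cpp.
inline T polyHorner(T y) {
  return T(0x2.p0) + y * (T(0x2.p0) + y * (T(0x1.p0) + y * (T(0x5.55523p-4) +
         y * (T(0x1.5554dcp-4) + y * (T(0x4.48f41p-8) + y * T(0xb.6ad4p-12))))));
}

// Assumption: 8 floats fit in one 256-bit ymm register.
constexpr int kLanes = 8;

// Scalar reduction over i = 1..n, same shape as the inner loop of the test case.
float sumScalar(int n, float c) {
  float s = 0;
  for (int i = 1; i <= n; ++i) s += polyHorner((float(i) + 1.f) * c);
  return s;
}

// Hand-split reduction: kLanes independent accumulators, combined at the end.
// The additions happen in a different order than in sumScalar, so the result
// may differ in the last bits; that reordering is what -ffast-math allows.
float sumLanes(int n, float c) {
  assert(n >= 0);
  float acc[kLanes] = {0};
  int i = 1;
  // Main loop: each accumulator only ever sees "its" lane.
  for (; i + kLanes - 1 <= n; i += kLanes)
    for (int l = 0; l < kLanes; ++l)
      acc[l] += polyHorner((float(i + l) + 1.f) * c);
  // Horizontal sum of the partial accumulators.
  float s = 0;
  for (int l = 0; l < kLanes; ++l) s += acc[l];
  // Scalar tail for the iterations that do not fill a whole group of kLanes.
  for (; i <= n; ++i) s += polyHorner((float(i) + 1.f) * c);
  return s;
}
```

Calling sumLanes(n, c) and sumScalar(n, c) with the same arguments should give results that agree up to float rounding. Whether a compiler turns the inner lane loop into ymm instructions still depends on its vectorizer and tuning settings.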
