https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

--- Comment #26 from Chris Elrod <elrodc at gmail dot com> ---
> You can try enabling -mrecip to see RSQRT in .optimized - there's
> probably late 1/sqrt optimization on RTL.

No luck. The full commands I used:

gfortran -Ofast -mrecip -S -fdump-tree-optimized -march=native -shared -fPIC \
    -mprefer-vector-width=512 -fno-semantic-interposition \
    -o gfortvectorizationdump.s vectorization_test.f90

g++ -mrecip -Ofast -fdump-tree-optimized -S -march=native -shared -fPIC \
    -mprefer-vector-width=512 -fno-semantic-interposition \
    -o gppvectorization_test.s vectorization_test.cpp

The .optimized dump from g++ was similar, with a SQRT followed by a division
rather than an RSQRT:

  vect_U33_60.31_372 = SQRT (vect_S33_59.30_371);
  vect_Ui33_61.32_374 = { 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0 } / vect_U33_60.31_372;
  vect_U13_62.33_375 = vect_S13_47.24_359 * vect_Ui33_61.32_374;
  vect_U23_63.34_376 = vect_S23_53.27_365 * vect_Ui33_61.32_374;

The assembly g++ generates around the rsqrt is also the same as gfortran's:

        vcmpps  $4, %zmm0, %zmm5, %k1
        vrsqrt14ps      %zmm0, %zmm1{%k1}{z}
        vmulps  %zmm0, %zmm1, %zmm2
        vmulps  %zmm1, %zmm2, %zmm0
        vmulps  %zmm6, %zmm2, %zmm2
        vaddps  %zmm7, %zmm0, %zmm0
        vmulps  %zmm2, %zmm0, %zmm0
        vrcp14ps        %zmm0, %zmm10
        vmulps  %zmm0, %zmm10, %zmm0
        vmulps  %zmm0, %zmm10, %zmm0
        vaddps  %zmm10, %zmm10, %zmm10
        vsubps  %zmm0, %zmm10, %zmm10
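
Read instruction by instruction, the sequence above looks like two separate
refinements: a square root built from vrsqrt14ps plus one Newton-Raphson step,
followed by a reciprocal built from vrcp14ps plus one Newton-Raphson step,
rather than a single refined rsqrt. Below is a scalar C++ sketch of that
arithmetic. It rests on assumptions the excerpt does not confirm: that zmm5
holds 0.0, zmm6 holds -0.5 and zmm7 holds -3.0 (their loads are not shown).
approx_rsqrt and approx_rcp are hypothetical stand-ins for the roughly 14-bit
hardware estimates, not real intrinsics.

```cpp
#include <cmath>

// Hypothetical stand-ins for vrsqrt14ps / vrcp14ps: the exact value
// perturbed by a relative error of about 2^-14.
static float approx_rsqrt(float x) { return (1.0f / std::sqrt(x)) * (1.0f + 6.0e-5f); }
static float approx_rcp(float x)   { return (1.0f / x) * (1.0f - 6.0e-5f); }

// sqrt(x) from an rsqrt estimate plus one Newton-Raphson step:
//   sqrt(x) ~= (-0.5 * x * r) * (x * r * r - 3.0)
static float sqrt_via_rsqrt(float x) {
    // vcmpps $4 (not-equal) + zero-masked vrsqrt14ps: lanes with x == 0 get
    // r = 0 instead of +inf, so the result is 0 rather than NaN.
    float r    = (x != 0.0f) ? approx_rsqrt(x) : 0.0f;
    float xr   = x * r;         // vmulps %zmm0, %zmm1, %zmm2
    float xrr  = xr * r;        // vmulps %zmm1, %zmm2, %zmm0
    float half = -0.5f * xr;    // vmulps %zmm6, ... (zmm6 assumed to be -0.5)
    float t    = xrr + -3.0f;   // vaddps %zmm7, ... (zmm7 assumed to be -3.0)
    return t * half;            // vmulps %zmm2, %zmm0, %zmm0
}

// 1/s from a reciprocal estimate plus one Newton-Raphson step:
//   1/s ~= 2*y - s*y*y
static float rcp_refined(float s) {
    float y = approx_rcp(s);    // vrcp14ps
    float t = (s * y) * y;      // two vmulps
    return (y + y) - t;         // vaddps, then vsubps
}

// The composition the assembly appears to compute for 1/sqrt(x).
float inv_sqrt_two_step(float x) { return rcp_refined(sqrt_via_rsqrt(x)); }
```

With these stand-ins, inv_sqrt_two_step(4.0f) comes out within float rounding
of 0.5, and sqrt_via_rsqrt(0.0f) is 0 thanks to the mask.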
