https://llvm.org/bugs/show_bug.cgi?id=27107
Bug ID: 27107
Summary: LLVM misses reciprocal estimate instructions in ISel on ARMv7
Product: new-bugs
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedb...@nondot.org
Reporter: ste...@uplinklabs.net
CC: llvm-bugs@lists.llvm.org
Classification: Unclassified

LLVM is missing the opportunity to use VRSQRTE/VRSQRTS (and the other
reciprocal estimate instructions):

$ cat rsqrt.c
#include <math.h>

float rsqrtf(float f)
{
        return 1.0f / sqrtf(f);
}

$ clang -O3 -mcpu=native -mfpu=neon -mfloat-abi=hard -ffast-math -S -o - rsqrt.c | showasm
rsqrtf:                                 @ @rsqrtf
        vsqrt.f32       s0, s0
        vmov.f32        s2, #1.000000e+00
        vdiv.f32        s0, s2, s0
        bx      lr

Conversely, on x86_64, LLVM does the right thing:

$ clang -O3 -march=core-avx2 -ffast-math -S -o - rsqrt.c | showasm
rsqrtf:                                 # @rsqrtf
        vrsqrtss        %xmm0, %xmm0, %xmm1
        vmulss  %xmm1, %xmm1, %xmm2
        vfmadd213ss     .LCPI1_0(%rip), %xmm0, %xmm2
        vmulss  .LCPI1_1(%rip), %xmm1, %xmm0
        vmulss  %xmm0, %xmm2, %xmm0
        retq

It will even apply this properly to packed vectors if the inputs make sense
for it.

Right now the lack of reciprocal square root estimates on ARM breaks
auto-vectorization for a silly program I wrote, and the hand-written NEON
intrinsics version is beating the auto-vectorized variants (because the
auto-vectorization fails and everything gets bottlenecked around vsqrt+vdiv).

I looked at implementing this myself but got confused trying to understand
tablegen syntax. It looks like there just needs to be an ARMTargetLowering
implementation for TargetLowering::getRsqrtEstimate and
TargetLowering::getRecipEstimate.
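For readers unfamiliar with the estimate-plus-refinement pattern the report is asking for, here is a minimal portable C sketch of the arithmetic. It is not what LLVM or NEON emits. The helper names (rsqrt_estimate, rsqrt_step, rsqrt_refined) are hypothetical, and the initial guess uses the well-known integer bit trick as a stand-in for VRSQRTE, which on real hardware is a table-based estimate. The step function models what VRSQRTS computes, (3 - a*b) / 2, and the choice of two refinement steps is an assumption made for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for VRSQRTE (assumption): a crude first guess at 1/sqrt(d),
 * obtained with the classic integer bit trick rather than the hardware
 * lookup. Only meaningful for positive, normal inputs. */
static float rsqrt_estimate(float d)
{
        uint32_t bits;
        float e;

        memcpy(&bits, &d, sizeof bits);      /* reinterpret float as integer */
        bits = 0x5f3759dfu - (bits >> 1);
        memcpy(&e, &bits, sizeof e);         /* reinterpret back as float */
        return e;
}

/* Models the VRSQRTS step: returns (3 - a*b) / 2. */
static float rsqrt_step(float a, float b)
{
        return (3.0f - a * b) * 0.5f;
}

/* Newton-Raphson refinement: x <- x * (3 - d*x*x) / 2, applied twice. */
float rsqrt_refined(float d)
{
        float x = rsqrt_estimate(d);

        x = x * rsqrt_step(d * x, x);
        x = x * rsqrt_step(d * x, x);
        return x;
}
```

Each refinement step roughly squares the relative error of the estimate, so the whole sequence is a handful of multiplies and subtracts instead of the vsqrt+vdiv pair shown in the ARM output above. The x86_64 listing in the report has the same shape: vrsqrtss produces the estimate and the following multiply/FMA instructions perform one refinement step.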