Clang uses 64-bit absolute addresses when accessing static data in
64-bit mode. This is inefficient because every access to static data
needs an extra 10-byte instruction (movabsq) to load the address into
a register first. All other compilers use relative addresses.
Example:
#include <immintrin.h>
__m128d test (__m128d a) {
    __m128d b = _mm_add_pd(a, _mm_set1_pd(1.5));
    __m128d c = _mm_mul_pd(b, _mm_set1_pd(2.5));
    return c;
}
Assembly output:
.LCPI0_0:
    .quad 4609434218613702656 # double 1.5
    .quad 4609434218613702656 # double 1.5
.LCPI0_1:
    .quad 4612811918334230528 # double 2.5
    .quad 4612811918334230528 # double 2.5
    .text
    .globl _Z4testDv2_d
    .p2align 4, 0x90
_Z4testDv2_d: # @_Z4testDv2_d
# BB#0:
    vmovapd (%rcx), %xmm0
    movabsq $.LCPI0_0, %rax
    vaddpd (%rax), %xmm0, %xmm0
    movabsq $.LCPI0_1, %rax
    vmulpd (%rax), %xmm0, %xmm0
    retq
Clang on Linux uses 32-bit RIP-relative addresses instead:
    vaddpd .LCPI0_0(%rip), %xmm0, %xmm0
    vmulpd .LCPI0_1(%rip), %xmm0, %xmm0
    retq
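
To put a rough number on the overhead, the sketch below tallies the
encoded size of the two addressing forms shown above. The byte counts
assume the usual encodings (2-byte VEX prefix, xmm0 and rax as
operands); they are not taken from a disassembly of the actual object
file. The constant and function names are hypothetical, chosen only
for illustration.

```cpp
#include <cstddef>

// Encoded sizes in bytes, assuming the standard x86-64 encodings.
// movabsq $imm64, %rax : REX.W prefix + opcode + 64-bit immediate
constexpr std::size_t kMovabsImm64 = 1 + 1 + 8;
// vaddpd (%rax), %xmm0, %xmm0 : 2-byte VEX + opcode + ModRM
constexpr std::size_t kVexOpRegIndirect = 2 + 1 + 1;
// vaddpd sym(%rip), %xmm0, %xmm0 : 2-byte VEX + opcode + ModRM + disp32
constexpr std::size_t kVexOpRipRelative = 2 + 1 + 1 + 4;

// Code bytes needed for n accesses to static data with absolute
// addressing (one movabsq plus one register-indirect operation each).
constexpr std::size_t absolute_bytes(std::size_t n) {
    return n * (kMovabsImm64 + kVexOpRegIndirect);
}

// Code bytes needed for n accesses with RIP-relative addressing.
constexpr std::size_t rip_relative_bytes(std::size_t n) {
    return n * kVexOpRipRelative;
}

// The test function above makes two accesses to static data.
static_assert(absolute_bytes(2) == 28, "two movabsq + two VEX ops");
static_assert(rip_relative_bytes(2) == 16, "two RIP-relative VEX ops");
```

Under these assumptions each access costs 14 bytes with absolute
addressing versus 8 bytes with RIP-relative addressing, and it also
ties up a scratch register (%rax in the listing above).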