Hi,

Consider the following sample code:

    #define N 4

    float a[N] __attribute__ ((aligned (16)));
    float b[N] __attribute__ ((aligned (16)));
    float c[N] __attribute__ ((aligned (16)));
    double d[N] __attribute__ ((aligned (16)));

    void f(float *pa, float *pb);

    int main(){
        int i;
        f(a,b);
        for(i=0; i<N; i++){
            c[i] = b[i] + a[i] + d[i];
        }
        return 0;
    }

In the above code, a, b, and c are float and d is double. Thus, the result of b[i] + a[i] has to be converted from float to double before d[i] is added.

I compile with gcc-4.6 using the flags "gcc-4.6 -O3 -msse2 -msse3 -mfpmath=sse -ftree-vectorize -mmmx -msse4.1 -msse -S". As expected, it generates lots of SSE instructions, as follows:

    .....
    movaps    b(%rip), %xmm0
    xorl      %eax, %eax
    xorps     %xmm1, %xmm1
    addps     a(%rip), %xmm0
    movhlps   %xmm0, %xmm1
    cvtps2pd  %xmm0, %xmm2
    cvtps2pd  %xmm1, %xmm1
    addpd     d(%rip), %xmm2
    addpd     d+16(%rip), %xmm1
    cvtpd2ps  %xmm2, %xmm0
    cvtpd2ps  %xmm1, %xmm1
    movlhps   %xmm1, %xmm0
    movaps    %xmm0, c(%rip)
    addq      $8, %rsp
    .....

However, when I compile with arm-linux-gnueabihf-gcc-4.6 using the flags "-static -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize -mvectorize-with-neon-quad -ftree-slp-vectorize -march=armv7-a -mtune=cortex-a15 -O3 -Ofast -S", it generates only scalar instructions, without any NEON instructions.

Although NEON does not support double-precision floating point, the compiler could still compute b + a with NEON first and then use scalar instructions for the rest of the computation.

Is there a reason why arm-linux-gnueabihf-gcc-4.6 does not generate any NEON instructions here?

Any help appreciated.

Thank you very much,
Sheng Yu
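
For illustration, the split described above (single-precision add first, then the double-precision part) could be written by hand roughly as follows. This is only a sketch: the names add_reference, add_split, and tmp are hypothetical and do not appear in the original code, and whether gcc-4.6 would actually vectorize the first loop with NEON is an assumption that has not been verified.

```c
#include <stddef.h>

#define N 4

/* Reference version: the single loop from the original post. */
void add_reference(const float *a, const float *b, const double *d, float *c)
{
    for (size_t i = 0; i < N; i++)
        c[i] = b[i] + a[i] + d[i];
}

/* Hypothetical manual split of the same computation. */
void add_split(const float *a, const float *b, const double *d, float *c)
{
    float tmp[N];

    /* Float-only loop: a candidate for a single-precision vector add. */
    for (size_t i = 0; i < N; i++)
        tmp[i] = b[i] + a[i];

    /* Widen to double, add d[i], narrow back to float (scalar on NEON). */
    for (size_t i = 0; i < N; i++)
        c[i] = (float)((double)tmp[i] + d[i]);
}
```

Both functions should produce the same values in c. The split only makes the float-only part visible to the vectorizer as a separate loop.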