Hendrik Greving <hendrik.greving.in...@gmail.com> wrote:
>The code for a simple loop like
>
>for (i = 0; i < LENGTH-1; i++) {
>  g_c[i] = g_a[i] + g_b[i];
>}
>
>looks good for g++ (4.9.0 20131028 (experimental)) (-O3 core-avx2)
>
>.L2:
>vmovdqa g_a(%rax), %ymm0 # 26 *movv8si_internal/2 [length = 8]
>vpaddd g_b(%rax), %ymm0, %ymm0 # 27 *addv8si3/2 [length = 8]
>addq $32, %rax # 29 *adddi_1/1 [length = 4]
>vmovaps %ymm0, g_c-32(%rax) # 28 *movv8si_internal/3 [length = 8]
>cmpq $39968, %rax # 31 *cmpdi_1/1 [length = 6]
>jne .L2 # 32 *jcc_1 [length = 2]
>
>but for gcc, I'm getting
>
>.L4:
>vmovdqu (%rsi,%rax), %xmm0 # 156 sse2_loaddquv16qi [length = 5]
>vinserti128 $0x1, 16(%rsi,%rax), %ymm0, %ymm0 # 157 avx_vec_concatv32qi/1 [length = 8]
>addl $1, %edx # 161 *addsi_1/1 [length = 3]
>vpaddd (%rdi,%rax), %ymm0, %ymm0 # 158 *addv8si3/2 [length = 5]
>vmovups %xmm0, (%rcx,%rax) # 412 *movv16qi_internal/3 [length = 5]
>vextracti128 $0x1, %ymm0, 16(%rcx,%rax) # 160 vec_extract_hi_v32qi/2 [length = 8]
>addq $32, %rax # 162 *adddi_1/1 [length = 4]
>cmpl $1248, %edx # 164 *cmpsi_1/1 [length = 6]
>jbe .L4 # 165 *jcc_1 [length = 2]
>
>unless I add "__attribute__ ((aligned (64)));" g_a, g_b, g_c.
>
>2 questions: Does C have different alignment requirements/specs than
>C++ (I don't think so)?
Try -fno-common

Richard.

>But if so, why does gcc not just align the
>arrays (they are in the same module in my example...)? Let aside the
>alignment question, why not just do avx2 (ymm) moves as g++ does?
>
>Guess my question is, is this a bug or a feature?
>
>Thanks,
>Regards,
>Hendrik
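
For anyone who wants to reproduce this, below is a minimal sketch of the test case. It is a reconstruction, not the original source: the value of LENGTH, the element type int (inferred from the vpaddd / addv8si3 patterns), and the helpers fill_and_check() and is_aligned() are assumptions added for illustration. The background to the -fno-common suggestion is that with -fcommon, uninitialized file-scope arrays in C are emitted as common symbols, and C++ has no common symbols, which would be one plausible explanation for the different code from gcc and g++.

```c
#include <stdint.h>

#define LENGTH 10000 /* assumed; the original post does not give the value */

/* Workaround from the post: request the alignment explicitly.
   Drop the attributes to compare against -fno-common. */
int g_a[LENGTH] __attribute__ ((aligned (64)));
int g_b[LENGTH] __attribute__ ((aligned (64)));
int g_c[LENGTH] __attribute__ ((aligned (64)));

/* The loop from the post, unchanged apart from being wrapped in a function. */
void add_arrays(void)
{
    int i;
    for (i = 0; i < LENGTH-1; i++) {
        g_c[i] = g_a[i] + g_b[i];
    }
}

/* Hypothetical helper: returns 1 if p is a multiple of n bytes. */
int is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p % n) == 0;
}

/* Hypothetical helper: fills the inputs, runs the loop, and returns 1 if
   every written element is correct and the last element was left alone. */
int fill_and_check(void)
{
    int i;
    for (i = 0; i < LENGTH; i++) {
        g_a[i] = i;
        g_b[i] = 2 * i;
        g_c[i] = -1;
    }
    add_arrays();
    for (i = 0; i < LENGTH-1; i++) {
        if (g_c[i] != 3 * i)
            return 0;
    }
    return g_c[LENGTH-1] == -1;
}
```

To compare the generated loops, one could compile this once as is and once with the attributes removed, with and without -fno-common, for example with "gcc -O3 -march=core-avx2 -S", and look at whether the inner loop uses full ymm loads and stores or the split xmm / vinserti128 / vextracti128 sequence.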