Re: Vectorizer/alignment

Richard Biener Fri, 08 Nov 2013 09:53:49 -0800

Hendrik Greving <hendrik.greving.in...@gmail.com> wrote:
>The code for a simple loop like
>
>for (i = 0; i < LENGTH-1; i++) {
>        g_c[i] = g_a[i] + g_b[i];
>}
>
>looks good for g++ (4.9.0 20131028 (experimental)) (-O3 core-avx2)
>
>.L2:
>vmovdqa g_a(%rax), %ymm0 # 26 *movv8si_internal/2 [length = 8]
>vpaddd g_b(%rax), %ymm0, %ymm0 # 27 *addv8si3/2 [length = 8]
>addq $32, %rax # 29 *adddi_1/1 [length = 4]
>vmovaps %ymm0, g_c-32(%rax) # 28 *movv8si_internal/3 [length = 8]
>cmpq $39968, %rax # 31 *cmpdi_1/1 [length = 6]
>jne .L2 # 32 *jcc_1 [length = 2]
>
>but for gcc, I'm getting
>
>.L4:
>vmovdqu (%rsi,%rax), %xmm0 # 156 sse2_loaddquv16qi [length = 5]
>vinserti128 $0x1, 16(%rsi,%rax), %ymm0, %ymm0 # 157 avx_vec_concatv32qi/1 [length = 8]
>addl $1, %edx # 161 *addsi_1/1 [length = 3]
>vpaddd (%rdi,%rax), %ymm0, %ymm0 # 158 *addv8si3/2 [length = 5]
>vmovups %xmm0, (%rcx,%rax) # 412 *movv16qi_internal/3 [length = 5]
>vextracti128 $0x1, %ymm0, 16(%rcx,%rax) # 160 vec_extract_hi_v32qi/2 [length = 8]
>addq $32, %rax # 162 *adddi_1/1 [length = 4]
>cmpl $1248, %edx # 164 *cmpsi_1/1 [length = 6]
>jbe .L4 # 165 *jcc_1 [length = 2]
>
>unless I add "__attribute__ ((aligned (64)))" to g_a, g_b, and g_c.
>
>2 questions: Does C have different alignment requirements/specs than
>C++ (I don't think so)?


Try -fno-common

Richard.

>But if so, why does gcc not just align the
>arrays (they are in the same module in my example...)? Leaving aside the
>alignment question, why not just do avx2 (ymm) moves as g++ does?
>
>Guess my question is, is this a bug or a feature?
>
>Thanks,
>Regards,
>Hendrik
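
For context on the -fno-common suggestion, here is a minimal sketch of the test case. The element type follows from the vpaddd/v8si patterns in the dumps; the value of LENGTH and the function name add_arrays are assumptions, since the original post does not give them. The likely explanation is that in C, with the -fcommon default of GCC at that time, uninitialized file-scope arrays are emitted as common symbols, and the compiler cannot safely raise the alignment of a common symbol because the linker may merge it with a definition from another object file. C++ has no common symbols, which would explain why g++ can align the arrays and use aligned ymm moves.

```c
/* Sketch only: LENGTH and add_arrays() are assumed, not from the original post. */
#define LENGTH 10000   /* assumed size */

/* Uninitialized file-scope arrays are tentative definitions in C.
 * With -fcommon they become common symbols; with -fno-common they are
 * ordinary definitions in .bss that this translation unit owns. */
int g_a[LENGTH];
int g_b[LENGTH];
int g_c[LENGTH];

/* The loop from the post, wrapped in a function so it can be compiled alone. */
void add_arrays(void)
{
    int i;

    for (i = 0; i < LENGTH - 1; i++) {
        g_c[i] = g_a[i] + g_b[i];
    }
}
```

Compiling this file as C with something like "gcc -O3 -march=core-avx2 -S", once with and once without -fno-common, should make it possible to compare the two loop bodies (assuming the "core-avx2" in the post refers to -march=core-avx2). Newer releases, from GCC 10 on, default to -fno-common.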
