Ok, thanks, that explains it. Apparently the x86 backend splits the misaligned 256-bit vector movs in two, in ix86_expand_vector_move_misalign->ix86_avx256_split_vector_move_misalign. But I wanted to mention that e.g. icc, despite also putting g_a, g_b and g_c into .comm, actually generates an unsplit AVX2 vmovdqu using ymm registers.
Examples:

foo.c:

#include <stdio.h>
#include <stdint.h>
#include "foo.h"

int g_a[LENGTH];
int g_b[LENGTH];
int g_c[LENGTH];

void foo() {
  int i;
  for (i = 0; i < LENGTH; i++) {
    g_c[i] = g_a[i] + g_b[i];
  }
}

icc: icc/13.1.3/bin/icc -S -O3 -march=core-avx2 foo.c -v -save-temps -vec-report=2
gcc: gcc -S -O3 -march=core-avx2 foo.c -ftree-vectorizer-verbose=1 -dp -v -da

On Mon, Nov 11, 2013 at 1:31 PM, David Edelsohn <dje....@gmail.com> wrote:
> On Mon, Nov 11, 2013 at 3:56 PM, Richard Henderson <r...@redhat.com> wrote:
>
>>> I suppose targets without .bss section support should not switch
>>> (that is, targets not defining BSS_SECTION_ASM_OP or
>>> ASM_OUTPUT_ALIGNED_BSS).
>>
>> Good point. I don't expect that we have many of those left, but
>> if any do still exist...
>
> AIX XCOFF, although it probably can be changed to explicitly use a BSS
> section.
>
> - David