On Fri, Jan 03, 2014 at 05:04:55PM +0100, Toon Moene wrote:
> I am trying to figure out how the top-consuming routines in our
> weather models will be compiled when using AVX512 instructions (and
> their 32 512 bit registers).
>
> I thought an up-to-date trunk version of gcc, using the command line:
>
> <...>/gfortran -Ofast -S -mavx2 -mavx512f <source code>
>
> would do that.
>
> Unfortunately, I do not see any use of the new zmm.. registers,
> which might mean that AVX512 isn't used yet.
>
> This is how the nightly build job builds the trunk gfortran compiler:
>
> configure --prefix=/home/toon/compilers/install --with-gnu-as
> --with-gnu-ld --enable-languages=fortran<,other-language>
> --disable-multilib --disable-nls --with-arch=core-avx2
> --with-tune=core-avx2
>
> Is it the --with-arch=core-avx2 ? Or perhaps the --with-gnu-as
> --with-gnu-ld (because the installed ones do not support AVX512 yet
> ?).

You shouldn't need an assembler with AVX512 support just for -S.
If I try, say, this simple testcase:

void f1 (int *__restrict e, int *__restrict f)
{
  int i;
  for (i = 0; i < 1024; i++)
    e[i] = f[i] * 7;
}

void f2 (int *__restrict e, int *__restrict f)
{
  int i;
  for (i = 0; i < 1024; i++)
    e[i] = f[i];
}

with -O2 -ftree-vectorize -mavx512f, I get:

        vmovdqa64       .LC0(%rip), %zmm1
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L2:
        vpmulld (%rsi,%rax), %zmm1, %zmm0
        vmovdqu32       %zmm0, (%rdi,%rax)
        addq    $64, %rax
        cmpq    $4096, %rax
        jne     .L2
        rep; ret

and

        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L6:
        vmovdqu64       (%rsi,%rax), %zmm0
        vmovdqu32       %zmm0, (%rdi,%rax)
        addq    $64, %rax
        cmpq    $4096, %rax
        jne     .L6
        rep; ret

If something hasn't been vectorized, -fdump-tree-vect-details will
show you why.

	Jakub
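
For anyone who wants to repeat the experiment, here is a minimal sketch
that puts the two loops from the reply into a single C file. The file
name kernels.c, the macro N and the helper check_kernels() are
assumptions made for illustration; they are not part of the original
message. The helper only confirms that both loops compute what they
should, so the same file can be used for inspecting the assembly and
for a quick sanity check.

```c
/* kernels.c: hypothetical file name, not from the thread. */

#define N 1024  /* trip count used by both loops in the reply */

/* Multiply every element by a constant. */
void f1 (int *__restrict e, int *__restrict f)
{
  int i;
  for (i = 0; i < N; i++)
    e[i] = f[i] * 7;
}

/* Plain element-wise copy. */
void f2 (int *__restrict e, int *__restrict f)
{
  int i;
  for (i = 0; i < N; i++)
    e[i] = f[i];
}

/* Hypothetical helper: runs both kernels on known input and
   returns the number of wrong elements (0 means both are correct). */
int check_kernels (void)
{
  static int src[N], dst[N];
  int i, bad = 0;

  for (i = 0; i < N; i++)
    src[i] = i - N / 2;        /* mix of negative and positive values */

  f1 (dst, src);
  for (i = 0; i < N; i++)
    bad += (dst[i] != src[i] * 7);

  f2 (dst, src);
  for (i = 0; i < N; i++)
    bad += (dst[i] != src[i]);

  return bad;
}
```

Compiling this with gcc -O2 -ftree-vectorize -mavx512f -S kernels.c and
searching the resulting kernels.s for zmm should show whether the 512-bit
registers are being used; the exact instructions may differ between
compiler versions. Adding -fdump-tree-vect-details produces the
vectorizer's report for loops that were not vectorized.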