On Fri, Jan 03, 2014 at 05:04:55PM +0100, Toon Moene wrote:
> I am trying to figure out how the top-consuming routines in our
> weather models will be compiled when using AVX512 instructions (and
> their 32 512 bit registers).
> 
> I thought an up-to-date trunk version of gcc, using the command line:
> 
> <...>/gfortran -Ofast -S -mavx2 -mavx512f <source code>
> 
> would do that.
> 
> Unfortunately, I do not see any use of the new zmm.. registers,
> which might mean that AVX512 isn't used yet.
> 
> This is how the nightly build job builds the trunk gfortran compiler:
> 
> configure --prefix=/home/toon/compilers/install --with-gnu-as
> --with-gnu-ld --enable-languages=fortran<,other-language>
> --disable-multilib --disable-nls --with-arch=core-avx2
> --with-tune=core-avx2
> 
> Is it the --with-arch=core-avx2 ? Or perhaps the --with-gnu-as
> --with-gnu-ld (because the installed ones do not support AVX512 yet
> ?).

You shouldn't need an assembler with AVX512 support just for -S.
If I compile, say, this simple testcase:
void f1 (int *__restrict e, int *__restrict f) { int i; for (i = 0; i < 1024; i++) e[i] = f[i] * 7; }
void f2 (int *__restrict e, int *__restrict f) { int i; for (i = 0; i < 1024; i++) e[i] = f[i]; }
with -O2 -ftree-vectorize -mavx512f, I get:
        vmovdqa64       .LC0(%rip), %zmm1
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L2:
        vpmulld (%rsi,%rax), %zmm1, %zmm0
        vmovdqu32       %zmm0, (%rdi,%rax)
        addq    $64, %rax
        cmpq    $4096, %rax
        jne     .L2
        rep; ret
and
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L6:
        vmovdqu64       (%rsi,%rax), %zmm0
        vmovdqu32       %zmm0, (%rdi,%rax)
        addq    $64, %rax
        cmpq    $4096, %rax
        jne     .L6
        rep; ret

If something hasn't been vectorized, you can look at the
-fdump-tree-vect-details dump to see why.
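
As a rough sketch of how that dump can be used (the file name vect_demo.c
and the function names scale7 and prefix_sum are made up for illustration,
they are not from the testcase above): the first loop has independent
iterations and is a vectorization candidate, while the second carries a
dependence from one iteration to the next, so I would expect the vectorizer
to leave it scalar and to say so in the dump.

```c
/* vect_demo.c (hypothetical file name).
   Compile only, no AVX512-capable assembler needed, e.g.:
     gcc -O2 -ftree-vectorize -mavx512f -fdump-tree-vect-details -S vect_demo.c
   The dump is written next to the output as vect_demo.c.<NNN>t.vect,
   where the pass number <NNN> depends on the GCC version.  */

#define N 1024

/* Independent iterations; __restrict promises that e and f do not
   overlap, so this loop is a candidate for vectorization.  */
void scale7 (int *__restrict e, const int *__restrict f)
{
  int i;
  for (i = 0; i < N; i++)
    e[i] = f[i] * 7;
}

/* Each iteration reads the value stored by the previous one
   (a loop-carried dependence), so this is expected to stay scalar;
   the dump should give the reason for this loop.  */
void prefix_sum (int *a)
{
  int i;
  for (i = 1; i < N; i++)
    a[i] += a[i - 1];
}
```

In the generated .s file, zmm registers should then only show up in the
code for the loop the vectorizer accepted; the exact wording of the
messages in the dump differs between GCC versions.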

        Jakub
