On Mon, Jan 13, 2014 at 7:26 PM, Kirill Yukhin <kirill.yuk...@gmail.com> wrote:
>> > Kirill, is it possible for you to test the patch in the simulator? Do
>> > we have a testcase in gcc's testsuite that can be used to check this
>> > patch?
>>
>> E.g. gcc.target/i386/avx2-gather* and avx512f-gather*.
> This tests are for built-in generation. The issue is connected to
> auto code gen.
>
> It seems to be working, we have for hss2a.fppized.f:
> .L402:
>         vmovdqu64       (%rdi,%rax), %zmm1
>         kmovw   %k1, %k3
>         kmovw   %k1, %k2
>         kmovw   %k1, %k4
>         kmovw   %k1, %k5
>         addl    $1, %esi
>         vpgatherdd      npwrx.4971-4(,%zmm1,4), %zmm0{%k3}
>         vpgatherdd      (%r10,%zmm1,4), %zmm2{%k2}
>         vpmulld %zmm3, %zmm0, %zmm0
>         vpaddd  %zmm7, %zmm0, %zmm0
>         vmovdqu32       %zmm0, (%r11,%rax)
>         vpgatherdd      npwry.4973-4(,%zmm1,4), %zmm0{%k4}
>         vpmulld %zmm3, %zmm0, %zmm0
>         vpaddd  %zmm6, %zmm0, %zmm0
>         vmovdqu32       %zmm0, (%r9,%rax)
>         vpgatherdd      npwrz.4975-4(,%zmm1,4), %zmm0{%k5}
>         vpmulld %zmm3, %zmm0, %zmm0
>         vpaddd  %zmm5, %zmm0, %zmm0
>         vmovdqu32       %zmm0, (%r14,%rax)
>         vpaddd  %zmm2, %zmm4, %zmm0
>         vmovdqa64       %zmm0, (%r15,%rax)
>         addq    $64, %rax
>         cmpl    %esi, %edx
>         ja      .L402

An unrelated observation: gcc should figure out that %k1 mask register
can be used in all gather insns and avoid unnecessary copies at the
beginning of the loop.

Uros.