> On 31 Mar 2018, at 01:50, Jakub Jelinek <ja...@redhat.com> wrote:
>
> Hi!
>
> The code we emit on the following testcases is really terrible, with both
> -mavx512f -mno-avx512bw as well as -mavx512bw.  Rather than doing e.g.
> 	vpinsrw	$0, %edi, %xmm0, %xmm1
> 	vinserti32x4	$0, %xmm1, %zmm0, %zmm0
> when trying to insert into the low 128 bits, or
> 	vextracti32x4	$1, %zmm0, %xmm1
> 	vpinsrw	$3, %edi, %xmm1, %xmm1
> 	vinserti32x4	$1, %xmm1, %zmm0, %zmm0
> when trying to insert into the other 128-bit lanes, we emit:
> 	pushq	%rbp
> 	vmovq	%xmm0, %rax
> 	movzwl	%di, %edi
> 	xorw	%ax, %ax
> 	movq	%rsp, %rbp
> 	orq	%rdi, %rax
> 	andq	$-64, %rsp
> 	vmovdqa64	%zmm0, -64(%rsp)
> 	movq	%rax, -64(%rsp)
> 	vmovdqa64	-64(%rsp), %zmm0
> 	leave
> and furthermore it triggers some RA bug, so we miscompile it.
> All this is because, while at least for AVX512BW we have ix86_expand_vector_set
> implemented for V64QImode and V32HImode, we actually don't have a vec_set pattern
> for those modes.  Fixed by adding those modes to V, which is used only by this
> vec_set pattern and some xop pattern that is misusing it anyway (it really
> can't handle 512-bit vectors, so it could result in assembly failures etc.
> with -mxop -mavx512f).  Furthermore, for AVX512F we can use the above
> extraction/insertion/insertion sequence, similarly to what we do for 256-bit
> insertions, except that there we have halves rather than quarters.
> The splitters are added to match what the vec_set_lo_* define_insn_and_split
> patterns do: if we are trying to extract the low 128-bit lane of a 512-bit
> vector, we really don't need a vextracti32x4 instruction, we can use a simple
> move or even nothing at all (if source and destination are the same register,
> or post-RA register renaming can arrange that).
>
> The RA bug still should be fixed, but at least this patch makes it latent
> and improves the code a lot.  Bootstrapped/regtested on x86_64-linux and
> i686-linux, ok for trunk?

OK.
— Thanks, K
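
[Editorial note: a minimal sketch, not part of the patch, of the
extraction/insertion/insertion sequence Jakub describes, written with
AVX512F intrinsics.  The function name set_elt11 and the choice of
element 11 are illustrative only: 16-bit element 11 of a 512-bit vector
sits in 128-bit lane 1 at position 3, matching the vextracti32x4 $1 /
vpinsrw $3 / vinserti32x4 $1 example quoted above.]

	#include <immintrin.h>

	/* Hypothetical example: set 16-bit element 11 of a 512-bit vector
	   by extracting its 128-bit lane, inserting into that lane, and
	   putting the lane back.  Requires -mavx512f.  */
	__m512i
	set_elt11 (__m512i v, short x)
	{
	  __m128i lane = _mm512_extracti32x4_epi32 (v, 1); /* vextracti32x4 $1 */
	  lane = _mm_insert_epi16 (lane, x, 3);            /* vpinsrw $3 */
	  return _mm512_inserti32x4 (v, lane, 1);          /* vinserti32x4 $1 */
	}

With the patch applied, one would expect code along the lines of the
three-instruction sequence quoted in the mail rather than the spill to
the stack through -64(%rsp).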