> On 31 Mar 2018, at 01:50, Jakub Jelinek <ja...@redhat.com> wrote:
>
> Hi!
>
> The code we emit on the following testcases is really terrible, with both
> -mavx512f -mno-avx512bw as well as -mavx512bw.  Rather than doing e.g.
> 	vpinsrw	$0, %edi, %xmm0, %xmm1
> 	vinserti32x4	$0, %xmm1, %zmm0, %zmm0
> when trying to insert into the low 128 bits, or
> 	vextracti32x4	$1, %zmm0, %xmm1
> 	vpinsrw	$3, %edi, %xmm1, %xmm1
> 	vinserti32x4	$1, %xmm1, %zmm0, %zmm0
> when trying to insert into the other 128-bit lanes, we emit:
> 	pushq	%rbp
> 	vmovq	%xmm0, %rax
> 	movzwl	%di, %edi
> 	xorw	%ax, %ax
> 	movq	%rsp, %rbp
> 	orq	%rdi, %rax
> 	andq	$-64, %rsp
> 	vmovdqa64	%zmm0, -64(%rsp)
> 	movq	%rax, -64(%rsp)
> 	vmovdqa64	-64(%rsp), %zmm0
> 	leave
> and furthermore it triggers some RA bug, so we miscompile it.
> All this is because, while at least for AVX512BW we have ix86_expand_vector_set
> implemented for V64QImode and V32HImode, we actually don't have a vec_set pattern
> for those modes.  Fixed by adding those modes to V, which is used only by this
> vec_set pattern and some xop pattern that is misusing it anyway (it really
> can't handle 512-bit vectors, so it could result in assembly failures etc.
> with -mxop -mavx512f).  Furthermore, for AVX512F we can use the above
> extraction/insertion/insertion sequence, similarly to what we do for 256-bit
> insertions, except that there we have halves rather than quarters.
> The splitters are added to match what the vec_set_lo_* define_insn_and_split
> patterns do: if we are trying to extract the low 128-bit lane of a 512-bit
> vector, we really don't need a vextracti32x4 instruction, we can use a simple
> move or even nothing at all (if source and destination are the same register,
> or post-RA register renaming can arrange that).
>
> The RA bug still should be fixed, but at least this patch makes it latent
> and improves the code a lot.  Bootstrapped/regtested on x86_64-linux and
> i686-linux, ok for trunk?

OK.
— Thanks, K
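
[Editorial note: a minimal sketch, not part of the patch, of the
extraction/insertion/insertion sequence Jakub describes, written with
AVX512F intrinsics.  The function name set_elt11 and the choice of
element 11 are illustrative only: 16-bit element 11 of a 512-bit vector
sits in 128-bit lane 1 at position 3, matching the vextracti32x4 $1 /
vpinsrw $3 / vinserti32x4 $1 example quoted above.]

	#include <immintrin.h>

	/* Hypothetical example: set 16-bit element 11 of a 512-bit vector
	   by extracting its 128-bit lane, inserting into that lane, and
	   putting the lane back.  Requires -mavx512f.  */
	__m512i
	set_elt11 (__m512i v, short x)
	{
	  __m128i lane = _mm512_extracti32x4_epi32 (v, 1); /* vextracti32x4 $1 */
	  lane = _mm_insert_epi16 (lane, x, 3);            /* vpinsrw $3 */
	  return _mm512_inserti32x4 (v, lane, 1);          /* vinserti32x4 $1 */
	}

With the patch applied, one would expect code along the lines of the
three-instruction sequence quoted in the mail rather than the spill to
the stack through -64(%rsp).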