Issue 133568
Summary [AVX512] Avoid Memory form of Compress in AMD znver4
Labels new issue
Assignees
Reporter venkataramananhashkumar
For the given LLVM IR, the X86 backend generates the memory form of compress (vcompresspd with a memory destination).
ref: https://godbolt.org/z/KhhczdbY8

.LBB0_4: # %vector.body
        vptestmd        k1, ymm1, ymm0
        movsxd  r8, r8d
        vpaddd  ymm1, ymm1, ymm2
        vmovupd zmm3 {k1} {z}, zmmword ptr [rsi + r11]
        kmovb   ebx, k1
        popcnt  ebx, ebx
        kortestb        k1, k1
        cmove   ebx, r10d
        add     r11, 64
        vcompresspd     zmmword ptr [rdi + 8*r8] {k1}, zmm3
        add     r8d, ebx
        cmp     r9, r11
        jne     .LBB0_4

The memory form is microcoded and slower on znver4. We need to generate a sequence like the one shown below instead.

        kmovb   %k1, %r11d
        pextl   %r11d, %r11d, %ebx
        vcompresspd     %zmm3, %zmm3 {%k1} {z}
        kmovd   %ebx, %k1
        vmovupd %zmm3, (%rdi,%rcx,8) {%k1}
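
To illustrate why the sequence above should be equivalent to the memory form, here is a rough scalar model in Python. It is only a sketch, not compiler code: the function names (pext, compress_store, compress_then_masked_store) are made up for this illustration, and the vector is modelled as a plain list of 8 lanes. The key point it tries to show is that pext(mask, mask) yields a mask with the low popcount(mask) bits set, which is exactly the store mask needed after a register-form compress.

```python
def pext(src: int, mask: int) -> int:
    """Scalar model of BMI2 PEXT: pack the bits of src selected by mask into the low bits."""
    out = 0
    pos = 0
    while mask:
        low = mask & -mask            # lowest set bit still left in mask
        if src & low:
            out |= 1 << pos
        pos += 1
        mask ^= low                   # clear that bit
    return out


def compress_store(mem, base, mask, vec):
    """Model of the memory form: selected lanes are written contiguously from mem[base]."""
    j = base
    for i, lane in enumerate(vec):
        if (mask >> i) & 1:
            mem[j] = lane
            j += 1


def compress_then_masked_store(mem, base, mask, vec):
    """Model of the proposed sequence: register compress, pext, masked store."""
    # Register-form compress with zeroing: selected lanes move to the front.
    packed = [lane for i, lane in enumerate(vec) if (mask >> i) & 1]
    packed += [0.0] * (len(vec) - len(packed))
    # pext(mask, mask) sets the low popcount(mask) bits.
    store_mask = pext(mask, mask)
    # Masked store: only lanes enabled in store_mask touch memory.
    for i, lane in enumerate(packed):
        if (store_mask >> i) & 1:
            mem[base + i] = lane
```

In this model both functions leave memory in the same state for every 8-bit mask, and lanes beyond the packed elements are left untouched in both cases.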
