Issue: 133568
Summary: [AVX512] Avoid Memory form of Compress in AMD znver4
Labels: new issue
Assignees: (none)
Reporter: venkataramananhashkumar

For the given LLVM IR, the X86 backend generates the memory form of compress (vcompresspd with a memory destination).
Ref: https://godbolt.org/z/KhhczdbY8
.LBB0_4: # %vector.body
vptestmd k1, ymm1, ymm0
movsxd r8, r8d
vpaddd ymm1, ymm1, ymm2
vmovupd zmm3 {k1} {z}, zmmword ptr [rsi + r11]
kmovb ebx, k1
popcnt ebx, ebx
kortestb k1, k1
cmove ebx, r10d
add r11, 64
vcompresspd zmmword ptr [rdi + 8*r8] {k1}, zmm3
add r8d, ebx
cmp r9, r11
jne .LBB0_4
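The memory form is microcoded and slower on znver4. Instead, we should generate a sequence like the one shown below, which uses the register form of compress followed by a masked store.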
kmovb %k1, %r11d
pextl %r11d, %r11d, %ebx
vcompresspd %zmm3, %zmm3 {%k1} {z}
kmovd %ebx, %k1
vmovupd %zmm3, (%rdi,%rcx,8) {%k1}
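
To illustrate why the proposed sequence should be equivalent to the memory form, here is a minimal scalar sketch in portable C. It is only a model, not the backend lowering: pext8, compress_store_model, compress_then_masked_store_model and models_agree_for_all_masks are hypothetical helper names, and the 8-lane double vector stands in for a zmm register. The key point it tries to show is that pext(mask, mask) yields a mask with the low popcount(mask) bits set, which is the store mask needed after a register-form compress.

```c
#include <stddef.h>
#include <stdint.h>

/* Software model of BMI2 PEXT on 8-bit values: the bits of src selected
   by mask are gathered into the low bits of the result. */
uint8_t pext8(uint8_t src, uint8_t mask) {
    uint8_t out = 0;
    int k = 0;
    for (int i = 0; i < 8; ++i) {
        if (mask & (1u << i)) {
            if (src & (1u << i))
                out |= (uint8_t)(1u << k);
            ++k;
        }
    }
    return out;
}

/* Model of the memory form: active lanes are written contiguously
   starting at dst; nothing else in memory is touched. */
void compress_store_model(double *dst, uint8_t mask, const double src[8]) {
    size_t j = 0;
    for (int i = 0; i < 8; ++i)
        if (mask & (1u << i))
            dst[j++] = src[i];
}

/* Model of the proposed sequence: register-form compress with zeroing,
   then a masked store whose mask is pext(mask, mask). */
void compress_then_masked_store_model(double *dst, uint8_t mask,
                                      const double src[8]) {
    double packed[8] = {0};
    size_t j = 0;
    for (int i = 0; i < 8; ++i)          /* vcompresspd zmm, zmm {k} {z} */
        if (mask & (1u << i))
            packed[j++] = src[i];

    uint8_t store_mask = pext8(mask, mask); /* low popcount(mask) bits set */

    for (int i = 0; i < 8; ++i)          /* vmovupd mem {k}, zmm */
        if (store_mask & (1u << i))
            dst[i] = packed[i];
}

/* Returns 1 if both models leave identical memory for every 8-bit mask. */
int models_agree_for_all_masks(void) {
    const double src[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    for (int m = 0; m < 256; ++m) {
        double a[8], b[8];
        for (int i = 0; i < 8; ++i)
            a[i] = b[i] = -1.0;          /* sentinel for untouched memory */
        compress_store_model(a, (uint8_t)m, src);
        compress_then_masked_store_model(b, (uint8_t)m, src);
        for (int i = 0; i < 8; ++i)
            if (a[i] != b[i])
                return 0;
    }
    return 1;
}
```

Because the store mask only covers the low popcount(mask) lanes, the masked store in this model writes the same memory locations as the memory-form compress and leaves the rest untouched.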
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs