https://bugs.llvm.org/show_bug.cgi?id=45808
Bug ID: 45808
Summary: Suboptimal code for _mm256_zextsi128_si256(_mm_set1_epi8(-1))
Product: new-bugs
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: new bugs
Assignee: unassignedb...@nondot.org
Reporter: n...@self-evident.org
CC: htmldevelo...@gmail.com, llvm-bugs@lists.llvm.org
Related: Bug #45806 and https://stackoverflow.com/q/61601902/
I am trying to produce an AVX2 mask with all-ones in the lower lane and
all-zeroes in the upper lane of a YMM register. The code I am using is:
    __m256i mask = _mm256_zextsi128_si256(_mm_set1_epi8(-1));
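This should produce a single instruction like `vpcmpeqd %xmm0,%xmm0,%xmm0`,
because a VEX-encoded write to an XMM register already zeroes the upper lane
of the corresponding YMM register, but Clang insists on putting the value
into memory and loading it.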
The behavior in context is even more odd:
    __m256i minmax(__m256i v1, __m256i v2)
    {
        __m256i comp = _mm256_cmpgt_epi64(v1, v2);
        __m256i mask = _mm256_zextsi128_si256(_mm_set1_epi8(-1));
        return _mm256_blendv_epi8(v2, v1, _mm256_xor_si256(comp, mask));
    }
This goes through a bunch of contortions with extracting, shifting, and
expanding 128-bit registers when I feel like the result I want is pretty
straightforward.
Godbolt example: https://gcc.godbolt.org/z/GPhJ6s
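
For reference, here is a scalar sketch of the result the function above is
meant to compute, with one int64_t standing in for each 64-bit lane (lanes 0-1
are the lower 128 bits, lanes 2-3 the upper 128 bits). The name `minmax_ref`
is made up for this illustration and is not part of the test case. It assumes
the select value is all-ones or all-zeroes per 64-bit lane, which holds here,
so the per-byte behavior of the blend does not matter.

```c
#include <stdint.h>

/* Scalar model of minmax(): lower two lanes get min(v1, v2),
   upper two lanes get max(v1, v2). Hypothetical helper for illustration. */
static void minmax_ref(const int64_t v1[4], const int64_t v2[4], int64_t out[4])
{
    for (int i = 0; i < 4; ++i) {
        /* _mm256_cmpgt_epi64: all-ones where v1 > v2 (signed), else zero */
        uint64_t comp = (v1[i] > v2[i]) ? UINT64_MAX : 0;
        /* mask: all-ones in the lower 128 bits, zero in the upper 128 bits */
        uint64_t mask = (i < 2) ? UINT64_MAX : 0;
        /* _mm256_xor_si256: inverts the comparison in the lower lanes only */
        uint64_t sel = comp ^ mask;
        /* _mm256_blendv_epi8(v2, v1, sel): take v1 where sel is set */
        out[i] = sel ? v1[i] : v2[i];
    }
}
```

With v1 = {1, 9, 1, 9} and v2 = {5, 5, 5, 5}, this model yields {1, 5, 5, 9}:
the minimum in the two lower lanes and the maximum in the two upper lanes.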