https://bugs.llvm.org/show_bug.cgi?id=34382
Bug ID: 34382
Summary: [X86] Not taking advantage of the permute-in-lane instruction (vpermilps)
Product: libraries
Version: trunk
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
Assignee: unassignedb...@nondot.org
Reporter: ayman.m...@intel.com
CC: llvm-bugs@lists.llvm.org
define <16 x float> @test_16xfloat_perm_mask0(<16 x float> %vec) {
  %res = shufflevector <16 x float> %vec, <16 x float> undef, <16 x i32> <i32 1, i32 1, i32 3, i32 0, i32 6, i32 4, i32 5, i32 7, i32 8, i32 8, i32 9, i32 9, i32 15, i32 14, i32 14, i32 12>
  ret <16 x float> %res
}
For this function LLVM currently emits (IACA reports a throughput of 2.86):
vmovaps .LCPI142_0(%rip), %zmm1 # zmm1 = [1,1,3,0,6,4,5,7,8,8,9,9,15,14,14,12]
vpermps %zmm0, %zmm1, %zmm0
Every index in the mask stays within its own 128-bit lane, so LLVM could instead emit a single in-lane permute (IACA reports a throughput of 1.00):
vpermilps .LCPI142_0(%rip), %zmm0, %zmm0
* .LCPI142_0 holds the index vector needed by each permute instruction.
** Throughput figures are from the IACA tool; lower is better.
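The condition that makes vpermilps usable here can be sketched as a small check: a single-source shuffle mask qualifies when every result element is read from the same 128-bit lane it is written to. The helper below is only an illustration of that condition. The name isInLaneMask and its signature are hypothetical and are not taken from the LLVM X86 backend; a negative index is assumed to stand for an undef element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not LLVM's API): returns true when every result
// element is taken from the same 128-bit lane it is written to.
// For 32-bit floats a 128-bit lane holds 4 elements.
bool isInLaneMask(const std::vector<int> &Mask, std::size_t EltsPerLane = 4) {
  assert(EltsPerLane != 0 && Mask.size() % EltsPerLane == 0 &&
         "mask must cover whole lanes");
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (Mask[I] < 0)
      continue; // assumed convention: negative index means undef, any source is fine
    // Source lane and destination lane must match. An index that refers to
    // a second operand (>= Mask.size()) fails this test as well.
    if (static_cast<std::size_t>(Mask[I]) / EltsPerLane != I / EltsPerLane)
      return false;
  }
  return true;
}
```

Called with the mask from the test case above, {1,1,3,0,6,4,5,7,8,8,9,9,15,14,14,12}, the check succeeds because each group of four indexes stays inside its own lane. Changing the first index to 4 would cross from lane 1 into lane 0 and make the check fail, in which case a cross-lane permute such as vpermps would still be required.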