https://bugs.llvm.org/show_bug.cgi?id=51288
Bug ID: 51288
Summary: Convert mov and shr to shrx in loops constrained by
retirement rate
Product: new-bugs
Version: 12.0
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: new bugs
Assignee: unassignedb...@nondot.org
Reporter: t...@lipcon.org
CC: htmldevelo...@gmail.com, llvm-bugs@lists.llvm.org
This input file:
#include <stdint.h>
#include <utility>
struct Foo {
  uint64_t v;
  std::pair<uint32_t, uint32_t> Get() { return {v & 0xffffffff, v >> 32}; }
};
void Process(Foo* f, uint32_t* dst, int n) {
#pragma unroll
  for (int i = 0; i < n; i++) {
    auto [mask, idx] = f[i].Get();
    dst[idx] |= mask;
  }
}
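To make the loop's behaviour concrete: each Foo packs a 32-bit mask in the low half of v and a 32-bit index into dst in the high half, and the loop ORs each mask into dst[idx]. The `v >> 32` is what becomes the mov/shr pair discussed below. The sketch repeats the report's code without the clang loop hint; the helpers Pack and Example are hypothetical, added only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

struct Foo {
  uint64_t v;
  // Low 32 bits of v: the mask; high 32 bits: the index into dst.
  std::pair<uint32_t, uint32_t> Get() { return {v & 0xffffffff, v >> 32}; }
};

// Same loop as in the report, minus `#pragma unroll`.
void Process(Foo* f, uint32_t* dst, int n) {
  for (int i = 0; i < n; i++) {
    auto [mask, idx] = f[i].Get();
    dst[idx] |= mask;  // idx comes from the 32-bit right shift
  }
}

// Hypothetical helper (not in the report): packs idx and mask into one Foo.
Foo Pack(uint32_t idx, uint32_t mask) {
  return Foo{(uint64_t{idx} << 32) | mask};
}

// Usage: two masks land in dst[2], one in dst[0]; dst[1] is untouched.
void Example() {
  uint32_t dst[4] = {0, 0, 0, 0};
  Foo in[3] = {Pack(2, 0x1), Pack(2, 0x4), Pack(0, 0x80000000u)};
  Process(in, dst, 3);
  assert(dst[0] == 0x80000000u);
  assert(dst[1] == 0);
  assert(dst[2] == 0x5);
}
```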
Generates assembly in which the core of the loop contains the following sequence (x86-64, AT&T syntax):
    movq 24(%rdi,%rax,8), %r9
    movq %r9, %rcx
    shrq $32, %rcx
    orl %r9d, (%rsi,%rcx,4)
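When compiling with BMI2 support (e.g. -mbmi2), it would instead be slightly
faster to load the constant 32 into a register once, before the loop, and use
shrx. Because shrx writes a separate destination register, it combines the
copy of %r9 into %rcx with the shift, replacing two instructions with one.
Even where the register-to-register mov is eliminated at rename, it still
takes an issue and retirement slot, which matters in a loop bound by
retirement rate.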
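A toy C++ model of the two sequences (not compiler code) may make the difference concrete: each function applies the instructions' effect to a struct standing in for the registers and returns how many instructions it executed. Using %r8 to hold the count is an assumption; the report does not say which register the preferred version uses.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the registers involved. %r8 holding the shift count is a
// hypothetical choice for the BMI2 case.
struct Regs {
  uint64_t r9;   // loaded value f[i].v (mask in the low half)
  uint64_t rcx;  // receives idx
  uint64_t r8;   // shift count, set to 32 once before the loop (BMI2 only)
};

// Current sequence. SHR overwrites its operand, so %r9 must first be copied
// to keep the mask alive for the following orl.
int MovShr(Regs& r) {
  r.rcx = r.r9;  // movq %r9, %rcx
  r.rcx >>= 32;  // shrq $32, %rcx
  return 2;
}

// BMI2 sequence. SHRX reads the count from a register (masked to 6 bits for
// 64-bit operands) and writes a separate destination, leaving %r9 intact.
int Shrx(Regs& r) {
  r.rcx = r.r9 >> (r.r8 & 63);  // shrxq %r8, %r9, %rcx
  return 1;
}

// Usage check: both leave the same idx in %rcx; SHRX saves one instruction.
void CheckEquivalence(uint64_t v) {
  Regs a{v, 0, 0}, b{v, 0, 32};
  int legacy = MovShr(a);
  int bmi2 = Shrx(b);
  assert(a.rcx == b.rcx);       // same idx
  assert(b.r9 == v);            // source register preserved
  assert(legacy - bmi2 == 1);   // one instruction fewer per iteration
}
```

In the 4x-unrolled loop this model gives 8 instructions for mov+shr versus 4 for shrx per trip through the body, plus a single mov of 32 into the count register before the loop.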
Generated version:
https://bit.ly/2WzH8Pj
Preferred version (saves roughly half a cycle per iteration of the loop unrolled by 4):
https://bit.ly/3jaXBBh
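One source-level experiment in the direction of the preferred version (not from the report, and not checked against the links above) is to pass the shift count as a runtime argument so it cannot be encoded as an immediate. With BMI2 enabled, a compiler would then be expected to keep the count in a register for the whole loop and may use shrx; whether it does depends on the compiler version and on the call not being inlined with a constant 32. ProcessShift and shift are hypothetical names, and Foo is reduced to its data member.

```cpp
#include <cassert>
#include <cstdint>

struct Foo {
  uint64_t v;  // low 32 bits: mask; high 32 bits: index into dst
};

// Hypothetical variant of Process: identical results when shift == 32, but
// the count is a runtime value instead of the literal 32.
void ProcessShift(const Foo* f, uint32_t* dst, int n, unsigned shift) {
  assert(shift < 64);  // shifting a uint64_t by >= 64 is undefined behaviour
  for (int i = 0; i < n; i++) {
    uint32_t mask = static_cast<uint32_t>(f[i].v);
    uint32_t idx = static_cast<uint32_t>(f[i].v >> shift);
    dst[idx] |= mask;
  }
}
```

The trade-off is that the source no longer states the intent (always 32) directly, which is why fixing the instruction selection in the compiler, as the report asks, is the cleaner outcome.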
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs