https://sourceware.org/bugzilla/show_bug.cgi?id=31894
Bug ID: 31894
Summary: Bundle padding generates inefficient nops
Product: binutils
Version: 2.43 (HEAD)
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: gas
Assignee: unassigned at sourceware dot org
Reporter: zyedidia at cs dot stanford.edu
Target Milestone: ---
Created attachment 15584
--> https://sourceware.org/bugzilla/attachment.cgi?id=15584&action=edit
Patch that fixes the issue
The bundle alignment directive `.bundle_align_mode` causes nops to be inserted
into the program as padding, but on x86-64 these are emitted as single-byte nop
instructions rather than the more efficient multi-byte nops.
For example:
.bundle_align_mode 4
movl %ebx,%edx
movq %r12, %rsi
movq %rbp, %rdi
shrl $0x8, %edx
imull $0xffffff00, %edx, %edx
assembles to
0: 89 da mov %ebx,%edx
2: 4c 89 e6 mov %r12,%rsi
5: 48 89 ef mov %rbp,%rdi
8: c1 ea 08 shr $0x8,%edx
b: 90 nop
c: 90 nop
d: 90 nop
e: 90 nop
f: 90 nop
10: 69 d2 00 ff ff ff imul $0xffffff00,%edx,%edx
It would be better to generate:
0: 89 da mov %ebx,%edx
2: 4c 89 e6 mov %r12,%rsi
5: 48 89 ef mov %rbp,%rdi
8: c1 ea 08 shr $0x8,%edx
b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
10: 69 d2 00 ff ff ff imul $0xffffff00,%edx,%edx
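The padding strategy shown in the desired output can be sketched roughly as
follows. This is a simplified Python model for illustration only, not the
attached gas patch; the nop table is Intel's recommended multi-byte NOP
encodings, and `bundle_pad` is a hypothetical helper name:

```python
# Intel's recommended multi-byte NOP encodings for x86-64,
# indexed by instruction length in bytes (1 through 9).
NOPS = {
    1: bytes.fromhex("90"),                  # nop
    2: bytes.fromhex("6690"),                # 66 nop
    3: bytes.fromhex("0f1f00"),              # nopl (%rax)
    4: bytes.fromhex("0f1f4000"),            # nopl 0x0(%rax)
    5: bytes.fromhex("0f1f440000"),          # nopl 0x0(%rax,%rax,1)
    6: bytes.fromhex("660f1f440000"),        # nopw 0x0(%rax,%rax,1)
    7: bytes.fromhex("0f1f8000000000"),      # nopl 0x0(%rax)
    8: bytes.fromhex("0f1f840000000000"),    # nopl 0x0(%rax,%rax,1)
    9: bytes.fromhex("660f1f840000000000"),  # nopw 0x0(%rax,%rax,1)
}

def bundle_pad(n: int) -> bytes:
    """Return n bytes of padding using as few nop instructions as
    possible: greedily emit the longest encoding that still fits."""
    out = bytearray()
    while n > 0:
        k = min(n, max(NOPS))  # longest nop no bigger than the gap
        out += NOPS[k]
        n -= k
    return bytes(out)
```

For the 5-byte gap in the example above, `bundle_pad(5)` yields the single
`0f 1f 44 00 00` sequence (`nopl 0x0(%rax,%rax,1)`) instead of five `90` bytes.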
I have implemented a fix and benchmarked it on SPEC 2017 with a compiler that
uses bundle alignment; the performance improvement is noticeable (~5%). I have
attached the fix as a patch. This new behavior is also consistent with LLVM,
which uses multi-byte nops for bundle padding.