https://sourceware.org/bugzilla/show_bug.cgi?id=31894
            Bug ID: 31894
           Summary: Bundle padding generates inefficient nops
           Product: binutils
           Version: 2.43 (HEAD)
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: gas
          Assignee: unassigned at sourceware dot org
          Reporter: zyedidia at cs dot stanford.edu
  Target Milestone: ---

Created attachment 15584
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15584&action=edit
Patch that fixes the issue

The bundle alignment directive `.bundle_align_mode` should cause nops to be
inserted into the program as padding, but on x86-64 these nops are emitted as
single-byte nop instructions rather than more efficient multi-byte nops. For
example:

.bundle_align_mode 4
movl %ebx, %edx
movq %r12, %rsi
movq %rbp, %rdi
shrl $0x8, %edx
imull $0xffffff00, %edx, %edx

assembles to:

   0:   89 da                   mov    %ebx,%edx
   2:   4c 89 e6                mov    %r12,%rsi
   5:   48 89 ef                mov    %rbp,%rdi
   8:   c1 ea 08                shr    $0x8,%edx
   b:   90                      nop
   c:   90                      nop
   d:   90                      nop
   e:   90                      nop
   f:   90                      nop
  10:   69 d2 00 ff ff ff       imul   $0xffffff00,%edx,%edx

It would be better to generate:

   0:   89 da                   mov    %ebx,%edx
   2:   4c 89 e6                mov    %r12,%rsi
   5:   48 89 ef                mov    %rbp,%rdi
   8:   c1 ea 08                shr    $0x8,%edx
   b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  10:   69 d2 00 ff ff ff       imul   $0xffffff00,%edx,%edx

I have implemented a fix and benchmarked it on SPEC 2017 with a compiler that
uses bundle alignment; the performance improvement is noticeable (~5%). I have
attached the fix as a patch. This new behavior is also consistent with LLVM,
which uses multi-byte nops for bundle padding.

-- 
You are receiving this mail because:
You are on the CC list for the bug.