https://sourceware.org/bugzilla/show_bug.cgi?id=31894
            Bug ID: 31894
           Summary: Bundle padding generates inefficient nops
           Product: binutils
           Version: 2.43 (HEAD)
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: gas
          Assignee: unassigned at sourceware dot org
          Reporter: zyedidia at cs dot stanford.edu
  Target Milestone: ---

Created attachment 15584
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15584&action=edit
Patch that fixes the issue

The bundle alignment directive `.bundle_align_mode` should cause nops to be
inserted into the program as padding, but on x86-64 these nops are emitted as
single-byte nop instructions rather than more efficient multi-byte nops. For
example:

.bundle_align_mode 4
movl %ebx, %edx
movq %r12, %rsi
movq %rbp, %rdi
shrl $0x8, %edx
imull $0xffffff00, %edx, %edx

assembles to:

   0:   89 da                   mov    %ebx,%edx
   2:   4c 89 e6                mov    %r12,%rsi
   5:   48 89 ef                mov    %rbp,%rdi
   8:   c1 ea 08                shr    $0x8,%edx
   b:   90                      nop
   c:   90                      nop
   d:   90                      nop
   e:   90                      nop
   f:   90                      nop
  10:   69 d2 00 ff ff ff       imul   $0xffffff00,%edx,%edx

It would be better to generate:

   0:   89 da                   mov    %ebx,%edx
   2:   4c 89 e6                mov    %r12,%rsi
   5:   48 89 ef                mov    %rbp,%rdi
   8:   c1 ea 08                shr    $0x8,%edx
   b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  10:   69 d2 00 ff ff ff       imul   $0xffffff00,%edx,%edx

I have implemented a fix and benchmarked it on SPEC 2017 with a compiler that
uses bundle alignment; the performance improvement is noticeable (~5%). I have
attached the fix as a patch. This new behavior is also consistent with LLVM,
which uses multi-byte nops for bundle padding.

-- 
You are receiving this mail because:
You are on the CC list for the bug.