[PATCH] D70157: Align branches within 32-Byte boundary

Kan Shengchen via Phabricator via cfe-commits Thu, 05 Dec 2019 08:03:08 -0800

skan marked 3 inline comments as done.
skan added inline comments.


================
Comment at: llvm/test/MC/X86/x86-64-align-branch-1b.s:10
+# CHECK: 0000000000000000 foo:
+# CHECK-NEXT:        0: 64 89 04 25 01 00 00 00          movl    %eax, %fs:1
+# CHECK-NEXT:        8: 2e 55                            pushq   %rbp
----------------
MaskRay wrote:
> I think 1a.s and 1b.s should be merged. FileCheck supports
> 
> --check-prefixes=CHECK,PREFIX5
> --check-prefixes=CHECK,PREFIX1
> 
> ```
> CHECK: common part
> CHECK-NEXT: common part
> PREFIX5:
> PREFIX5-NEXT:
> PREFIX1:
> PREFIX1-NEXT:
> CHECK:
> CHECK-NEXT:
> ```
> 
> ```
> % diff -U1 x86-64-align-branch-1[ab].s
>  # CHECK: 0000000000000000 foo:
> -# CHECK-NEXT:        0: 64 64 64 64 89 04 25 01 00 00 00 movl    %eax, %fs:1
> -# CHECK-NEXT:        b: 55                               pushq   %rbp
> -# CHECK-NEXT:        c: 55                               pushq   %rbp
> -# CHECK-NEXT:        d: 55                               pushq   %rbp
> +# CHECK-NEXT:        0: 64 89 04 25 01 00 00 00          movl    %eax, %fs:1
> +# CHECK-NEXT:        8: 2e 55                            pushq   %rbp
> +# CHECK-NEXT:        a: 2e 55                            pushq   %rbp
> +# CHECK-NEXT:        c: 2e 55                            pushq   %rbp
>  # CHECK-NEXT:        e: 48 89 e5                         movq    %rsp, %rbp
> ```
> 
> Is there performance benefit to add 4 prefixes to the same instruction?
Thanks for the knowledge!  There is no performance benefit to add 4 prefixes to 
the same instruction. The  instruction `movl    %eax, %fs:1` has already has a 
prefix %fs (0x64),  if option `-x86-align-branch-prefix-size=5` is used, we 
could add 4 more prefixes at most. The branch to be aligned needs 3-byte 
padding, so 3 prefixes are added to the move instruction.  


================
Comment at: llvm/test/MC/X86/x86-64-align-branch-1c.s:2
+# Check only fused conditional jumps and conditional jumps are aligned with 
option --x86-align-branch-boundary=32 --x86-align-branch=fused+jcc  
--x86-align-branch-prefix-size=5
+# RUN: llvm-mc -filetype=obj -triple x86_64-unknown-unknown 
--x86-align-branch-boundary=32 --x86-align-branch=fused+jcc  
--x86-align-branch-prefix-size=5 %s | llvm-objdump -d  - | FileCheck %s
+
----------------
MaskRay wrote:
> The difference between 1a and 1c is that 1c does not allow "jmp", but in 1a 
> no jmp instructions get a prefix in the test, so it is unclear why 1c has 
> different output.
```
# CHECK-NEXT:       45: 2e 89 45 fc                      movl    %eax, 
%cs:-4(%rbp)
# CHECK-NEXT:       49: 89 75 f4                         movl    %esi, -12(%rbp)
# CHECK-NEXT:       4c: 89 7d f8                         movl    %edi, -8(%rbp)
# CHECK-NEXT:       4f: 89 75 f4                         movl    %esi, -12(%rbp)
# CHECK-NEXT:       52: 89 75 f4                         movl    %esi, -12(%rbp)
# CHECK-NEXT:       55: 89 75 f4                         movl    %esi, -12(%rbp)
# CHECK-NEXT:       58: 89 75 f4                         movl    %esi, -12(%rbp)
# CHECK-NEXT:       5b: 89 75 f4                         movl    %esi, -12(%rbp)
# CHECK-NEXT:       5e: 5d                               popq    %rbp
# CHECK-NEXT:       5f: 5d                               popq    %rbp
# CHECK-NEXT:       60: eb 26                            jmp     {{.*}}
```

The prefix 2e is added at

```
 45: 2e 89 45 fc                      movl    %eax, %cs:-4(%rbp)
```


================
Comment at: llvm/test/MC/X86/x86-64-align-branch-1e.s:46
+# CHECK-NEXT:       5c: eb 27                            jmp     {{.*}}
+# CHECK-NEXT:       5e: 90                               nop
+# CHECK-NEXT:       5f: 90                               nop
----------------
MaskRay wrote:
> This is weird. Comparing this with 1d, 1e allows more instruction types, yet 
> it inserts two NOPs which actually seems to degrade performance.
Not all the target cpu support long nop, you can find related code in 
X86AsmBackend::writeNopData.



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D70157: Align branches within 32-Byte boundary

Reply via email to