https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952
--- Comment #22 from Andrew Cooper <andrew.cooper3 at citrix dot com> --- One curious thing I have discovered. While auditing the -mharden-sls=all code generation in Xen, I found examples where I got "ret int3 ret int3" with no intervening instructions. It turns out this is not a regression in this change. It is a pre-existing missing optimisation, which is made more obvious when every ret is extended with an int3. It occurs for functions with either no stack frame at all, or functions which have an early exit before setting up the stack frame. Some examples which occur at -O1 do not occur at -O2. One curious example which does still repro at -O2 is this. We have a hash lookup function: struct context *sidtab_search(struct sidtab *s, u32 sid) { int hvalue; struct sidtab_node *cur; if ( !s ) return NULL; hvalue = SIDTAB_HASH(sid); cur = s->htable[hvalue]; while ( cur != NULL && sid > cur->sid ) cur = cur->next; if ( cur == NULL || sid != cur->sid ) { /* Remap invalid SIDs to the unlabeled SID. */ sid = SECINITSID_UNLABELED; hvalue = SIDTAB_HASH(sid); cur = s->htable[hvalue]; while ( cur != NULL && sid > cur->sid ) cur = cur->next; if ( !cur || sid != cur->sid ) return NULL; } return &cur->context; } which compiles (reformatted a little for width - unmodified: https://paste.debian.net/hidden/7bf675d6/) to: <sidtab_search>: 48 85 ff test %rdi,%rdi /------- 74 63 je <sidtab_search+0x68> | 48 8b 17 mov (%rdi),%rdx | 89 f0 mov %esi,%eax | 83 e0 7f and $0x7f,%eax | 48 8b 04 c2 mov (%rdx,%rax,8),%rax | 48 85 c0 test %rax,%rax | /--- 75 13 jne <sidtab_search+0x29> | /|--- eb 17 jmp <sidtab_search+0x2f> | || 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) | || 00 | ||/-> 48 8b 40 48 mov 0x48(%rax),%rax | ||| 48 85 c0 test %rax,%rax | +||-- 74 06 je <sidtab_search+0x2f> | |\|-> 39 30 cmp %esi,(%rax) | | \-- 72 f3 jb <sidtab_search+0x20> | /|---- 74 24 je <sidtab_search+0x53> | |\---> 48 8b 42 28 mov 0x28(%rdx),%rax | | 48 85 c0 test %rax,%rax | | /--- 75 11 jne <sidtab_search+0x49> |/|-|--- eb 32 jmp <sidtab_search+0x6c> // (1) ||| | 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) ||| |/-> 48 8b 40 48 mov 0x48(%rax),%rax ||| || 48 85 c0 test %rax,%rax |||/||-- 74 17 je <sidtab_search+0x60> // (2) ||||\|-> 83 38 04 cmpl $0x4,(%rax) |||| \-- 76 f2 jbe <sidtab_search+0x40> |||| 83 38 05 cmpl $0x5,(%rax) +|||---- 75 15 jne <sidtab_search+0x68> ||\|---> 48 83 c0 08 add $0x8,%rax || | c3 retq || | cc int3 || | 0f 1f 80 00 00 00 00 nopl 0x0(%rax) || \---> c3 retq // Target of (2) || cc int3 || 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) \|-----> 31 c0 xor %eax,%eax | c3 retq | cc int3 \-----> c3 retq // Target of (1) cc int3 66 90 xchg %ax,%ax There are 4 exits in total. Two have to set up %eax, so they can't usefully be merged. However, the unconditional jmp at (1) is 2 bytes, and could fully contain its target ret;int3 without even impacting the surrounding padding. Whether it inlines or merges, this drops 4 bytes. The conditional jump at (2) could be folded in to any of the other exit paths, dropping 16 bytes from the total size size. I have no idea how easy/hard this may be to track down, or whether it is worth pursuing urgently, but it probably does want looking at, seeing as SLS hardening doubles the hit.