On Mon, Mar 18, 2019 at 2:41 AM Elena Reshetova <elena.reshet...@intel.com> wrote:
>
> If CONFIG_RANDOMIZE_KSTACK_OFFSET is selected,
> the kernel stack offset is randomized upon each
> entry to a system call, after the fixed location of the
> pt_regs struct.
>
> This feature is based on the original idea from
> PaX's RANDKSTACK feature:
> https://pax.grsecurity.net/docs/randkstack.txt
> All the credit for the original idea goes to the PaX team.
> However, the design and implementation of
> RANDOMIZE_KSTACK_OFFSET differs greatly from the RANDKSTACK
> feature (see below).
>
> Reasoning for the feature:
>
> This feature aims to make considerably harder various
> stack-based attacks that rely on a deterministic stack
> structure. We have had many such attacks in the past [1],[2],[3]
> (just to name a few), and as Linux kernel stack protections
> have been constantly improving (vmap-based stack
> allocation with guard pages, removal of thread_info,
> STACKLEAK), attackers have to find new ways for their
> exploits to work.
>
> It is important to note that we currently cannot show
> a concrete attack that would be stopped by this new
> feature (given that other existing stack protections
> are enabled), so this is an attempt to be proactive
> rather than catching up with existing successful exploits.
>
> The main idea is that since the stack offset is
> randomized upon each system call, it is very hard for
> an attacker to reliably land in any particular place on
> the thread stack when an attack is performed.
> Also, since randomization is performed *after* pt_regs,
> the ptrace-based approach to discovering the randomized
> offset during a long-running syscall should not be
> possible.
>
> [1] jon.oberheide.org/files/infiltrate12-thestackisback.pdf
> [2] jon.oberheide.org/files/stackjacking-infiltrate11.pdf
> [3] googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
>
> Design description:
>
> During most of the kernel's execution, it runs on the "thread
> stack", which is allocated at fork.c/dup_task_struct() and stored in
> a per-task variable (tsk->stack). Since the stack grows downward,
> the stack top can always be calculated using the task_top_of_stack(tsk)
> function, which essentially returns the address of tsk->stack + stack
> size. When VMAP_STACK is enabled, the thread stack is allocated from
> vmalloc space.
>
> The thread stack is pretty deterministic in its structure: it is fixed
> in size, and upon every syscall entry from userspace to the kernel the
> thread stack starts being constructed from an address fetched from the
> per-cpu cpu_current_top_of_stack variable. The first element to be
> pushed to the thread stack is the pt_regs struct, which stores all
> required CPU registers and syscall parameters.
>
> The goal of the RANDOMIZE_KSTACK_OFFSET feature is to add a random
> offset between the pt_regs that has been pushed to the stack and the
> rest of the thread stack (used during syscall processing) every time
> a process issues a syscall. The source of randomness can be taken
> either from rdtsc or rdrand, with the performance implications listed
> below. The value of the random offset is stored in a callee-saved
> register (r15 currently) and the maximum size of the random offset is
> defined by the __MAX_STACK_RANDOM_OFFSET value, which currently
> equals 0xFF0.
>
> As a result this patch introduces 8 bits of randomness
> (bits 4 - 11 are randomized, bits 0-3 must be zero due to stack
> alignment) after the pt_regs location on the thread stack.
> The amount of randomness can be adjusted based on how much of the
> stack space we wish/can trade for security.
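
If I am reading the description right, the offset selection boils down
to roughly the following C illustration.  This is not code from the
patch (which does the equivalent in the entry assembly), and
choose_kstack_offset() is just a name made up for the example; only
the 0xFF0 mask value comes from the text above:

/*
 * Illustration only: the patch computes this in entry_64.S and keeps
 * the result in a callee-saved register (r15).
 */
#include <asm/msr.h>		/* rdtsc() */

#define __MAX_STACK_RANDOM_OFFSET	0xFF0UL

static inline unsigned long choose_kstack_offset(void)
{
	/*
	 * Keep bits 4-11 of the TSC: 8 bits of randomness, with
	 * bits 0-3 forced to zero for the stack alignment mentioned
	 * above, and never more than 0xFF0 bytes of stack given up.
	 */
	return rdtsc() & __MAX_STACK_RANDOM_OFFSET;
}
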
Why do you need four zero bits at the bottom?  x86_64 Linux only
maintains 8-byte stack alignment.

>
> The main issue with this approach is that it slightly breaks the
> processing of the last frame in the unwinder, so I have made a simple
> fix to the frame pointer unwinder (I guess others should be fixed
> similarly) and to the stack dump functionality to "jump" over the
> random hole at the end. My way of solving this is probably far from
> ideal, so I would really appreciate feedback on how to improve it.

That's probably a question for Josh :)  Another way to do the dirty
work would be to do:

char *ptr = alloca(offset);
asm volatile ("" :: "m" (*ptr));

in do_syscall_64() and adjust compiler flags as needed to avoid
warnings.  Hmm.

>
> Performance:
>
> 1) lmbench: ./lat_syscall -N 1000000 null
> base:                    Simple syscall: 0.1774 microseconds
> random_offset (rdtsc):   Simple syscall: 0.1803 microseconds
> random_offset (rdrand):  Simple syscall: 0.3702 microseconds
>
> 2) Andy's tests, misc-tests: ./timing_test_64 10M sys_enosys
> base:                    10000000 loops in 1.62224s = 162.22 nsec / loop
> random_offset (rdtsc):   10000000 loops in 1.64660s = 164.66 nsec / loop
> random_offset (rdrand):  10000000 loops in 3.51315s = 351.32 nsec / loop
>

Egads!  RDTSC is nice and fast but probably fairly easy to defeat.
RDRAND is awful.  I had hoped for better.

So perhaps we need a little percpu buffer that collects 64 bits of
randomness at a time, shifts out the needed bits, and refills the
buffer when we run out.

> /*
>  * This does 'call enter_from_user_mode' unless we can avoid it based on
>  * kernel config or using the static jump infrastructure.
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 1f0efdb7b629..0816ec680c21 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -167,13 +167,19 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
>
>  	PUSH_AND_CLEAR_REGS rax=$-ENOSYS
>
> +	RANDOMIZE_KSTACK		/* stores randomized offset in r15 */
> +
>  	TRACE_IRQS_OFF
>
>  	/* IRQs are off. */
>  	movq	%rax, %rdi
>  	movq	%rsp, %rsi
> +	sub	%r15, %rsp		/* subtract random offset from rsp */
>  	call	do_syscall_64		/* returns with IRQs disabled */
>
> +	/* need to restore the gap */
> +	add	%r15, %rsp		/* add random offset back to rsp */

Off the top of my head, the nicer way to approach this would be to
change this such that mov %rbp, %rsp; popq %rbp or something like that
will do the trick.  Then the unwinder could just see it as a regular
frame.  Maybe Josh will have a better idea.
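
To make the percpu buffer idea above a bit more concrete, here is a
rough, completely untested sketch.  Every identifier in it
(kstack_rnd_buf, kstack_rnd, get_kstack_rnd) is made up for
illustration and is not from the patch:

/* Sketch only: amortize the RDRAND cost over many syscalls. */
#include <linux/bits.h>		/* BITS_PER_LONG */
#include <linux/percpu.h>
#include <asm/archrandom.h>	/* arch_get_random_long() */

struct kstack_rnd_buf {
	unsigned long bits;	/* leftover RDRAND output */
	unsigned int count;	/* number of valid bits remaining */
};

static DEFINE_PER_CPU(struct kstack_rnd_buf, kstack_rnd);

/* Caller must not be preemptible (true on the syscall entry path). */
static unsigned long get_kstack_rnd(unsigned int nbits)
{
	struct kstack_rnd_buf *buf = this_cpu_ptr(&kstack_rnd);
	unsigned long r;

	if (buf->count < nbits) {
		/* Refill from RDRAND; degrade to 0 if it is unavailable. */
		if (!arch_get_random_long(&buf->bits))
			buf->bits = 0;
		buf->count = BITS_PER_LONG;
	}

	r = buf->bits & ((1UL << nbits) - 1);
	buf->bits >>= nbits;
	buf->count -= nbits;

	return r;
}

With nbits == 8, shifting the result left by 4 would reproduce the
bits 4-11 layout from the cover letter at a cost of roughly one RDRAND
per eight syscalls instead of one per syscall.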