2025-03-14T14:39:29-07:00, Deepak Gupta <de...@rivosinc.com>: > As discussed extensively in the changelog for the addition of this > syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the > existing mmap() and madvise() syscalls do not map entirely well onto the > security requirements for shadow stack memory since they lead to windows > where memory is allocated but not yet protected or stacks which are not > properly and safely initialised. Instead a new syscall map_shadow_stack() > has been defined which allocates and initialises a shadow stack page. > > This patch implements this syscall for riscv. riscv doesn't require token > to be setup by kernel because user mode can do that by itself. However to > provide compatibility and portability with other architectues, user mode > can specify token set flag.
RISC-V shadow stack could use mmap() and madvise() perfectly well. Userspace can always initialize the shadow stack properly and the shadow stack memory is never protected from other malicious threads. I think that the compatibility argument is reasonable. We'd need to modify the other syscalls to allow a write-only mapping anyway. > diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c > +static noinline unsigned long amo_user_shstk(unsigned long *addr, unsigned > long val) > +{ > + /* > + * Never expect -1 on shadow stack. Expect return addresses and zero > + */ > + unsigned long swap = -1; > + __enable_user_access(); > + asm goto( > + ".option push\n" > + ".option arch, +zicfiss\n" Shouldn't compiler accept ssamoswap.d opcode even without zicfiss arch? > + "1: ssamoswap.d %[swap], %[val], %[addr]\n" > + _ASM_EXTABLE(1b, %l[fault]) > + RISCV_ACQUIRE_BARRIER Why is the barrier here? > + ".option pop\n" > + : [swap] "=r" (swap), [addr] "+A" (*addr) > + : [val] "r" (val) > + : "memory" > + : fault > + ); > + __disable_user_access(); > + return swap; > +fault: > + __disable_user_access(); > + return -1; I think we should return 0 and -EFAULT. We can ignore the swapped value, or return it through a pointer. > +} > + > +static unsigned long allocate_shadow_stack(unsigned long addr, unsigned long > size, > + unsigned long token_offset, bool > set_tok) > +{ > + int flags = MAP_ANONYMOUS | MAP_PRIVATE; Is MAP_GROWSDOWN pointless? > + struct mm_struct *mm = current->mm; > + unsigned long populate, tok_loc = 0; > + > + if (addr) > + flags |= MAP_FIXED_NOREPLACE; > + > + mmap_write_lock(mm); > + addr = do_mmap(NULL, addr, size, PROT_READ, flags, PROT_READ implies VM_READ, so won't this select PAGE_COPY in the protection_map instead of PAGE_SHADOWSTACK? Wouldn't avoiding VM_READ also allow us to get rid of the ugly hack in pte_mkwrite? (VM_WRITE would naturally select the right XWR flags.) > + VM_SHADOW_STACK | VM_WRITE, 0, &populate, NULL); > + mmap_write_unlock(mm); > + > +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, > unsigned int, flags) > +{ > [...] > + if (addr && (addr & (PAGE_SIZE - 1))) if (!PAGE_ALIGNED(addr))