2025-03-14T14:39:29-07:00, Deepak Gupta <de...@rivosinc.com>:
> As discussed extensively in the changelog for the addition of this
> syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the
> existing mmap() and madvise() syscalls do not map entirely well onto the
> security requirements for shadow stack memory since they lead to windows
> where memory is allocated but not yet protected or stacks which are not
> properly and safely initialised. Instead a new syscall map_shadow_stack()
> has been defined which allocates and initialises a shadow stack page.
>
> This patch implements this syscall for riscv. riscv doesn't require token
> to be setup by kernel because user mode can do that by itself. However to
> provide compatibility and portability with other architectues, user mode
> can specify token set flag.

RISC-V shadow stack could use mmap() and madvise() perfectly well.
Userspace can always initialize the shadow stack properly and the shadow
stack memory is never protected from other malicious threads.

I think that the compatibility argument is reasonable.  We'd need to
modify the other syscalls to allow a write-only mapping anyway.

> diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
> +static noinline unsigned long amo_user_shstk(unsigned long *addr, unsigned 
> long val)
> +{
> +     /*
> +      * Never expect -1 on shadow stack. Expect return addresses and zero
> +      */
> +     unsigned long swap = -1;
> +     __enable_user_access();
> +     asm goto(
> +             ".option push\n"
> +             ".option arch, +zicfiss\n"

Shouldn't compiler accept ssamoswap.d opcode even without zicfiss arch?

> +             "1: ssamoswap.d %[swap], %[val], %[addr]\n"
> +             _ASM_EXTABLE(1b, %l[fault])
> +             RISCV_ACQUIRE_BARRIER

Why is the barrier here?

> +             ".option pop\n"
> +             : [swap] "=r" (swap), [addr] "+A" (*addr)
> +             : [val] "r" (val)
> +             : "memory"
> +             : fault
> +             );
> +     __disable_user_access();
> +     return swap;
> +fault:
> +     __disable_user_access();
> +     return -1;

I think we should return 0 and -EFAULT.
We can ignore the swapped value, or return it through a pointer.

> +}
> +
> +static unsigned long allocate_shadow_stack(unsigned long addr, unsigned long 
> size,
> +                                        unsigned long token_offset, bool 
> set_tok)
> +{
> +     int flags = MAP_ANONYMOUS | MAP_PRIVATE;

Is MAP_GROWSDOWN pointless?

> +     struct mm_struct *mm = current->mm;
> +     unsigned long populate, tok_loc = 0;
> +
> +     if (addr)
> +             flags |= MAP_FIXED_NOREPLACE;
> +
> +     mmap_write_lock(mm);
> +     addr = do_mmap(NULL, addr, size, PROT_READ, flags,

PROT_READ implies VM_READ, so won't this select PAGE_COPY in the
protection_map instead of PAGE_SHADOWSTACK?

Wouldn't avoiding VM_READ also allow us to get rid of the ugly hack in
pte_mkwrite?  (VM_WRITE would naturally select the right XWR flags.)

> +                    VM_SHADOW_STACK | VM_WRITE, 0, &populate, NULL);
> +     mmap_write_unlock(mm);
> +
> +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, 
> unsigned int, flags)
> +{
> [...]
> +     if (addr && (addr & (PAGE_SIZE - 1)))

if (!PAGE_ALIGNED(addr))

Reply via email to