A few weeks ago a conversation about retguard (a diff is probably
coming) caused me to re-consider & re-read the BROP paper

        https://www.scs.stanford.edu/brop/bittau-brop.pdf

After lots of details, page 8 has a table summarizing the attack process.

Step 5 contains the text "The attacker can now dump the entire binary to
find more gadgets".

This diff hinders that step.  It prevents the simple download of
immutable text segments via system calls.  This is valuable because
BROP needs that step to be simple: at that point in the attack, the
write is the maximum tooling the attacker has.

There is a difficulty in the way of "oh, just make code segments
non-readable".  Most MMUs lack the ability to manage PROT_EXEC without
PROT_READ.  Being able to read the code using data instructions is
implicit in these architectures.  There is a very short list of
architectures and sub-architectures that could block reads to code
regions if we wrote uvm/pmap code.  We really want this property for
security reasons, but since most MMUs lack it we have not made a lot of
progress.  This elusive property, of blocking reads to code, is called
"X-only" in our group.

So that means user processes can read their own code.  Can't stop that.
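
For illustration, here is a minimal userland sketch (not part of the
diff) showing that ordinary data loads can read the text segment on
these MMUs:

    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
            unsigned char buf[16];
            size_t i;

            /* an ordinary data read of the text segment; the MMU
             * cannot tell this apart from any other read */
            memcpy(buf, (void *)main, sizeof(buf));
            for (i = 0; i < sizeof(buf); i++)
                    printf("%02x", buf[i]);
            printf("\n");
            return 0;
    }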

The kernel also reads userland memory when you do a system call like
write() or sendto().  It does this using copyin() against the userland
region, which again uses the MMU.  But the MMU lacks the ability we
desire.  Changing this lookup to instead inspect the higher-level
virtual-memory data structures would introduce either races or some
pretty strong locks, because the virtual-memory layout can be changed
by other threads, and that would hurt threading performance.

So we cannot simply inspect the full virtual-memory data structures
for the region on every system call.

So I created a very small coherent cache of unpermitted regions which
gets looked up before copyin(), and after a few iterations of coding, I
managed to do it without locking!

Depending on binary type, this cache is an array of 2 to 4 text regions:
main program text, ld.so text, signal trampoline, and libc.so text.
Normally this would need management & updates when processes do
mprotect/mmap/munmap, but a few weeks ago I introduced mimmutable(), and
since all those text segments are now immutable we know they cannot
change!  So there is no need to update or delete entries in this cache.
Once we know this table is complete, we don't need to lock lookups.
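
The pattern, in outline (a userland sketch with made-up names, not the
kernel code): entries are only appended while the address space is
being set up, and never modified afterwards, so readers can scan the
array without taking a lock:

    #include <stdio.h>

    #define MAXREGIONS      4

    struct region {
            unsigned long start, end;
    };

    static struct region regions[MAXREGIONS];
    static int region_count;        /* fixed once setup completes */

    static int
    region_check(unsigned long start, unsigned long end)
    {
            int i;

            /* lockless: the table cannot change underneath us */
            for (i = 0; i < region_count; i++)
                    if (start < regions[i].end && end > regions[i].start)
                            return 1;       /* overlaps an x-only region */
            return 0;
    }

    int
    main(void)
    {
            /* setup phase: append entries, then never touch them again */
            regions[region_count].start = 0x1000;
            regions[region_count].end = 0x2000;
            region_count++;

            printf("%d\n", region_check(0x1800, 0x1900));   /* 1 */
            printf("%d\n", region_check(0x3000, 0x3100));   /* 0 */
            return 0;
    }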

The kernel now prevents "write(fd, &main, length)" and "write(fd, &open,
length)" by returning -1 EFAULT.

This protection is not made available to other libraries (like
libcrypto).  I've looked into doing this via a special system call, and
via an implicit arrangement inside mimmutable(), but the changes are
much more complicated and the security benefit is lower (so for now, I am
going to punt on that).

Someone is going to reply "but I'll copy libc text to other memory
before I do the write operation!"  Please show us at which step in the
BROP procedure on page 8 this copy operation is done, and how. BROP is
used when the attack tooling is insufficient for complicated sequences
like "copy elsewhere + write"; BROP is a method to collect powerful
gadgetry that you don't have for a next-round attack sequence.  In
particular, BROP would be used to learn the libc random relink, and
thereby to discover the location of all syscall stubs.


There are unfinished pieces in here, but it seems to be working fine.
The next step will be to find out if we have any software in ports which
tries to perform output system calls with its text segment as the data source.

Let me know.

Index: sys/kern/exec_elf.c
===================================================================
RCS file: /cvs/src/sys/kern/exec_elf.c,v
retrieving revision 1.177
diff -u -p -u -r1.177 exec_elf.c
--- sys/kern/exec_elf.c 5 Dec 2022 23:18:37 -0000       1.177
+++ sys/kern/exec_elf.c 20 Dec 2022 07:10:34 -0000
@@ -621,9 +621,11 @@ exec_elf_makecmds(struct proc *p, struct
                        } else
                                addr = ELF_NO_ADDR;
 
-                       /* Permit system calls in specific main-programs */
+                       /*
+                        * Permit system calls in main-text static binaries.
+                        * Also block the ld.so syscall-grant
+                        */
                        if (interp == NULL) {
-                               /* statics. Also block the ld.so syscall-grant */
                                syscall = VMCMD_SYSCALL;
                                p->p_vmspace->vm_map.flags |= VM_MAP_SYSCALL_ONCE;
                        }
Index: sys/kern/exec_subr.c
===================================================================
RCS file: /cvs/src/sys/kern/exec_subr.c,v
retrieving revision 1.64
diff -u -p -u -r1.64 exec_subr.c
--- sys/kern/exec_subr.c        5 Dec 2022 23:18:37 -0000       1.64
+++ sys/kern/exec_subr.c        20 Dec 2022 06:39:40 -0000
@@ -215,6 +215,10 @@ vmcmd_map_pagedvn(struct proc *p, struct
                if (cmd->ev_flags & VMCMD_IMMUTABLE)
                        uvm_map_immutable(&p->p_vmspace->vm_map, cmd->ev_addr,
                            round_page(cmd->ev_addr + cmd->ev_len), 1);
+               if ((flags & UVM_FLAG_SYSCALL) ||
+                   ((cmd->ev_flags & VMCMD_IMMUTABLE) && (cmd->ev_prot & PROT_EXEC)))
+                       uvm_map_xonly(&p->p_vmspace->vm_map,
+                           cmd->ev_addr, round_page(cmd->ev_addr + cmd->ev_len));
        }
 
        return (error);
Index: sys/kern/kern_sig.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_sig.c,v
retrieving revision 1.301
diff -u -p -u -r1.301 kern_sig.c
--- sys/kern/kern_sig.c 16 Oct 2022 16:27:02 -0000      1.301
+++ sys/kern/kern_sig.c 19 Dec 2022 18:17:28 -0000
@@ -1642,6 +1642,9 @@ coredump(struct proc *p)
 
        atomic_setbits_int(&pr->ps_flags, PS_COREDUMP);
 
+       /* disable xonly checks, so we can write out text sections if needed */
+       p->p_vmspace->vm_map.xonly_count = 0;
+
        /* Don't dump if will exceed file size limit. */
        if (USPACE + ptoa(vm->vm_dsize + vm->vm_ssize) >= lim_cur(RLIMIT_CORE))
                return (EFBIG);
Index: sys/kern/kern_subr.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_subr.c,v
retrieving revision 1.51
diff -u -p -u -r1.51 kern_subr.c
--- sys/kern/kern_subr.c        14 Aug 2022 01:58:27 -0000      1.51
+++ sys/kern/kern_subr.c        20 Dec 2022 01:29:43 -0000
@@ -43,6 +43,8 @@
 #include <sys/sched.h>
 #include <sys/malloc.h>
 #include <sys/queue.h>
+#include <uvm/uvm.h>
+#include <uvm/uvm_map.h>
 
 int
 uiomove(void *cp, size_t n, struct uio *uio)
@@ -78,8 +80,12 @@ uiomove(void *cp, size_t n, struct uio *
                        sched_pause(preempt);
                        if (uio->uio_rw == UIO_READ)
                                error = copyout(cp, iov->iov_base, cnt);
-                       else
+                       else {
+                               if (uvm_map_xonly_check(uio->uio_procp,
+                                   (vaddr_t)iov->iov_base, cnt))
+                                       return EFAULT;
                                error = copyin(iov->iov_base, cp, cnt);
+                       }
                        if (error)
                                return (error);
                        break;
Index: sys/kern/subr_log.c
===================================================================
RCS file: /cvs/src/sys/kern/subr_log.c,v
retrieving revision 1.75
diff -u -p -u -r1.75 subr_log.c
--- sys/kern/subr_log.c 2 Jul 2022 08:50:42 -0000       1.75
+++ sys/kern/subr_log.c 20 Dec 2022 01:26:45 -0000
@@ -644,6 +644,8 @@ dosendsyslog(struct proc *p, const char 
                 */
                len = MIN(nbyte, sizeof(pri));
                if (sflg == UIO_USERSPACE) {
+//                     if (uvm_map_xonly_check(p, buf, len))
+//                             return (EFAULT);
                        if ((error = copyin(buf, pri, len)))
                                return (error);
                } else
Index: sys/uvm/uvm_io.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_io.c,v
retrieving revision 1.30
diff -u -p -u -r1.30 uvm_io.c
--- sys/uvm/uvm_io.c    7 Oct 2022 14:59:39 -0000       1.30
+++ sys/uvm/uvm_io.c    19 Dec 2022 18:10:20 -0000
@@ -57,7 +57,7 @@ uvm_io(vm_map_t map, struct uio *uio, in
        vsize_t chunksz, togo, sz;
        struct uvm_map_deadq dead_entries;
        int error, extractflags;
-
+       int save_xonly_count;
        /*
         * step 0: sanity checks and set up for copy loop.  start with a
         * large chunk size.  if we have trouble finding vm space we will
@@ -84,8 +84,12 @@ uvm_io(vm_map_t map, struct uio *uio, in
        error = 0;
 
        extractflags = 0;
-       if (flags & UVM_IO_FIXPROT)
+       if (flags & UVM_IO_FIXPROT) {
                extractflags |= UVM_EXTRACT_FIXPROT;
+               /* Disable xonly checks on this map */
+               save_xonly_count = map->xonly_count;
+               map->xonly_count = 0;
+       }
 
        /*
         * step 1: main loop...  while we've got data to move
@@ -134,6 +138,10 @@ uvm_io(vm_map_t map, struct uio *uio, in
                if (error)
                        break;
        }
+
+       /* Restore xonly checks on this map */
+       if (flags & UVM_IO_FIXPROT)
+               map->xonly_count = save_xonly_count;
 
        return (error);
 }
Index: sys/uvm/uvm_map.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_map.c,v
retrieving revision 1.305
diff -u -p -u -r1.305 uvm_map.c
--- sys/uvm/uvm_map.c   18 Dec 2022 23:41:17 -0000      1.305
+++ sys/uvm/uvm_map.c   20 Dec 2022 07:13:07 -0000
@@ -3472,6 +3472,7 @@ uvmspace_exec(struct proc *p, vaddr_t st
 
                uvmspace_free(ovm);
        }
+       p->p_vmspace->vm_map.xonly_count = 0;
 
        /* Release dead entries */
        uvm_unmap_detach(&dead_entries, 0);
@@ -4258,8 +4259,71 @@ uvm_map_syscall(struct vm_map *map, vadd
                entry = RBT_NEXT(uvm_map_addr, entry);
        }
 
+       /* Add libc's text segment to the XONLY list */
+       if (map->xonly_count < UVM_MAP_XONLY_MAX) {
+               //printf("%d xsysc %lx-%lx\n", map->xonly_count, start, end);
+               map->xonly[map->xonly_count].start = start;
+               map->xonly[map->xonly_count].end = end;
+               map->xonly_count++;
+       }
+
        map->wserial++;
        map->flags |= VM_MAP_SYSCALL_ONCE;
+       vm_map_unlock(map);
+       return (0);
+}
+
+/*
+ * uvm_map_xonly_check: if the address is in an x-only region, return EFAULT
+ */
+int
+uvm_map_xonly_check(struct proc *p, vaddr_t start, vsize_t len)
+{
+       struct vm_map *map = &p->p_vmspace->vm_map;
+       vaddr_t end = start + len;
+       int i, r = 0;
+
+       /*
+        * When system calls are registered and msyscall(2) is blocked,
+        * there are no new calls to setup xonly regions
+        */
+       if ((map->flags & VM_MAP_SYSCALL_ONCE) == 0)
+               vm_map_lock(map);
+       for (i = 0; i < map->xonly_count; i++) {
+               vaddr_t s = map->xonly[i].start, e = map->xonly[i].end;
+
+               if (start < e && end > s) {    /* overlap, incl. containment */
+                       r = EFAULT;
+                       break;
+               }
+       }
+       if ((map->flags & VM_MAP_SYSCALL_ONCE) == 0)
+               vm_map_unlock(map);
+       return (r);
+}
+
+/* 
+ * uvm_map_xonly: remember regions which are X-only for uiomove()
+ *
+ * => map must be unlocked
+ */
+int
+uvm_map_xonly(struct vm_map *map, vaddr_t start, vaddr_t end)
+{
+       if (start > end)
+               return EINVAL;
+       start = MAX(start, map->min_offset);
+       end = MIN(end, map->max_offset);
+       if (start >= end)
+               return 0;
+
+       vm_map_lock(map);
+       if (map->xonly_count < UVM_MAP_XONLY_MAX) {
+               //printf("%d xonly %lx-%lx\n", map->xonly_count, start, end);
+               map->xonly[map->xonly_count].start = start;
+               map->xonly[map->xonly_count].end = end;
+               map->xonly_count++;             
+       }
        vm_map_unlock(map);
        return (0);
 }
Index: sys/uvm/uvm_map.h
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_map.h,v
retrieving revision 1.81
diff -u -p -u -r1.81 uvm_map.h
--- sys/uvm/uvm_map.h   17 Nov 2022 23:26:07 -0000      1.81
+++ sys/uvm/uvm_map.h   20 Dec 2022 05:23:43 -0000
@@ -168,6 +168,12 @@ struct vm_map_entry {
        vsize_t                 fspace_augment; /* max(fspace) in subtree */
 };
 
+struct uvm_xonly {
+       vaddr_t                 start;
+       vaddr_t                 end;
+};
+#define UVM_MAP_XONLY_MAX      10
+
 #define        VM_MAPENT_ISWIRED(entry)        ((entry)->wired_count != 0)
 
 TAILQ_HEAD(uvm_map_deadq, vm_map_entry);       /* dead entry queue */
@@ -309,6 +315,9 @@ struct vm_map {
        struct uvm_addr_state   *uaddr_any[4];  /* More selectors. */
        struct uvm_addr_state   *uaddr_brk_stack; /* Brk/stack selector. */
 
+       struct uvm_xonly        xonly[UVM_MAP_XONLY_MAX];
+       int                     xonly_count;
+
        /*
         * XXX struct mutex changes size because of compile options, so place
         * place after fields which are inspected by libkvm / procmap(8)
@@ -354,6 +363,8 @@ int         uvm_map_extract(struct vm_map *, va
 struct vm_map *        uvm_map_create(pmap_t, vaddr_t, vaddr_t, int);
 vaddr_t                uvm_map_pie(vaddr_t);
vaddr_t                uvm_map_hint(struct vmspace *, vm_prot_t, vaddr_t, vaddr_t);
+int            uvm_map_xonly(struct vm_map *, vaddr_t, vaddr_t);
+int            uvm_map_xonly_check(struct proc *, vaddr_t, vsize_t);
 int            uvm_map_syscall(struct vm_map *, vaddr_t, vaddr_t);
 int            uvm_map_immutable(struct vm_map *, vaddr_t, vaddr_t, int);
int            uvm_map_inherit(struct vm_map *, vaddr_t, vaddr_t, vm_inherit_t);
