On Sat, 2023-06-24 at 15:15 +0200, Johannes Berg wrote:
> On Fri, 2023-06-23 at 16:34 -0600, Rob Herring wrote:
> > >
> > > Either way, the old patchset will give you a good idea about how it all
> > > works, the changes are mostly in the details. I am happy to push out a
> > > new version sooner rather than later if it might help with any efforts
> > > on your side.
> >
> > From a quick scan, it looks like there's some cleanups in the series
> > which would be helpful without seccomp parts. One of the initial
> > issues I've found is UML using older ptrace interfaces which arm64
> > doesn't implement. PTRACE_GETREGS for example.
> >
>
> I don't think that completely gets rid of PTRACE_GETREGS though, and if
> I remember correctly, we really kind of need that there?
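For context on the arm64 point: arm64 does not implement PTRACE_GETREGS, but both arm64 and x86-64 expose the general-purpose registers through the regset interface, PTRACE_GETREGSET with the NT_PRSTATUS regset. The following is a minimal sketch assuming Linux with glibc; get_gp_regs() and demo_getregset() are hypothetical helper names for illustration, not UML code.

```c
#include <elf.h>        /* NT_PRSTATUS */
#include <signal.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>    /* struct iovec */
#include <sys/user.h>   /* struct user_regs_struct */
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a stopped tracee's general-purpose registers
 * via the regset interface. Returns the byte count the kernel wrote, or -1.
 */
static long get_gp_regs(pid_t pid, struct user_regs_struct *regs)
{
    struct iovec iov = { .iov_base = regs, .iov_len = sizeof(*regs) };
    if (ptrace(PTRACE_GETREGSET, pid, (void *)(uintptr_t)NT_PRSTATUS, &iov) == -1)
        return -1;
    return (long)iov.iov_len; /* the kernel updates iov_len to what it filled */
}

/* Hypothetical demo: fork a child that stops under ptrace, read its regs, kill it. */
static long demo_getregset(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);
        _exit(0);
    }
    int status;
    long n = -1;
    /* WUNTRACED also reports the stop if PTRACE_TRACEME was refused. */
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status)) {
        struct user_regs_struct regs;
        n = get_gp_regs(pid, &regs);
    }
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return n;
}
```

On x86-64 and arm64, demo_getregset() is expected to return sizeof(struct user_regs_struct); a port could replace each PTRACE_GETREGS call with this pattern, although the seccomp mode discussed in this thread is meant to avoid ptrace altogether.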

The SECCOMP code should not need any ptrace at all. All it does is
read/write the mcontext that is generated by the host. I think there was
just some mangling there to map the basic registers into the format that
UML expects internally (floating point, SSE, etc. are just copied directly
though).

> Though then again it's all been a while, and I only faulted the seccomp
> mode back in in discussions with Benjamin. Looks like we've found a
> potentially nicer way to make it secure than his secret-based approach,
> and in fact in a way that should even make it SMP-safe, at least in
> theory, obviously a lot of infrastructure is missing to make it SMP in
> the first place.

Yeah, the idea with FD passing and CLONE_VM without CLONE_FILES does
indeed seem very promising, both for a secure SECCOMP model and for SMP
support specifically. It should be much easier to implement than my
previous secret-based syscall authentication idea.

Benjamin

>
> Currently, UML has a host process per VMA. Obviously, you need multiple
> host processes for SMP (to get SMP), i.e. one per (used) CPU per VMA,
> with CLONE_VM.
>
> The problem with the secrets-based approach here for SMP is that the
> secret will be readable to the other threads running in the VMA (1) and
> then can be used for circumventing the protection by jumping into the
> stub area and calling host syscalls, see
> https://patchwork.ozlabs.org/project/linux-um/patch/20221122100759.208290-28-benja...@sipsolutions.net/
> and
> https://patchwork.ozlabs.org/project/linux-um/patch/20221122100759.208290-24-benja...@sipsolutions.net/
>
>
> Now the new idea we came up with is this: We can make the per-CPU VMA
> with CLONE_VM but *not* CLONE_FILES.
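The isolation property this relies on can be checked in plain userspace C: a process created with clone(CLONE_VM) but without CLONE_FILES shares the address space with its creator, yet gets a copy of the FD table taken at clone time, so an FD opened afterwards on one side is just a number to the other. A minimal sketch, assuming Linux, glibc's clone() wrapper and memfd_create(); child_fn(), demo_clone_vm_without_files(), ready and done are hypothetical names, and this is not the UML stub itself.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>        /* clone(), CLONE_VM, sched_yield() */
#include <signal.h>       /* SIGCHLD */
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/mman.h>     /* memfd_create() */
#include <sys/wait.h>

/* Shared between parent and child only because of CLONE_VM. */
static int child_fd = -1;
static atomic_int ready, done;

/* Child: create a host FD that exists only in the child's own FD table. */
static int child_fn(void *arg)
{
    (void)arg;
    child_fd = memfd_create("stub-private", 0); /* -1 on failure */
    atomic_store(&ready, 1);                     /* publish child_fd */
    while (!atomic_load(&done))                  /* keep the FD open meanwhile */
        sched_yield();
    return 0;
}

/*
 * Returns 1 if the parent can read the child's FD number from shared memory
 * but cannot use it, because the two FD tables are separate.
 */
static int demo_clone_vm_without_files(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (!stack)
        return -1;
    child_fd = -1;
    atomic_store(&ready, 0);
    atomic_store(&done, 0);

    /* Share the address space (CLONE_VM); no CLONE_FILES, so the FD table is copied. */
    pid_t pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    while (!atomic_load(&ready))
        sched_yield();

    int result = -1;
    if (child_fd >= 0) {
        errno = 0;
        result = fcntl(child_fd, F_GETFD) == -1 && errno == EBADF;
    }
    atomic_store(&done, 1);
    waitpid(pid, NULL, 0);
    free(stack);
    return result;
}
```

demo_clone_vm_without_files() is expected to return 1: the parent sees the child's FD number through the shared address space while the child still holds the FD open, yet fcntl(F_GETFD) on that number fails with EBADF in the parent.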
> Then, in the stub, when we need to
> execute some real host syscalls on behalf of the child, we
>
>  * send the FD over in a message
>  * use the FD for mmap (and also always use mmap instead of mprotect)
>  * close the FD
>
> Without CLONE_FILES, another thread cannot "steal" the (real, host) FD,
> it's useless in the other thread. Note that I'm talking about host FDs
> here, in-UML FDs are just numbers and it works all differently, I'm just
> talking about executing host syscalls inside the VMA, to set up the VMA
> correctly etc.
>
> The BPF program now allows all the relevant syscalls (2) inside the stub
> area, but since you don't have the FD unless you actually executed the
> recvmsg() call at the beginning of the stub you can't do anything with
> that by jumping into the stub.
>
> There are some other details involved such as having to split the stub
> data into a read-only "what to execute" (3) and a writeable "results"
> page, but those are reasonably easy to deal with.
>
> At the end of the stub, of course the FD must be closed. This is fine
> though since mappings persist after the FD was closed. On the next page
> fault we have to do it all again, but yeah, page faults were always
> super expensive in UML ...
>
> johannes
>
>
> (1) unless you extract it directly out of the BPF program into registers
> via some magic syscalls that get a BPF return, but that's kind of icky
> too
>
> (2) mmap & munmap are the most relevant ones, some others but they're
> not that critical for security; notably mprotect is not allowed but must
> be done with mmap instead
>
> (3) so another thread can't actually overwrite the instructions of what
> the kernel wants to run inside the VMA process from another thread while
> it's happening
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
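The stub sequence in the quoted proposal (receive the FD, mmap it, close it, and use mmap rather than mprotect to change protections) can be approximated in plain userspace C. This is a minimal sketch assuming Linux with memfd_create(), with a UNIX socketpair standing in for the channel between the UML kernel and the stub; both ends live in one process for brevity. send_fd(), recv_fd(), stub_map() and demo_stub_sequence() are hypothetical names, and a real stub would issue raw syscalls from its own code page rather than calling libc.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>     /* mmap(), memfd_create() */
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one SCM_RIGHTS descriptor. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one FD over a UNIX socket, attached to a single dummy data byte. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg ctrl;
    memset(&ctrl, 0, sizeof(ctrl));
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf) };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one FD (the stub's recvmsg() step); returns -1 if none arrived. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg ctrl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf) };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Hypothetical stub step: receive the FD, mmap() it, close() it again. */
static void *stub_map(int sock, void *addr, size_t len, int prot)
{
    int fd = recv_fd(sock);
    if (fd < 0)
        return MAP_FAILED;
    void *p = mmap(addr, len, prot, MAP_SHARED | (addr ? MAP_FIXED : 0), fd, 0);
    close(fd); /* the mapping persists after the FD is closed */
    return p;
}

/* Map a page via a passed FD, write to it, then remap it read-only. */
static int demo_stub_sequence(char *out, size_t outlen)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int memfd = memfd_create("uml-stub-demo", 0);
    char *p = MAP_FAILED;
    int ok = memfd >= 0 && ftruncate(memfd, (off_t)len) == 0;

    /* Round 1: the stub maps the page read/write and drops its FD. */
    ok = ok && send_fd(sv[0], memfd) == 0;
    if (ok)
        p = stub_map(sv[1], NULL, len, PROT_READ | PROT_WRITE);
    ok = ok && p != MAP_FAILED;
    if (ok)
        strcpy(p, "hello from the stub");

    /* Round 2: freshly received FD, mmap(MAP_FIXED) instead of mprotect(). */
    ok = ok && send_fd(sv[0], memfd) == 0;
    ok = ok && stub_map(sv[1], p, len, PROT_READ) == p;
    if (ok)
        snprintf(out, outlen, "%s", p);

    if (p != MAP_FAILED)
        munmap(p, len);
    if (memfd >= 0)
        close(memfd);
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

demo_stub_sequence() should copy "hello from the stub" into out: the string written through the first, read/write mapping is still there after the same page has been remapped read-only from a freshly passed FD, even though each received descriptor was closed right after its mmap().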