Hello Benjamin,

Thank you for taking the time to look at this.

On Sat, 26 Oct 2024 19:19:08 +0900,
Benjamin Berg wrote:

> > - a crash on userspace programs crashes a UML kernel, not signaling
> > with SIGSEGV to the program.
> > - commit c27e618 (during v6.12-rc1 merge) introduces invalid access to
> > a vma structure for our case, which updates the internal procedure
> > of maple_tree subsystem. We're trying to fix issue but still a
> > random process on exit(2) crashes.
>
> Btw. are you handling FP register save/restore? If it is not there, it
> probably would not be too hard to add (XSAVE, etc.), though it might
> add a bit of additional overhead. Especially as UML always saves the FP
> state rather than optimizing it like the x86 architectures.

The patchset handles the FP registers on syscall entry and exit; patch
[07/13] contains this part. I'm not familiar with this area, though:
what kind of optimizations does the x86 architecture do for FP register
handling?

> I am a bit confused overall. I mean, zpoline seems kind of neat, but a
> requirement on patching userspace code also seems like a lot.
>
> To me, it seems much more natural to catch the userspace syscalls using
> a SECCOMP filter[1]. While quite a lot slower, that should be much more
> portable across architectures. For improved speed one could still do
> architecture specific things inside the vDSO or by using zpoline. But
> those would then "just" be optimizations and unpatched code would still
> work correctly (e.g. JIT).

I'm not proposing this patchset as a replacement for the existing UML
implementation; for instance, the patchset cannot run CONFIG_MMU code in
the kernel tree, so the existing ptrace-based implementation still has
real use cases.

The ptrace-based syscall hook is indeed not fast, and replacing it with
a seccomp filter clearly has benefits, but I think that is independent
of this patchset. So while your seccomp patches are also in review, I
think this patchset can exist in parallel.

Btw, although I mentioned in a different reply that JIT-generated code
is not currently handled, this can be implemented as an extension to
this patchset; the original implementation of zpoline is now able to
patch JIT-generated code as well.

https://github.com/yasukata/zpoline/pull/20/commits/c42af16757ad3fcdf7084c9f2139bb9105796873

It is just not implemented in this patchset for the moment.

In terms of portability, the basic idea of the zpoline syscall hook is
applicable to other platforms, such as aarch64
(https://github.com/retrage/svc-hook), so I believe there is a chance
to extend this idea to architectures other than x86_64.

> For me, a big argument in favour of such an approach is its simplicity.
> I am mostly basing that on the fact that this patchset should properly
> handle other signals like SIGFPE and SIGSEGV. And, once it does that,
> you will already have all the infrastructure to do the correct register
> save/restore using the host mcontex, which is what is needed in the
> SIGSYS handler when using SECCOMP. The filter itself should be simple
> as it just needs to catch all syscalls within valid userspace
> executable memory[2] ranges.

I agree with your observation that the approach is simple. I don't have
a good idea yet on how to handle SIGSEGV, but I will look into it with
your input.

> Benjamin
>
> [1] Maybe not surprising, as I have been working on a SECCOMP based UML
> that does not require ptrace.

Yes, I have been aware of it for some time. I have also benchmarked
several hook mechanisms, including seccomp, with a simple getpid
measurement.

https://speakerdeck.com/thehajime/netdev0x18-zpoline?slide=16

> [2] I am assuming that userspace executable code is already confined to
> a certain address space within the UML process. Obviously, the kernel
> itself and loaded modules need to be free to do host syscalls and
> should not be affected by the SECCOMP filter.

I think our !MMU UML doesn't break this assumption. Did you see
something in our patchset that suggests otherwise?

Thanks again,

-- Hajime