On 22.12.2011 13:32, Reinoud Zandijk wrote:
> I understand your confusion on this point. It's due to the way NetBSD/usermode
> is built and why it is built that way. The main goals/features, for me at
> least, and even though some were formulated along the way, come back to:
>
> - it should behave like a separate (though virtual) machine.
> - there should be no difference between operating and developing in a
>   NetBSD/usermode and a normal NetBSD kernel as much as possible.
> - it should be usable for kernel development for as many subsystems as
>   possible.
> - it should be portable to, or just run on, every POSIX machine.
>
> [snip]
>
> After the memory has been set up it then attaches devices, like a virtual cpu
> and a ld(4) driver for a disk image. After the attachments, NetBSD/usermode
> loads and starts init(8) from *within* its own memory space.

Okay, as a design decision this is questionable. If the intent is to behave
like a separate virtual machine and to be as consistent as possible with a
native kernel, the usermode kernel ought to run in its own address space:

- catching userland address dereferences and missing/incomplete
  copyin/copyout is easier (in a shared address space you only catch them
  if you get all your mprotect(2)/mmap(2) calls right)
- with the kernel not mapped in the userland address space, there is no
  risk of leaving kernel memory readable (a risk you otherwise avoid only
  by, again, getting your mprotect(2) calls right...)
- there is prior work on this: Antti's rumphijack and, IIRC, the latest
  versions of UML have separate userland/kernel spaces. Perhaps vkernel
  from DragonFly too. Dunno if these are really MI though.

Please note that adding non-POSIX flags to mmap(2) makes the whole thing less
portable. If usermode is expected to run properly in any POSIX environment,
you have to remain largely POSIX compliant. The MMAP_NOSYSCALLS part is not.

> Externalizing the userland processes would not only violate some of the goals
> but would also create a potential logistical nightmare. This would also create
> a distributed system rather than a NetBSD usermode kernel. A whole new project
> that would be fun to do, but out of scope. It could include process migration
> between machines, network transport, caching and proxies etc. etc.

How so? Having separate kernel/userland spaces has prior art. See above.

> To manage running native binaries, it needed help from the kernel and thus
> this patch arose. With it regions of memory could be designated as
> `not-for-systemcalls'. It could be argued that a single virtual memory
> range setting function for this purpose could be used but that would make it a
> very tailored solution and not the general purpose one it is now.

It needed help from what kernel? Your patch + explanations make me think
that the NOSYSCALLS regions have to be set by the usermode kernel.
Which means that NetBSD/usermode relies on a mechanism not offered by other
POSIX kernels out there; there's approx. 0 chance that this will ever be
accepted on other operating systems.

>>> On the enhancing security argument, malicious source code could trigger
>>> compiler bugs that allow for code to be modified or otherwise manipulated
>>> to issue system calls where they shouldn't. Although it wouldn't necessarily
>>> pose a system security issue, it could be used for extracting info or for
>>> malicious behaviour where with the patch it would simply bomb out.
>>
>> That's the part I have trouble with. It looks like a weaker form of W^X (or
>> PaX's mprotect), and I can't see the "additional" security benefits.
>
> I've looked into that too, well mprotect() in particular. Even though the
> manpage tells it can explicitly allow for execution, lots of pmap
> implementations warn that their architectures can't distinguish between
> reading and executing permissions since their memory management modules simply
> don't distinguish between the two. More importantly, the code DOES need to
> execute in the mappings; only system calls are to be prohibited. Elaborate
> single-stepping and/or code analyzing and replacing to find those instructions
> could be used but at what costs? Code might be interrupted with some
> constants, code might not start at byte 0, etc. etc. Heuristics are then to be
> used at best.

Again, I do believe that a correct approach would be separate address spaces
+ per-process "raise SIGILL on syscall" (ptrace maybe?), instead of
implementing non-portable logic inside the NetBSD kernel. I know that there
are multiple ports that have separate address spaces for kernel and userland,
but they rely on MD machinery to work properly (the amd64 Xen port being one
of them).

>> Malicious code is free to trigger compiler bugs that can make calls to valid
>> memory areas.
>> If you manage to plant an "int 0x80" in an MMAP_NOSYSCALLS
>> executable region, just turn it into a "call __syscall". At the expense of a
>> few more arguments, you will get the same result.
>
> It depends on the implementation. Do you f.e. allow the linkage of this code
> to functions outside a designated list, or outside a designated area? If it
> manages to find __syscall by itself in its host program and patch up a direct
> call to that constant then yes it could call the OS. Static linking and
> strip(1) is your friend then.
>
> In NetBSD/usermode it would then still only be
> able to call the NetBSD/usermode kernel and not the host kernel.

I think that this needs clarification. Please correct my mistakes:

- the NetBSD/usermode kernel is started as a userland process, and uses the
  POSIX API to set up its environment.
- it then sets up the userland memory regions with the MMAP_NOSYSCALLS
  flag, so userland cannot make direct syscalls to the host kernel
- it passes execution to init(8) and userland
- whenever userland code makes a direct syscall, a SIGILL is raised, which
  gives the usermode kernel a chance to handle the userland syscall.

Right? So how can you implement the MMAP_NOSYSCALLS step on other POSIXy
systems?

Merry Christmas to you (and everyone else too) :)

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr
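
PS: to make the last step of the list above concrete, here is a minimal
sketch of the handler side only, assuming plain POSIX signals. The names
install_trap_handler, simulate_guest_syscall and trapped_count are made up
for illustration, raise(3) merely stands in for the host kernel delivering
SIGILL, and nothing here is taken from the actual NetBSD/usermode code or
from the MMAP_NOSYSCALLS patch.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>   /* assert() is only needed by callers that check results */
#include <signal.h>
#include <string.h>

/* Number of SIGILLs seen so far; sig_atomic_t is safe to touch in a handler. */
static volatile sig_atomic_t trapped_syscalls = 0;

/*
 * Hypothetical trap handler. A real usermode kernel would decode the
 * syscall number and arguments from the machine-dependent ucontext in
 * ctx and dispatch it; this sketch only counts the trap.
 */
static void on_sigill(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)info;
    (void)ctx;
    trapped_syscalls++;
}

/* Install the SIGILL handler; returns 0 on success, -1 on error. */
int install_trap_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigill;
    sa.sa_flags = SA_SIGINFO;      /* ask for the siginfo_t/ucontext form */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGILL, &sa, NULL);
}

/*
 * Stand-in for "userland issued a syscall from a no-syscalls region":
 * no such region exists on a stock POSIX host, so the signal is simply
 * raised by hand. Returns 0 on success.
 */
int simulate_guest_syscall(void)
{
    return raise(SIGILL);
}

/* Number of traps handled so far. */
int trapped_count(void)
{
    return (int)trapped_syscalls;
}
```

Calling install_trap_handler() once and then simulate_guest_syscall()
should leave trapped_count() at 1. Everything that is actually hard
(making the host deliver that SIGILL in the first place, and decoding the
trapped call) is deliberately left out, which is the point of the question
above.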