On 12/4/20 11:37 PM, Andrew Warkentin wrote:
> On 12/2/20, Demi M. Obenour wrote:
>>
>> That's understandable. From my perspective, it appears that the only
>> change required would be to swap out the VMM, since the same kernel
>> capabilities would be required either way. The only difference would
>> be that a nested instance of UX/RT would need to get untyped memory
>> objects somehow, which seems simple. Of course, I could very well
>> be missing something here ― if this would meaningfully increase
>> the complexity of the system, it probably isn't worth it.
>>
>
> In addition to the aforementioned issues with binary compatibility, it
> would also complicate booting such systems. UX/RT will have a uniform
> boot process in which the kernel, root server, and all other files
> required for early boot are contained within a "supervisor image" in
> the root directory of the root container being booted rather than
> having multiple images on disk, and the root server will be dependent
> on having the Multiboot2 info passed through to it (so it doesn't have
> to understand the filesystem format of the supervisor image). There
> would also have to be some kind of API for communication with
> alternate personalities running on the same kernel (although of course
> on a hypervisor there would have to be an API for communicating with
> other VMs as well).

Could the same API be used for both?

> Also, it's kind of redundant because UX/RT will already have extensive
> containerization support at the VFS level (and since literally all IPC
> and user memory will be file-based this means complete control over
> what is accessible in containers). This would really just be pushing
> containerization down to the kernel level, which would be of little
> benefit.
>
>>
>> That said, from the way you phrased your message, I thought you were
>> referring to a type-1 hypervisor that would run below UX/RT. IMO, that
>> is where such a tool really belongs ― it can provide strong isolation
>> guarantees between multiple seL4-based systems, and still allow each
>> of those systems to use hardware virtualization if they so desire.
>> For instance, an embedded system might have high-assurance components
>> running on the seL4 Core Platform, while UX/RT is used as a replacement
>> for VMs running Linux. Similarly, a hypothetical seL4-based QubesOS
>> might use this type-1 hypervisor to isolate qubes from each other.
>
> Yes, that's exactly what I was planning to do. I'm wanting to write a
> Type 1 hypervisor based on seL4 or a fork, which will be independent
> of UX/RT (although it will be distributed along with it). It will have
> a somewhat Xen-ish architecture in that all services will be provided
> by backend drivers running in VMs (the VMM will run as the root server
> and will be the only process-like thing running on the hypervisor
> microkernel).
>
>>
>> FYI, since you plan on a Linux compatibility layer, you might want to
>> contact the illumos developers. illumos is, of course, a completely
>> different OS design, but they do have a full Linux compatibility
>> layer and might be able to give some implementation advice.
>>
>
> Their Linux compatibility layer probably wouldn't really be all that
> relevant to that of UX/RT. UX/RT's Linux system call handling will be
> purely library-based, which is rather different from a kernel-based
> compatibility layer.

Still, they might have useful advice on what is most important to
support, and what can be left unimplemented. For instance, I believe
they left out signal-driven I/O, since hardly anyone used it.

>> An officially-supported, API- and ABI-stable C library is planned,
>> so this may not be a roadblock for much longer.
>>
>
> I was under the impression that it wasn't going to have a stable ABI
> since it is intended more for static systems.

The system call API and ABI won't be stable, but the API and ABI of
sel4corelib will.

>> 1. The need for an emulator removes many of the assurance guarantees
>> provided by seL4, since one must rely on the correctness of the
>> emulator to prevent in-VM privilege escalation vulnerabilities.
>> Such vulnerabilities are not uncommon in existing hypervisors.
>>
>
> There isn't necessarily a need for an emulator as such for PVH-type
> VMs, although there will of course be device backends.
>
> Also, since this is meant for dynamic systems, verification of the
> microkernel is somewhat less relevant, since the TCB of any process or
> VM will always include highly dynamic user-mode subsystems that will
> be difficult or impossible to verify (the UX/RT root server as well as
> that of the hypervisor will be written in Rust to reduce the potential
> for significant vulnerabilities).

What I meant is that when one is interacting with hardware directly,
logic errors are far more likely than normal to lead to vulnerabilities.
For instance, incorrectly emulating a faulting instruction is memory
safe from the VMM perspective, but could easily lead to a privilege
escalation vulnerability in the guest. That's where running natively
on seL4 is a huge security win.

> For a system with a static enclave running on the same processor as a
> dynamic OS, a verified kernel could still be used (presumably without
> any direct hypercall support unless that could be verified, since it
> would be less likely that there would be a need for high-throughput
> IPC between the two sides).

Enclaves are indeed the case I was thinking of, although I do not
believe they would normally be referred to as such. Specifically, it
would be nice for UX/RT to be able to run on the same kernel as the
rest of an otherwise static system.

>> 2. Nested hardware virtualization is quite difficult to implement, and
>> has significant overhead. On the other hand, nested virtualization
>> based on seL4 capabilities is free.
>
> UX/RT itself probably won't do anything with hardware virtualization
> extensions directly. It will support hosting I/O emulators and device
> backends for the underlying Type 1 hypervisor, and possibly
> Skybridge-type acceleration, but those will only use hypercalls.
>
>>
>> 3. I doubt seL4 supports issuing any specialized hypercall
>> instructions, so you might need to fall back to emulation.
>>
>
> That's why I was wondering about adding kernel-level support for
> direct hypercalls.
>
> On 12/3/20, Indan Zupancic wrote:
>>
>> That seems possible to implement. The challenge is to implement
>> this without increasing seL4's complexity and without slowing down
>> all notification and IPC system calls, as it would be a couple of
>> special checks somewhere. It would require new system calls, e.g.:
>> seL4_Notification_SetIRQHandler() and seL4_IPC_SetIRQHandler().
>> Or it can be a TCB/VCPU binding instead.
>>
>> It does not solve blocking outgoing IPC calls, but I assume you
>> don't mind those. Interrupts must be kept disabled for the VM
>> during such calls.
>>
>>> It also has to work on x86 as well.
>>
>> The VM receive side needs special seL4 kernel support to generate
>> interrupts when there are pending notifications or messages. This
>> is architecture independent.
>>
>> The VM send side needs a way to directly make calls into the seL4
>> kernel instead of going via the VMM. This needs VM kernel support,
>> because such instructions are usually privileged. All VM <-> VMM
>> communication happens via the seL4 kernel already, so all that is
>> needed is for seL4 to know whether it is a VMM call or a system
>> call. On ARM this happens by looking at whether it was a svc or a
>> hvc call. It seems likely that x86 can do something similar.
>>
>> For people who are wondering what use this has, the advantage of
>> taking the VMM out of the loop for seL4 <-> VM communication is not
>> only possible performance increases, but it also simplifies the VMM
>> and makes it more agnostic about the system as a whole. The system
>> design becomes simpler because you can use the normal seL4 API
>> everywhere, including VM user space. This makes it easier to move
>> code between VM user space and native seL4.
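
Taking the VMM out of the loop like that would indeed be a nice
simplification. To make it concrete for anyone following along, here
is roughly the shape I picture, strictly as a sketch on the receive
side (ARM shown): the binding call is the one proposed above, it does
not exist in seL4 today, its signature is my guess, and
GUEST_VIRQ_NOTIFY is a made-up number. The relay loop is the part
that would go away.

#include <sel4/sel4.h>

/* Hypothetical: the invocation proposed above, declared here only so
 * the sketch is self-contained; it is not part of the current seL4
 * API and this signature is a guess. */
seL4_Error seL4_Notification_SetIRQHandler(seL4_CPtr ntfn, seL4_CPtr vcpu,
                                           seL4_Word virq);

#define GUEST_VIRQ_NOTIFY 42  /* made-up virtual IRQ number for the guest */

/* Today: the VMM sits in the delivery path and exists mostly to turn
 * a notification into an injected interrupt. */
static void vmm_relay_loop(seL4_CPtr ntfn, seL4_CPtr vcpu)
{
    for (;;) {
        seL4_Wait(ntfn, NULL);          /* VMM is woken up...            */
        seL4_ARM_VCPU_InjectIRQ(vcpu, GUEST_VIRQ_NOTIFY, 0 /* priority */,
                                0 /* group */, 0 /* index */);
    }                                   /* ...just to inject a vIRQ      */
}

/* With the proposed binding, this is done once and the kernel injects
 * the interrupt itself whenever the notification has a pending signal. */
static void bind_notification_to_guest_irq(seL4_CPtr ntfn, seL4_CPtr vcpu)
{
    seL4_Notification_SetIRQHandler(ntfn, vcpu, GUEST_VIRQ_NOTIFY);
}

The send side would be the mirror image: the guest issues an hvc (or
whatever x86 equivalent is chosen) that seL4 recognizes as coming from
a VM and delivers as an ordinary IPC or signal, rather than bouncing
it through the VMM.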

> I'm thinking of an interface where hypercalls are made by writing to
> virtual registers rather than using explicit trapping instructions,
> with the possibility to create multiple hypercall interfaces per VM,
> each with its own set of endpoint capabilities (although interrupt
> delivery would probably still be best off being centralized with a
> single interface per VCPU, for use only by the kernel). This would
> make it easy for user processes within a VM to make hypercalls since
> the guest system could create a hypercall interface for each process.
> Hypercall support for user processes in VMs would also pretty much
> require moving to a semi-asynchronous model where only the hypercall
> interface itself gets blocked without blocking the VCPU (attempting to
> do anything with a hypercall interface that has a call already in
> progress would fail, but the VCPU would continue to run and all other
> hypercall interfaces would remain available). There would presumably
> have to be a dummy TCB (with only a CSpace but no VSpace or CPU state
> defined) for each hypercall interface separate from the TCB of the VM.
> IPC for normal threads would remain synchronous.

Virtual registers may very well take a significant performance hit. If
I recall correctly, dedicated system call instructions are heavily
optimized at the hardware level, and are much faster than a generic
trap. I believe the same is true of HVC on ARM. Some of the overhead
can be made up by means of a ring buffer, but some will remain.

Sincerely,
Demi

P.S.: I hope that UX/RT is wildly successful!
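
P.P.S.: To illustrate the ring-buffer point, something along the lines
of the following shared-page layout is what I had in mind. This is
purely a sketch; every name and field is made up for the example. The
idea is that the guest fills in a slot and advances the producer
index, so most calls avoid a trap entirely, and only an occasional
doorbell (or the interrupt-injection path discussed above) is needed
to kick the other side.

#include <stdint.h>

#define HC_RING_SLOTS 64            /* arbitrary power of two           */

struct hc_request {
    uint64_t label;                 /* seL4-style message label         */
    uint64_t args[4];               /* message registers                */
    uint64_t endpoint;              /* which endpoint slot to invoke    */
    uint64_t status;                /* filled in on completion          */
};

/* One "hypercall interface": a page shared between the guest and the
 * hypervisor side, acting as the virtual registers.  A guest process
 * could get its own interface, so a call in progress only blocks this
 * interface, never the VCPU. */
struct hc_interface {
    volatile uint32_t prod;         /* advanced by the guest            */
    volatile uint32_t cons;         /* advanced by the hypervisor side  */
    struct hc_request ring[HC_RING_SLOTS];
};

Even then, the index updates and the doorbell still cost something,
which is why I suspect a dedicated trap instruction stays faster for
the purely synchronous case.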
