On Dec 17, 2020, at 07:13, Jean-Philippe Ouellet <j...@vt.edu> wrote:
> On Wed, Dec 16, 2020 at 2:37 PM Christopher Clark
> <christopher.w.cl...@gmail.com> wrote:
>> Hi all,
>>
>> I have written a page for the OpenXT wiki describing a proposal for
>> initial development towards the VirtIO-Argo transport driver, and the
>> related system components to support it, destined for OpenXT and
>> upstream projects:
>>
>> https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1
>>
>> Please review ahead of tomorrow's OpenXT Community Call.
>>
>> I would draw your attention to the Comparison of Argo interface options
>> section:
>>
>> https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1#Comparison-of-Argo-interface-options
>>
>> where further input to the table would be valuable;
>> and would also appreciate input on the IOREQ project section:
>>
>> https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1#Project:-IOREQ-for-VirtIO-Argo
>>
>> in particular, whether an IOREQ implementation to support the
>> provision of devices to the frontends can replace the need for any
>> userspace software to interact with an Argo kernel interface for the
>> VirtIO-Argo implementation.
>>
>> thanks,
>> Christopher
>
> Hi,
>
> Really excited to see this happening, and disappointed that I'm not
> able to contribute at this time. I don't think I'll be able to join
> the call, but wanted to share some initial thoughts from my
> middle-of-the-night review anyway.
>
> Super rough notes in raw unedited notes-to-self form:
>
> Main point of feedback: I love the desire to get a non-shared-mem
> transport backend for virtio standardized. It moves us closer to an
> HMX-only world. BUT: virtio is relevant to many hypervisors beyond
> Xen, not all of which have the same views on how policy enforcement
> should be done; namely, some have a preference for capability-oriented
> models over type-enforcement / MAC models. It would be nice if any
> labeling encoded into the actual specs / guest-boundary protocols
> were strictly a mechanism and policy-agnostic, in particular
> not making implicit assumptions about XSM / SELinux / similar. I don't
> have specific suggestions at this point, but would love to discuss.
>
> Thoughts on how to handle device enumeration? Hotplug notifications?
> - can't rely on xenstore
> - need some internal Argo messaging for this?
> - name service w/ well-known names? starts to look like xenstore
>   pretty quickly...
> - granular disaggregation of backend device-model providers is desirable
>
> How does resource accounting work? Does each side pay for its own delivery ring?
> - init in already-guest-mapped mem & simply register?
> - how does it compare to grant tables?
> - do you need to go through a Linux driver to alloc (e.g. xengntalloc),
>   or is there a way to share arbitrary, otherwise not-special userspace
>   pages (e.g. u2mfn, with all its issues (pinning, reloc, etc.))?
>
> IOREQ is tangled with grant refs, event channels, the generic vmexit
> dispatcher, the instruction decoder, etc., none of which seems desirable
> if trying to move towards a world with strictly safer guest interfaces
> exposed (e.g. HMX-only).
> - there's no I/O to trap/decode here; it's explicitly exclusively via
>   hypercall to HMX, no?
> - also, do we want the Argo sendv hypercall to be always blocking & synchronous?
>   - or perhaps async notify & background copy to the other VM's addr space?
>   - possibly better scaling?
>   - accounting of the in-flight I/O requests to be handled gets
>     complicated (see recent XSA)
>   - PCI-like completion request semantics? (Argo as a cross-domain
>     software DMA engine w/ some basic protocol enforcement?)
>
> "Port" v4v driver => Argo:
> - yes please! something without all the confidence-inspiring
>   DEBUG_{APPLE,ORANGE,BANANA} indicators of production-worthy code would
>   be great ;)
> - seems like you may want to redo the Argo hypercall interface too? (at
>   least the syscall interface...)
>   - targeting synchronous blocking sendv()?
>   - or some async queue/completion thing too? (like PF_RING, but with
>     *iov entries?)
>   - both could count as HMX, and both could enforce no double-write
>     racing games at the dest ring, etc.
>
> Re v4vchar & doing similar for Argo:
> - we may prefer "can write N bytes? -> yes/no" or "how many bytes can I
>   write? -> N" over "try to write N bytes -> only wrote M, EAGAIN"
> - the latter can be implemented over the former, but not the other way around
> - this starts to matter when you want to implement in userspace
>   & provide backpressure to peer userspace without additional buffering
>   & potentially lying about the durability of writes
>   - breaks cross-domain EPIPE boundary correctness
> - Qubes ran into the same issues when porting vchan from Xen to KVM,
>   initially via vsock
>
> Some virtio drivers explicitly use shared mem for more than just
> communication rings:
> - e.g. virtio-fs, which can map pages as DAX-like fs backing to share
>   the page cache
> - e.g. virtio-gpu, virtio-wayland, virtio-video, which deal in framebuffers
> - needs thought about how best to map those semantics to (or at least
>   interoperate cleanly & safely with) an HMX-{only,mostly} world
> - the performance of shared mem can matter meaningfully for
>   e.g. large framebuffers in particular, due to fundamental memory
>   bandwidth constraints
>
> What is the mentioned PX hypervisor? Presumably short for PicoXen? Any
> public information?
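One note on the v4vchar flow-control point: as observed above, the
partial-write interface ("try to write N bytes -> only wrote M, EAGAIN")
can be layered over a space query in a few lines, while recovering an
accurate space query from a partial-write-only interface requires extra
buffering and risks misreporting write durability to the peer. A rough
sketch of the first direction, using hypothetical argo_tx_space() and
argo_ring_write() primitives (illustrative names, not actual driver entry
points):

#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Assumed primitive: how many bytes can currently be written to the
 * destination ring? */
extern size_t argo_tx_space(int ring_fd);

/* Assumed primitive: copy exactly 'len' bytes into the ring; the caller
 * has already verified that the space is available. */
extern int argo_ring_write(int ring_fd, const void *buf, size_t len);

/* Partial-write semantics built on the space query: write as much as
 * currently fits and report M <= N (negative errno on failure).
 * No hidden buffering, so backpressure stays visible to the caller. */
ssize_t argo_write_some(int ring_fd, const void *buf, size_t len)
{
    size_t space = argo_tx_space(ring_fd);

    if (space == 0)
        return -EAGAIN;

    size_t n = len < space ? len : space;
    int rc = argo_ring_write(ring_fd, buf, n);
    if (rc < 0)
        return rc;

    return (ssize_t)n;
}

Going the other way around, a caller of a partial-write-only interface
cannot learn how much it could send without actually sending (or
buffering) data, which is exactly what breaks userspace backpressure and
cross-domain EPIPE correctness.

As for PX: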
Not much at the moment, but there is prior public work.

PX is an OSS L0 "Protection Hypervisor" in the Hardened Access Terminal
(HAT) architecture presented by Daniel Smith at the 2020 Xen Summit:
https://youtube.com/watch?v=Wt-SBhFnDZY&t=3m48s

PX is intended to build on lessons learned from the IBM Ultravisor,
HP/Bromium AX and AIS Bareflank L0 hypervisors:

IBM: https://www.platformsecuritysummit.com/2019/speaker/hunt/

HP/Bromium: https://www.platformsecuritysummit.com/2018/speaker/pratt/

Dec 2019 meeting in Cambridge; the Day 2 discussion included the L0
nesting hypervisor, UUID semantics, Argo, and communication between
nested hypervisors:
https://lists.archive.carbon60.com/xen/devel/577800

Bareflank: https://youtube.com/channel/UCH-7Pw96K5V1RHAPn5-cmYA

Xen Summit 2020 design session notes:
https://lists.archive.carbon60.com/xen/devel/591509

In the long term, efficient hypervisor nesting will require close
cooperation with silicon and firmware vendors. Note that Intel is
introducing TDX (Trust Domain Extensions):
https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
https://www.brighttalk.com/webcast/18206/453600

There are also a couple of recent papers from Shanghai Jiao Tong
University on using hardware instructions to accelerate inter-domain HMX.

March 2019: https://ipads.se.sjtu.edu.cn/_media/publications/skybridge-eurosys19.pdf

> we present SkyBridge, a new communication facility designed and optimized for
> synchronous IPC in microkernels. SkyBridge requires no involvement of kernels
> during communication and allows a process to directly switch to the virtual
> address space of the target process and invoke the target function. SkyBridge
> retains the traditional virtual address space isolation and thus can be
> easily integrated into existing microkernels. The key idea of SkyBridge is to
> leverage a commodity hardware feature for virtualization (i.e., [Intel EPT]
> VMFUNC) to achieve efficient IPC. To leverage the hardware feature, SkyBridge
> inserts a tiny virtualization layer (Rootkernel) beneath the original
> microkernel (Subkernel). The Rootkernel is carefully designed to eliminate
> most virtualization overheads. SkyBridge also integrates a series of
> techniques to guarantee the security properties of IPC. We have implemented
> SkyBridge on three popular open-source microkernels (seL4, Fiasco.OC, and
> Google Zircon). The evaluation results show that SkyBridge improves the speed
> of IPC by 1.49x to 19.6x for microbenchmarks. For real-world applications
> (e.g., SQLite3 database), SkyBridge improves the throughput by 81.9%, 1.44x
> and 9.59x for the three microkernels on average.

July 2020: https://ipads.se.sjtu.edu.cn/_media/publications/guatc20.pdf

> a redesign of traditional microkernel OSes to harmonize the tension between
> messaging performance and isolation. UnderBridge moves the OS components of a
> microkernel between user space and kernel space at runtime while enforcing
> consistent isolation. It retrofits Intel Memory Protection Key for Userspace
> (PKU) in kernel space to achieve such isolation efficiently and design a fast
> IPC mechanism across those OS components. Thanks to PKU’s extremely low
> overhead, the inter-process communication (IPC) roundtrip cost in UnderBridge
> can be as low as 109 cycles. We have designed and implemented a new
> microkernel called ChCore based on UnderBridge and have also ported
> UnderBridge to three mainstream microkernels, i.e., seL4, Google Zircon, and
> Fiasco.OC.
> Evaluations show that UnderBridge speeds up the IPC by 3.0×
> compared with the state-of-the-art (e.g., SkyBridge) and improves the
> performance of IPC-intensive applications by up to 13.1× for the above three
> microkernels.

For those interested in Argo and VirtIO, there will be a conference call
on Thursday, Jan 14th, 2021, at 1600 UTC.

Rich