Description: this session will focus on discussing the current state and key challenges of Nested Virtualization in Xen.
---

I'm going to reference this message to the mailing list in the related GitLab epic:

References: George Dunlap's two-part talk at the previous Xen Summit:
-
-

Andrew Cooper reminded us of the nested virtualization challenges.

What is needed to make nested virt work again?

Andrew:

Xen does have a nested virt implementation from 2009/2010. It has been bitrotting since then and was not production quality from day one.
Intel took care to virtualize everything relevant, which leads to confusing aspects that are not documented well enough.
AMD took a simpler route, but things don't quite work right.
VMX and SVM are separate pieces of work.
Interrupt shadows: disabling interrupts works differently on VMX and SVM.

It is important to reduce the scope of the problem.
Both Intel and AMD dropped support for 32-bit virtualization.
A bunch of features can be dropped to limit the scope. They still need to be trapped, but Xen can report them as not implemented.
What is needed depends on the L2 guests: Windows with VBS expects different features.
Some non-nested features needed for VBS are missing and must be implemented before the nested ones.

VM configuration is hard to change at run time because the configuration set is static.
Xen's model assumes a single, global set of what it expects a guest to run.
First task: change Xen to keep one configuration per VM instead of a global one, so that a VM can have a configuration different from other VMs.

HW only has root and non-root mode (strictly x86).
Nested virt needs to run the L2 guest in the non-root mode of L0.
Xen usually has one VMCS/VMCB per vCPU.
The L1 hypervisor will have one VMCS/VMCB per L2 vCPU.
A VMCS is a bunch of configuration: some of it is exposed to guests, the rest controls guest behavior.
The VMCS used for an L2 guest is merged from Xen's info and L1's info, and is called VMCS02.
Drop the host state from the L1 guest and use Xen's host state instead.
Features can be mutually exclusive.
An L2 guest will trap to Xen (L0), and Xen then needs to know whether the VMEXIT is for itself, for L1, or for both.
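The VMCS merge and the VMEXIT routing described above can be sketched roughly as follows. This is a toy model and not Xen code: the names `Vmcs`, `merge_vmcs02` and `route_vmexit`, the dictionary fields, and the rule "VMCS02 intercepts an event if either Xen or L1 wants it" are all assumptions made for illustration. Real VMX/SVM controls include mutually exclusive features where a plain union would be wrong.

```python
from dataclasses import dataclass, field


@dataclass
class Vmcs:
    """Toy VMCS: plain containers stand in for the real hardware fields."""
    guest_state: dict = field(default_factory=dict)
    host_state: dict = field(default_factory=dict)
    intercepts: set = field(default_factory=set)  # events this level wants to see


def merge_vmcs02(xen_vmcs: Vmcs, l1_vmcs: Vmcs) -> Vmcs:
    """Build the (hypothetical) VMCS02 that L0 uses to run an L2 vCPU."""
    return Vmcs(
        # L2's register state comes from what L1 prepared.
        guest_state=dict(l1_vmcs.guest_state),
        # L1's host state is dropped: a real exit always lands in Xen.
        host_state=dict(xen_vmcs.host_state),
        # Assumption: intercept anything either level cares about.
        intercepts=xen_vmcs.intercepts | l1_vmcs.intercepts,
    )


def route_vmexit(reason: str, xen_vmcs: Vmcs, l1_vmcs: Vmcs) -> set:
    """Return which levels need to act on an exit taken while running L2."""
    targets = set()
    if reason in xen_vmcs.intercepts:
        targets.add("L0")
    if reason in l1_vmcs.intercepts:
        targets.add("L1")
    return targets


if __name__ == "__main__":
    xen = Vmcs(host_state={"rip": "xen_exit_handler"},
               intercepts={"cpuid", "io"})
    l1 = Vmcs(guest_state={"rip": 0x1000},
              host_state={"rip": "l1_exit_handler"},
              intercepts={"cpuid", "hlt"})
    vmcs02 = merge_vmcs02(xen, l1)
    print(vmcs02.host_state)
    print(sorted(route_vmexit("cpuid", xen, l1)))
```

In this sketch an exit wanted only by L1 (here `"hlt"`) still arrives at L0 first. L0 would then have to emulate a VMEXIT into L1, which is where the merge in the opposite direction comes in.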
Virtual VM entry: the VMCS from the exit needs to be merged, along with the info about the host part of L1.
If it is implemented correctly, it can scale infinitely (L>3 guests).

Alain Tchana: VMCS shadowing, is it needed?

Andrew: It's complicated. It's a giant security hole since you cannot audit guest state.
VMCS (Intel): opaque memory that needs special instructions to read/write.
VMCB (AMD): a page of memory you can read/write directly.
It is easier on AMD to copy large amounts of memory in and out.

Yann: What is the current state of the Xen implementation?

Andrew: There is some, with known security issues and unknown ones.

Marek: If you run KVM in a Xen guest, you get an instant crash.

Yann: What are the plans to fix it?

Andrew: The L1 VMCS configuration can be completely different from the one Xen will use, so Xen needs to be modified to support multiple guest configurations.
See the Turtles Project paper for nested virtualization with VMCS merging.
The VMCS/VMCB state needs to be stored somewhere (easy with VMCB since it's just a mapped page).
A bit of work from Andrew and Roger is needed before it can be worked on by multiple people in parallel.

Next course of action:
- Wait for Andrew and Roger to fix MSR configuration from the toolstack. They're halfway through. According to them, that's sadly not a task we can really parallelize.
- When it is ready, more people can participate by implementing missing features one by one (with unit tests). There will be a suggested order of things that need to be implemented.

We can't predict when features will intersect with existing bugs.

A big thanks to Damien Thenot and Benjamin Reis for all the note taking, and of course to Andrew Cooper for most of the explaining.

Samuel Verschelde | Vates XCP-ng Lead Maintainer / Release Manager / Technical Product Manager
XCP-ng & Xen Orchestra - Vates solutions
web: