On 25/05/2018 08:49, Jan Beulich wrote:
>>>> On 22.05.18 at 13:20, <andrew.coop...@citrix.com> wrote:
>> @@ -1650,22 +1641,81 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr,
>>
>>  static void vmx_update_guest_efer(struct vcpu *v)
>>  {
>> -    unsigned long vm_entry_value;
>> +    unsigned long entry_ctls, guest_efer = v->arch.hvm_vcpu.guest_efer,
>> +        xen_efer = read_efer();
>> +
>> +    if ( paging_mode_shadow(v->domain) )
>> +    {
>> +        /*
>> +         * When using shadow pagetables, EFER.NX is a Xen-owned bit and is not
>> +         * under guest control.
>> +         */
>> +        guest_efer &= ~EFER_NX;
>> +        guest_efer |= xen_efer & EFER_NX;
>> +
>> +        /*
>> +         * At the time of writing (May 2018), the Intel SDM "VM Entry: Checks
>> +         * on Guest Control Registers, Debug Registers and MSRs" section says:
>> +         *
>> +         *  If the "Load IA32_EFER" VM-entry control is 1, the following
>> +         *  checks are performed on the field for the IA32_MSR:
>> +         *   - Bits reserved in the IA32_EFER MSR must be 0.
>> +         *   - Bit 10 (corresponding to IA32_EFER.LMA) must equal the value of
>> +         *     the "IA-32e mode guest" VM-entry control.  It must also be
>> +         *     identical to bit 8 (LME) if bit 31 in the CR0 field
>> +         *     (corresponding to CR0.PG) is 1.
>> +         *
>> +         * Experimentally what actually happens is:
>> +         *  - Checks for EFER.{LME,LMA} apply uniformly whether using the
>> +         *    GUEST_EFER VMCS controls, or MSR load/save lists.
>> +         *  - Without EPT, LME being different to LMA isn't tolerated by
>> +         *    hardware.  As writes to CR0 are intercepted, it is safe to
>> +         *    leave LME clear at this point, and fix up both LME and LMA when
>> +         *    CR0.PG is set.
>> +         */
>> +        if ( !(guest_efer & EFER_LMA) )
>> +            guest_efer &= ~EFER_LME;
>> +    }
> Why is this latter adjustments done only for shadow mode?

How should I go about making the comment clearer?

When EPT is active, hardware is happy with LMA != LME.  When EPT is
disabled, hardware strictly requires LME == LMA.

This particular condition occurs architecturally on the transition into
long mode, between setting LME and setting CR0.PG, and updating the EFER
controls in the naive way results in a vmentry failure.

Having spoken to Intel, they agree with my assessment that the docs
appear to be correct for Gen1 hardware, and stale for Gen2 hardware,
where fixing this was one of many parts of making Unrestricted Guest work.

> After the above adjustments, when guest_efer still matches
> v->arch.hvm_vcpu.guest_efer, couldn't we disable the MSR read
> intercept?

In principle, yes.  We could use load/save lists, as long as we
remembered to recalculate EFER every time CR0 gets modified in the
shadow path.

However, that would be a net performance penalty rather than a benefit
(which is why I've gone to the effort of creating load-only lists).  In
practice, EFER is written at boot and not touched again.  Having
load/save logic might avoid these vmexits, but at the cost of almost
every other vmexit needing to keep guest_efer in sync with the
load/save list or VMCS field.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
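
For readers following the thread, the shadow-mode EFER adjustment discussed
above can be sketched outside of Xen as a small, self-contained C function.
This is an illustration under assumptions, not the patch itself:
efer_for_hardware() and its "shadow" parameter are hypothetical names standing
in for the logic in vmx_update_guest_efer() and the paging_mode_shadow() check.
Only the EFER bit positions are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural IA32_EFER bit positions. */
#define EFER_LME (UINT64_C(1) << 8)   /* Long Mode Enable */
#define EFER_LMA (UINT64_C(1) << 10)  /* Long Mode Active */
#define EFER_NX  (UINT64_C(1) << 11)  /* No-Execute Enable */

/*
 * Hypothetical helper (not a Xen function): derive the EFER value handed to
 * hardware from the guest's view of EFER and Xen's own EFER.
 */
static uint64_t efer_for_hardware(uint64_t guest_efer, uint64_t xen_efer,
                                  bool shadow)
{
    if ( shadow )
    {
        /* Under shadow paging NX is Xen-owned: take Xen's setting. */
        guest_efer &= ~EFER_NX;
        guest_efer |= xen_efer & EFER_NX;

        /*
         * Without EPT, LME != LMA is not tolerated at vmentry, so hide LME
         * until LMA is set (i.e. until the guest enables CR0.PG).
         */
        if ( !(guest_efer & EFER_LMA) )
            guest_efer &= ~EFER_LME;
    }

    /* With EPT (non-shadow), the guest value is passed through unchanged. */
    return guest_efer;
}
```

With a guest value of only EFER_LME (LME set, CR0.PG not yet enabled) and a Xen
EFER containing NX, the shadow case yields just EFER_NX, while the non-shadow
case passes EFER_LME through unchanged.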