On Thu, Jun 23, 2016 at 01:40:42AM +0300, Michael S. Tsirkin wrote:
> Where's a problem then?
If EPT/NPT is enabled, the guest pagetables are parsed by the hardware, not by the KVM shadow MMU in software. The hardware speaks host phys bits and, AFAIK, it will behave differently depending on the host phys bits. In fact, the guest could probe the host phys bits anyway.

Now, the breakage with guest phys bits < host phys bits happens only with EPT/NPT, and only if the guest, instead of "probing" the host phys bits, just runs cpuid and assumes the value it receives matches the effect of a "probe". The guest could then assume the probing effect matches the guest phys bits returned by the guest cpuid instruction, and do important things depending on that (i.e. expecting a GPF that won't materialize if the host phys bits are > the guest phys bits).

The guest must do something weird for any breakage to happen (notably changing the pagetable format depending on the cpuid return value). Of course the guest could also be changed to stop being weird, and then it wouldn't break anymore. So, just in case there's a weird proprietary OS like that, we can still add a -cpu=force_host_phys_bits fallback to prevent the discrepancy between cpuid and the probing effect, in turn eliminating any risk of guest failures. (But then we should also prevent live migration if the source host phys bits != the destination host phys bits, to give the weird guest the same guarantee across live migration.)

> So I think that all we need is a way to let libvirt control
> the _CRS range. Teach it that _CRS must fit within what
> host can support. Also check and fail kvm init if _CRS exceeds
> what host can support.

Right. The production solution is such a simple patch that I certainly agree it can be applied first, along with the mtrr fix. The complexity of dealing with _CRS and all the upper layers over this subtle phys bits detail, to calculate the highest possible guest physical address, is what makes the production solution attractive in the short term.
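A minimal sketch, in C, of the init-time check Michael suggests (fail if _CRS exceeds what the host can support). It assumes the host phys bits are read from CPUID leaf 0x80000008, EAX bits 7:0, and that the 64-bit PCI window in _CRS is described by a base and a size. All function names here are hypothetical and not taken from QEMU or KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86-64 caps MAXPHYADDR at 52 bits. */
#define ARCH_MAX_PHYS_BITS 52

/* CPUID leaf 0x80000008 reports the physical-address width in EAX[7:0]
 * (EAX[15:8] holds the linear-address width). */
unsigned phys_bits_from_cpuid_eax(uint32_t eax)
{
    return eax & 0xff;
}

/* Highest physical address reachable with the given width. */
uint64_t max_phys_addr(unsigned phys_bits)
{
    assert(phys_bits > 0 && phys_bits <= ARCH_MAX_PHYS_BITS);
    return (UINT64_C(1) << phys_bits) - 1;
}

/* Hypothetical check: does the _CRS window [base, base + size) lie
 * entirely below 2^phys_bits?  An empty window trivially fits. */
bool crs_window_fits(uint64_t base, uint64_t size, unsigned phys_bits)
{
    if (size == 0)
        return true;
    uint64_t last = base + size - 1;
    if (last < base)            /* base + size wrapped past 2^64 */
        return false;
    return last <= max_phys_addr(phys_bits);
}
```

For example, with EAX = 0x3028 (40 physical bits), a 64 GiB window placed at 1 TiB (base 1 << 40) does not fit, so init would be refused; on a 46-bit host the same window fits.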
Then, if we implement the "soft" guest phys bits exercise, that is all about adding robustness to live migration (and save/restore). So that it doesn't risk being futile, it'd be nice if the phys bits checks were all contained inside qemu. Initially libvirt/ovirt/OpenStack would just return a generic live migration error to the user in the unlikely case of a phys bits mismatch during live migration or restore (i.e. "soft" guest phys bits > destination host phys bits). That will still spare us weird bug reports and, far more importantly, it'll avoid any risk of unexpected guest crashes for customers.

Cloud managers that load-balance VMs can, if they want, still do their own calculation of the host/guest phys bits, matching the qemu-internal calculation, and so guarantee that they'll never run into the qemu live migration error because of too low destination host phys bits (either that, or they can check for a proper error from the migration command).

Thanks,
Andrea
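A minimal sketch, in C, of the kind of check qemu could run before accepting an incoming live migration or restore, covering both the "soft" guest phys bits mode and the force_host_phys_bits fallback discussed in this thread. The enum and function are hypothetical (not an existing qemu interface), and the sketch assumes the guest's phys bits and the source host's phys bits are known on the destination, e.g. carried in the migration stream.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical modes mirroring the two options discussed in this thread. */
enum phys_bits_mode {
    PHYS_BITS_SOFT,       /* guest sees a configured "soft" phys bits value */
    PHYS_BITS_FORCE_HOST  /* guest sees the host's phys bits (force_host_phys_bits) */
};

/* Returns true if moving the guest to a host with dst_host_phys_bits is
 * safe.  Assumption: guest and source values are available on the
 * destination (e.g. sent along with the migration stream). */
bool phys_bits_migration_ok(enum phys_bits_mode mode,
                            unsigned guest_phys_bits,
                            unsigned src_host_phys_bits,
                            unsigned dst_host_phys_bits)
{
    if (mode == PHYS_BITS_FORCE_HOST) {
        /* In this mode cpuid already reports the source host's value. */
        assert(guest_phys_bits == src_host_phys_bits);
        /* cpuid and probing must keep agreeing after migration. */
        return dst_host_phys_bits == src_host_phys_bits;
    }
    /* Soft mode: the destination must address at least as many bits as
     * the guest was told it can use. */
    return guest_phys_bits <= dst_host_phys_bits;
}
```

A false return is where qemu would fail the migration with a proper error, which libvirt/ovirt/OpenStack could initially surface as a generic live migration failure; a manager could run the same comparison up front to pick only compatible destination hosts.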