Hello,

On Wed, Jun 22, 2016 at 02:41:22PM +0200, Paolo Bonzini wrote:
> From a semantics point of view, using a smaller phys-addr-bits than the
> host is the worst, because you tell the guest that some bits are
> must-be-zero, when they're not. Using a larger phys-addr-bits cannot
Ok, so EPT/KVM should always use the host phys bits, never the guest
ones, for EPT violations. KVM runs in the host so that's not a concern,
and EPT itself is irrelevant here; the only relevance is in the guest
pagetables with EPT. I don't think any sane OS can break. It'd be
inefficient too, to burn a cacheline checking the phys bits at runtime
before setting up pagetables. As for the MTRR mask, if it doesn't set
the "valid" phys bits to 1 (because the guest bits are reduced), it
should still be safe.

> cause malfunctioning, only crashes (and as Gerd said, if you cross your
> fingers and hope the guest doesn't put anything so high in memory,
> chances are you'll succeed), and this makes it "safer". I'm not sure
> which one is more likely to happen.

But the crash with guest phys bits > host phys bits is material: linux
will definitely crash in such a condition. Linux could not possibly
crash in the opposite case (host phys bits > guest phys bits), because
it will never depend on a GPF triggering when the must-be-zero bits of
the guest pagetables are set. Linux won't ever try to set those bits,
and I'd be shocked if any other OS does. So while not a perfect
emulation of the hardware, the risk with known OSes should be zero.

> So there's no correct answer, and that's why I think the lesser evil is
> to go with the time-tested alternative and use host phys-addr-bits as
> the default, even if it causes weird behavior on migration. If a fixed
> phys-addr-bits is specified on the destination, it should match the
> value that was used on the source though.

I agree we should start with the host phys bits like we use in
production (plus the mtrr fix). It is a net improvement compared to
upstream because it restricts the risk to live migration only, and it
otherwise always runs perfectly safe. Upstream is never safe on any
host with phys bits != 40, especially if the host phys bits is < 40.
The main benefit of my solution is that it avoids having to compute the
highest guest physical address (RAM or PCI bars) that could ever be
mapped, in order to generate a "soft" guest phys bits.

Later we could still consider introducing a "soft" guest phys bits with
the only objective of preventing the risk of migration breakage: qemu
shouldn't let the guest migrate if the destination host phys bits is <
the "soft" guest phys bits. Then a command line quirk
-cpu=force_host_phys_bits would set the "soft" guest phys bits to the
host value and prevent live migration to any destination with host phys
bits != the "soft" guest phys bits. It should be used only for such a
hypothetical OS that depends on must-be-zero bit violations in the
guest pagetables.

Whether this is a good idea as a second step boils down to how
difficult it is to calculate the highest possible guest physical
address at boot time. If that's impossible with PCI hotplug (memory
hotplug shouldn't be an issue), then the "soft" guest phys bits also
becomes mostly worthless (unless we require -cpu=force_host_phys_bits
for PCI hotplug to work).

Again, starting with host -> guest phys bits sounds fine to me: at
least everything will work perfectly in all cases except live
migration, and you should know what you're doing with live migration if
you have very diverse host phys bits across your cloud nodes and very
large guests too, or guests running a weird OS that depends on
must-be-zero bit violations in guest pagetables.

Thanks,
Andrea