Hi Dongjiu Geng,

On 14/12/2018 10:15, Dongjiu Geng wrote:
> When user space do memory recovery, it will check whether KVM and
> guest support the error recovery, only when both of them support,
> user space will do the error recovery. This patch exports this
> capability of KVM to user space.

I can understand user-space only wanting to do the work if host and guest
support the feature. But 'error recovery' isn't a KVM feature, it's a Linux
kernel feature.

KVM will send its user-space a SIGBUS with an MCEERR code whenever it tries
to map a page at stage2 and the kernel-mm code refuses because the page is
poisoned. (e.g. check_user_page_hwpoison(), get_user_pages() returns
-EHWPOISON)

This is exactly the same as what happens to a normal user-space process.

I think you really want to know if the host kernel was built with
CONFIG_MEMORY_FAILURE. The not-at-all-portable way to tell this from
user-space is the presence of the /proc/sys/vm/memory_failure_* files.

(It looks like the prctl():PR_MCE_KILL/PR_MCE_KILL_GET options silently
update an ignored policy if the kernel isn't built with
CONFIG_MEMORY_FAILURE, so they aren't helpful)

> diff --git a/Documentation/virtual/kvm/api.txt
> b/Documentation/virtual/kvm/api.txt
> index cd209f7..241e2e2 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4895,3 +4895,12 @@ Architectures: x86
>  This capability indicates that KVM supports paravirtualized Hyper-V IPI send
>  hypercalls:
>  HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.
> +
> +8.21 KVM_CAP_ARM_MEMORY_ERROR_RECOVERY
> +
> +Architectures: arm, arm64
> +
> +This capability indicates that guest memory error can be detected by the KVM
> which
> +supports the error recovery.

KVM doesn't detect these errors.
The hardware detects them and notifies the OS via one of a number of
mechanisms. This gets plumbed into memory_failure(), which sets a flag that
the mm code uses to prevent the page being used again.

KVM is only involved when it tries to map a page at stage2 and the mm code
rejects it with -EHWPOISON. This is the same as the architecture's
do_page_fault() checking for (fault & VM_FAULT_HWPOISON) out of
handle_mm_fault(). We don't have a KVM cap for this, nor do we need one.

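For what it's worth, the not-at-all-portable user-space check mentioned above could look roughly like the sketch below. The helper names path_exists() and host_has_memory_failure() are made up for illustration, and the two sysctl names are my assumption of what 'memory_failure_*' expands to on a CONFIG_MEMORY_FAILURE kernel. Treat it as a heuristic, not an interface.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: true if 'path' exists and is visible to us. */
bool path_exists(const char *path)
{
	return access(path, F_OK) == 0;
}

/*
 * Hypothetical helper: guess whether the host kernel was built with
 * CONFIG_MEMORY_FAILURE by looking for its sysctl files.
 *
 * Assumption: these are the files behind /proc/sys/vm/memory_failure_*.
 * A missing or restricted /proc also makes this return false, so a
 * negative answer is only a hint.
 */
bool host_has_memory_failure(void)
{
	return path_exists("/proc/sys/vm/memory_failure_early_kill") ||
	       path_exists("/proc/sys/vm/memory_failure_recovery");
}
```

A VMM could call host_has_memory_failure() once at start-up and only advertise error recovery to the guest when it returns true. None of this involves KVM.
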
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index b72a3dd..90d1d9a 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -82,6 +82,7 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long
> ext)
>  		r = kvm_arm_support_pmu_v3();
>  		break;
>  	case KVM_CAP_ARM_INJECT_SERROR_ESR:
> +	case KVM_CAP_ARM_MEMORY_ERROR_RECOVERY:
>  		r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
>  		break;

The CPU RAS Extensions are not at all relevant here. It is perfectly
possible to support memory-failure without them: AMD-Seattle and APM-X-Gene
do this. These systems would report not-supported here, but the kernel does
support this stuff.

Just because the CPU supports this doesn't mean the kernel was built with
CONFIG_MEMORY_FAILURE. The CPU reports may be ignored, or upgraded to
SIGKILL.


Thanks,

James