Currently the release of Xen 4.11 is blocked due to a sporadic failure
of the OSSTEST guest-saverestore[.2]. During that test a hypercall
issued by libxc via the Linux privcmd driver returns -EFAULT in spite
of all hypercall buffers being locked in memory via mlock() (or similar
flags specified in a mmap() call).
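
For illustration, locking a buffer in user space looks roughly like the
sketch below. The helper names alloc_locked_buffer() and
free_locked_buffer() are made up for this mail; this is not the actual
libxc/libxencall code, only the general mmap() plus mlock() pattern.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a libxc function): allocate npages of
 * anonymous memory and lock it with mlock(). Returns NULL on failure.
 */
static void *alloc_locked_buffer(size_t npages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    size_t len;
    void *p;

    if (page_size <= 0 || npages == 0)
        return NULL;
    len = npages * (size_t)page_size;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Lock the pages; this can fail, e.g. due to RLIMIT_MEMLOCK. */
    if (mlock(p, len) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Hypothetical counterpart: unmapping also drops the lock. */
static void free_locked_buffer(void *p, size_t npages)
{
    long page_size = sysconf(_SC_PAGESIZE);

    if (p != NULL && page_size > 0)
        munmap(p, npages * (size_t)page_size);
}
```

A buffer obtained this way stays resident in RAM, but that alone does
not pin it to a fixed physical page.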

My analysis has revealed that modern Linux kernels might make such
locked user pages inaccessible for very short periods of time. This can
happen e.g. when pages are subject to compaction or migration.

There are multiple ways to mitigate this problem:

1. Try to switch page migration or compaction off in dom0.
   Pros: - no change in Xen necessary
   Cons: - new cases might come up in the future
         - easy to miss, failures are really very sporadic and might
           happen only after updating the kernel

2. Add a bandaid to Xen tools by retrying hypercalls which have failed
   with -EFAULT (either for all or only for some hypercalls)
   Pros: - no interface change necessary
   Cons: - not all hypercalls can simply be repeated
         - problem isn't solved but just worked around

3. Modify the interface to the privcmd driver to pass information about
   used buffers to the kernel in order to lock them there. Either add a
   new interface for hypercall buffer management or add the list of
   buffers to the privcmd ioctl data structure.
   Pros: - problem is really solved
   Cons: - split solution between kernel and Xen, both must be changed

4. Modify the interface between hypervisor and kernel: instead of just
   returning -EFAULT let the hypervisor behave more like copy_to_user by
   raising a page fault which can then be fixed up in the kernel. This
   change must be activated by the kernel, of course.
   Pros: - rather simple change in the kernel "doing the right thing"
         - hypercall bounce buffer handling in libxc/libxencall can be
           switched off for a kernel supporting this change
   Cons: - split solution between kernel and Xen, both must be changed
         - not sure how complex the required hypervisor change will be
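
To make option 2 a bit more concrete, the retry could be sketched
roughly as follows. This is only a sketch under assumptions: the
hypercall_fn callback type and retry_on_efault() are hypothetical names
standing in for "issue one hypercall via privcmd", and flaky_call() is a
demonstration stub, not a real hypercall. It assumes the wrapped call
returns a negative value with errno set on failure.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical callback type: issues one hypercall, returns >= 0 on
 * success or -1 with errno set on failure.
 */
typedef int (*hypercall_fn)(void *arg);

/*
 * Call fn up to max_tries times, repeating only when it failed with
 * EFAULT. Any other result is returned to the caller immediately.
 */
static int retry_on_efault(hypercall_fn fn, void *arg,
                           unsigned int max_tries)
{
    int rc = -1;
    unsigned int i;

    if (max_tries == 0) {
        errno = EINVAL;
        return -1;
    }

    for (i = 0; i < max_tries; i++) {
        rc = fn(arg);
        if (rc >= 0 || errno != EFAULT)
            break;
    }
    return rc;
}

/*
 * Demonstration stub only: fails with EFAULT until the counter that
 * arg points to has reached zero, then succeeds.
 */
static int flaky_call(void *arg)
{
    int *failures_left = arg;

    if (*failures_left > 0) {
        (*failures_left)--;
        errno = EFAULT;
        return -1;
    }
    return 0;
}
```

Such a wrapper is only safe for hypercalls which have no side effects
before the failing buffer access, which is exactly the "not repeatable"
concern listed under option 2.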

It should be noted that we can either select only one of the above
solutions, or one of 3/4 and additionally one of 1/2 as a fallback for
old kernels.

How to proceed?

I'd like to have an answer as soon as possible to unblock the 4.11
release.


Juergen
