On Wed, Nov 1, 2023 at 2:41 PM Sean Christopherson <sea...@google.com> wrote: > > On Wed, Nov 01, 2023, Xiaoyao Li wrote: > > On 10/31/2023 10:16 PM, Sean Christopherson wrote: > > > On Tue, Oct 31, 2023, Xiaoyao Li wrote: > > > > On 10/28/2023 2:21 AM, Sean Christopherson wrote: > > > > > Extended guest_memfd to allow backing guest memory with transparent > > > > > hugepages. Require userspace to opt-in via a flag even though there's > > > > > no > > > > > known/anticipated use case for forcing small pages as THP is optional, > > > > > i.e. to avoid ending up in a situation where userspace is unaware that > > > > > KVM can't provide hugepages. > > > > > > > > Personally, it seems not so "transparent" if requiring userspace to > > > > opt-in. > > > > > > > > People need to 1) check if the kernel built with TRANSPARENT_HUGEPAGE > > > > support, or check is the sysfs of transparent hugepage exists; 2)get the > > > > maximum support hugepage size 3) ensure the size satisfies the > > > > alignment; > > > > before opt-in it. > > > > > > > > Even simpler, userspace can blindly try to create guest memfd with > > > > transparent hugapage flag. If getting error, fallback to create without > > > > the > > > > transparent hugepage flag. > > > > > > > > However, it doesn't look transparent to me. > > > > > > The "transparent" part is referring to the underlying kernel mechanism, > > > it's not > > > saying anything about the API. The "transparent" part of THP is that the > > > kernel > > > doesn't guarantee hugepages, i.e. whether or not hugepages are actually > > > used is > > > (mostly) transparent to userspace. > > > > > > Paolo also isn't the biggest fan[*], but there are also downsides to > > > always > > > allowing hugepages, e.g. silent failure due to lack of THP or unaligned > > > size, > > > and there's precedent in the form of MADV_HUGEPAGE. > > > > > > [*] > > > https://lore.kernel.org/all/84a908ae-04c7-51c7-c9a8-119e1933a...@redhat.com > > > > But it's different than MADV_HUGEPAGE, in a way. Per my understanding, the > > failure of MADV_HUGEPAGE is not fatal, user space can ignore it and > > continue. > > > > However, the failure of KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is fatal, which leads > > to failure of guest memfd creation. > > Failing KVM_CREATE_GUEST_MEMFD isn't truly fatal, it just requires different > action from userspace, i.e. instead of ignoring the error, userspace could > redo > KVM_CREATE_GUEST_MEMFD with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE=0. > > We could make the behavior more like MADV_HUGEPAGE, e.g. theoretically we > could > extend fadvise() with FADV_HUGEPAGE, or add a guest_memfd knob/ioctl() to let > userspace provide advice/hints after creating a guest_memfd. But I suspect > that > guest_memfd would be the only user of FADV_HUGEPAGE, and IMO a post-creation > hint > is actually less desirable. > > KVM_GUEST_MEMFD_ALLOW_HUGEPAGE will fail only if userspace didn't provide a > compatible size or the kernel doesn't support THP. An incompatible size is > likely > a userspace bug, and for most setups that want to utilize guest_memfd, lack > of THP > support is likely a configuration bug. I.e. many/most uses *want* failures > due to > KVM_GUEST_MEMFD_ALLOW_HUGEPAGE to be fatal. > > > For current implementation, I think maybe KVM_GUEST_MEMFD_DESIRE_HUGEPAGE > > fits better than KVM_GUEST_MEMFD_ALLOW_HUGEPAGE? or maybe *PREFER*? > > Why? Verbs like "prefer" and "desire" aren't a good fit IMO because they > suggest > the flag is a hint, and hints are usually best effort only, i.e. are ignored > if > there is a fundamental incompatibility. > > "Allow" isn't perfect, e.g. I would much prefer a straight > KVM_GUEST_MEMFD_USE_HUGEPAGES > or KVM_GUEST_MEMFD_HUGEPAGES flag, but I wanted the name to convey that KVM > doesn't > (yet) guarantee hugepages. I.e. KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is stronger > than > a hint, but weaker than a requirement. And if/when KVM supports a dedicated > memory > pool of some kind, then we can add KVM_GUEST_MEMFD_REQUIRE_HUGEPAGE.
I think that the current patch is fine, but I will adjust it to always allow the flag, and to make the size check even if !CONFIG_TRANSPARENT_HUGEPAGE. If hugepages are not guaranteed, and (theoretically) you could have no hugepage at all in the result, it's okay to get this result even if THP is not available in the kernel. Paolo