On Mon, Feb 02, 2026 at 02:36:37PM -0800, Ackerley Tng wrote:
> (resending to fix Message-ID)
>
> Here's a second revision of guest_memfd In-place conversion support.
>
> In this version, other than addressing comments from RFCv1 [1], the largest
> change is that guest_memfd now does not avoid participation in LRU; it
> participates in LRU by joining the unevictable list (no change from before
> this series).
>
> While checking for elevated refcounts during shared to private conversions,
> guest_memfd will now do an lru_add_drain_all() if elevated refcounts were
> found, before concluding that there are true users of the shared folio and
> erroring out.
>
> I'd still like feedback on these points, if any:
>
> 1. Having private/shared status stored in a maple tree (Thanks Michael for
>    your support of using maple trees over xarrays for performance! [5]).
> 2. Having a new guest_memfd ioctl (not a vm ioctl) that performs conversions.
> 3. Using ioctls/structs/input attribute similar to the existing vm ioctl
>    KVM_SET_MEMORY_ATTRIBUTES to perform conversions.
> 4. Storing requested attributes directly in the maple tree.
> 5. Using a KVM module-wide param to toggle between setting memory attributes
>    via vm and guest_memfd ioctls (making them mutually exclusive - a single
>    loaded KVM module can only do one of the two.).
>
> [...snip...]
>
>
> --
> 2.53.0.rc1.225.gd81095ad13-goog

I’ve tested memory failure handling after applying this series and here’s what
memory_failure() does:

Shared memory: In line with other in-memory filesystems, the memory_failure()
handler unmaps the page if it is currently mapped, and issues a SIGBUS

- if memory failure was injected with MF_ACTION_REQUIRED or
- if the test process’s memory corruption kill policy is PR_MCE_KILL_EARLY

In my runs the SIGBUS was only delivered when the page was both mapped and
dirty. Here’s the above, in table form:

| MF_ACTION_REQUIRED | Kill Policy       | Mapped | Dirty | Result: SIGBUS |
|--------------------|-------------------|--------|-------|----------------|
| false              | PR_MCE_KILL_EARLY | true   | true  | true           |
| false              | PR_MCE_KILL_EARLY | true   | false | false          |
| false              | PR_MCE_KILL_EARLY | false  | true  | false          |
| false              | PR_MCE_KILL_EARLY | false  | false | false          |
| false              | PR_MCE_KILL_LATE  | true   | true  | false          |
| false              | PR_MCE_KILL_LATE  | true   | false | false          |
| false              | PR_MCE_KILL_LATE  | false  | true  | false          |
| false              | PR_MCE_KILL_LATE  | false  | false | false          |
| true               | Any Policy        | true   | true  | true           |
| true               | Any Policy        | true   | false | false          |

(I used MADV_HWPOISON to inject memory failures with MF_ACTION_REQUIRED set,
and there was no way to use MADV_HWPOISON without first mapping the page in.
To inject memory failures without MF_ACTION_REQUIRED set, I used debugfs’
hwpoison/corrupt-pfn.)

Private memory: The handler unmaps the page from the stage 2 page table and
does not issue a SIGBUS - the page is never mapped to the host, since it is
private to the guest.

| MF_ACTION_REQUIRED | Kill Policy       | Mapped | Dirty | Result: SIGBUS |
|--------------------|-------------------|--------|-------|----------------|
| false              | PR_MCE_KILL_EARLY | false  | true  | false          |
| false              | PR_MCE_KILL_EARLY | false  | false | false          |
| false              | PR_MCE_KILL_LATE  | false  | true  | false          |
| false              | PR_MCE_KILL_LATE  | false  | false | false          |

(I couldn’t use MADV_HWPOISON here, since private memory cannot be mapped into
userspace and hence has no userspace address.)

I’ll post updated memory failure tests together with the next revision of this
series [1] to fix MF_DELAYED handling on memory failure.

[1] https://lore.kernel.org/all/[email protected]/T/
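
For anyone who wants to encode the expected results in a test, both tables
above seem to reduce to a single predicate. Here is a minimal Python sketch of
that; the names KillPolicy, expect_sigbus and OBSERVED are hypothetical, and
the predicate only summarizes the results I observed, it is not a description
of the kernel's memory_failure() logic:

```python
from enum import Enum


class KillPolicy(Enum):
    """Hypothetical stand-ins for the prctl() memory corruption kill policies."""
    EARLY = "PR_MCE_KILL_EARLY"
    LATE = "PR_MCE_KILL_LATE"


def expect_sigbus(action_required: bool, policy: KillPolicy,
                  mapped: bool, dirty: bool) -> bool:
    """Summarize the observed tables: SIGBUS was seen only for pages that were
    mapped and dirty, and only with MF_ACTION_REQUIRED or the early-kill policy."""
    return bool(mapped and dirty
                and (action_required or policy is KillPolicy.EARLY))


# (action_required, policy, mapped, dirty, observed SIGBUS), copied from the
# tables. "Any Policy" rows are expanded to both policies. Private memory is
# never mapped to the host, so its rows all have mapped == False.
OBSERVED = [
    (False, KillPolicy.EARLY, True, True, True),
    (False, KillPolicy.EARLY, True, False, False),
    (False, KillPolicy.EARLY, False, True, False),
    (False, KillPolicy.EARLY, False, False, False),
    (False, KillPolicy.LATE, True, True, False),
    (False, KillPolicy.LATE, True, False, False),
    (False, KillPolicy.LATE, False, True, False),
    (False, KillPolicy.LATE, False, False, False),
    (True, KillPolicy.EARLY, True, True, True),
    (True, KillPolicy.LATE, True, True, True),
    (True, KillPolicy.EARLY, True, False, False),
    (True, KillPolicy.LATE, True, False, False),
]

# Collect any table row that the predicate fails to reproduce.
MISMATCHES = [row for row in OBSERVED if expect_sigbus(*row[:4]) != row[4]]
```

MISMATCHES should come out empty if the predicate matches every row above.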
