Hi Hugh! On Tue, 2025-09-16 at 07:23 +0100, Hugh Dickins wrote:> On Fri, 12 Sep 2025, Roy, Patrick wrote: > >> From: Elliot Berman <quic_eber...@quicinc.com> >> >> When guest_memfd removes memory from the host kernel's direct map, >> direct map entries must be restored before the memory is freed again. To >> do so, ->free_folio() needs to know whether a gmem folio was direct map >> removed in the first place though. While possible to keep track of this >> information on each individual folio (e.g. via page flags), direct map >> removal is an all-or-nothing property of the entire guest_memfd, so it >> is less error prone to just check the flag stored in the gmem inode's >> private data. However, by the time ->free_folio() is called, >> folio->mapping might be cleared. To still allow access to the address >> space from which the folio was just removed, pass it in as an additional >> argument to ->free_folio, as the mapping is well-known to all callers. >> >> Link: >> https://lore.kernel.org/all/15f665b4-2d33-41ca-ac50-fafe24ade...@redhat.com/ >> Suggested-by: David Hildenbrand <da...@redhat.com> >> Acked-by: David Hildenbrand <da...@redhat.com> >> Signed-off-by: Elliot Berman <quic_eber...@quicinc.com> >> [patrick: rewrite shortlog for new usecase] >> Signed-off-by: Patrick Roy <roy...@amazon.co.uk> >> --- >> Documentation/filesystems/locking.rst | 2 +- >> fs/nfs/dir.c | 11 ++++++----- >> fs/orangefs/inode.c | 3 ++- >> include/linux/fs.h | 2 +- >> mm/filemap.c | 9 +++++---- >> mm/secretmem.c | 3 ++- >> mm/vmscan.c | 4 ++-- >> virt/kvm/guest_memfd.c | 3 ++- >> 8 files changed, 21 insertions(+), 16 deletions(-) >> >> diff --git a/Documentation/filesystems/locking.rst >> b/Documentation/filesystems/locking.rst >> index aa287ccdac2f..74c97287ec40 100644 >> --- a/Documentation/filesystems/locking.rst >> +++ b/Documentation/filesystems/locking.rst >> @@ -262,7 +262,7 @@ prototypes:: >> sector_t (*bmap)(struct address_space *, sector_t); >> void (*invalidate_folio) (struct folio *, size_t start, size_t len); >> bool (*release_folio)(struct folio *, gfp_t); >> - void (*free_folio)(struct folio *); >> + void (*free_folio)(struct address_space *, struct folio *); >> int (*direct_IO)(struct kiocb *, struct iov_iter *iter); >> int (*migrate_folio)(struct address_space *, struct folio *dst, >> struct folio *src, enum migrate_mode); > > Beware, that is against the intent of free_folio(). > > Since its 2.6.37 origin in 6072d13c4293 ("Call the filesystem back > whenever a page is removed from the page cache"), freepage() or > free_folio() has intentionally NOT taken a struct address_space *mapping, > because that structure may already be freed by the time free_folio() is > called, if the last folio holding it has now been freed. > > Maybe something has changed since then, or maybe it happens to be safe > just in the context in which you want to use it; but it is against the > principle of free_folio(). (Maybe an rcu_read_lock() could be added > in __remove_mapping() to make it safe nowadays? maybe not welcome.) > > See Documentation/filesystems/vfs.rst: > free_folio is called once the folio is no longer visible in the > page cache in order to allow the cleanup of any private data. > Since it may be called by the memory reclaimer, it should not > assume that the original address_space mapping still exists, and > it should not block. > > Hugh Thanks for pointing this out! I think I can make do without this patch, by storing the direct map state in some bit directly on the folio (in yesterday's upstream guest_memfd call, we talked about using ->private, which guest_memfd isn't using for anything yet). Will do that for the next iteration.
Best, Patrick