Hi Hugh!

On Tue, 2025-09-16 at 07:23 +0100, Hugh Dickins wrote:> On Fri, 12 Sep 2025, 
Roy, Patrick wrote:
> 
>> From: Elliot Berman <quic_eber...@quicinc.com>
>>
>> When guest_memfd removes memory from the host kernel's direct map,
>> direct map entries must be restored before the memory is freed again. To
>> do so, ->free_folio() needs to know whether a gmem folio was direct map
>> removed in the first place though. While possible to keep track of this
>> information on each individual folio (e.g. via page flags), direct map
>> removal is an all-or-nothing property of the entire guest_memfd, so it
>> is less error prone to just check the flag stored in the gmem inode's
>> private data.  However, by the time ->free_folio() is called,
>> folio->mapping might be cleared. To still allow access to the address
>> space from which the folio was just removed, pass it in as an additional
>> argument to ->free_folio, as the mapping is well-known to all callers.
>>
>> Link: 
>> https://lore.kernel.org/all/15f665b4-2d33-41ca-ac50-fafe24ade...@redhat.com/
>> Suggested-by: David Hildenbrand <da...@redhat.com>
>> Acked-by: David Hildenbrand <da...@redhat.com>
>> Signed-off-by: Elliot Berman <quic_eber...@quicinc.com>
>> [patrick: rewrite shortlog for new usecase]
>> Signed-off-by: Patrick Roy <roy...@amazon.co.uk>
>> ---
>>  Documentation/filesystems/locking.rst |  2 +-
>>  fs/nfs/dir.c                          | 11 ++++++-----
>>  fs/orangefs/inode.c                   |  3 ++-
>>  include/linux/fs.h                    |  2 +-
>>  mm/filemap.c                          |  9 +++++----
>>  mm/secretmem.c                        |  3 ++-
>>  mm/vmscan.c                           |  4 ++--
>>  virt/kvm/guest_memfd.c                |  3 ++-
>>  8 files changed, 21 insertions(+), 16 deletions(-)
>>
>> diff --git a/Documentation/filesystems/locking.rst 
>> b/Documentation/filesystems/locking.rst
>> index aa287ccdac2f..74c97287ec40 100644
>> --- a/Documentation/filesystems/locking.rst
>> +++ b/Documentation/filesystems/locking.rst
>> @@ -262,7 +262,7 @@ prototypes::
>>       sector_t (*bmap)(struct address_space *, sector_t);
>>       void (*invalidate_folio) (struct folio *, size_t start, size_t len);
>>       bool (*release_folio)(struct folio *, gfp_t);
>> -     void (*free_folio)(struct folio *);
>> +     void (*free_folio)(struct address_space *, struct folio *);
>>       int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
>>       int (*migrate_folio)(struct address_space *, struct folio *dst,
>>                       struct folio *src, enum migrate_mode);
> 
> Beware, that is against the intent of free_folio().
> 
> Since its 2.6.37 origin in 6072d13c4293 ("Call the filesystem back
> whenever a page is removed from the page cache"), freepage() or
> free_folio() has intentionally NOT taken a struct address_space *mapping,
> because that structure may already be freed by the time free_folio() is
> called, if the last folio holding it has now been freed.
> 
> Maybe something has changed since then, or maybe it happens to be safe
> just in the context in which you want to use it; but it is against the
> principle of free_folio().  (Maybe an rcu_read_lock() could be added
> in __remove_mapping() to make it safe nowadays? maybe not welcome.)
> 
> See Documentation/filesystems/vfs.rst:
> free_folio is called once the folio is no longer visible in the
> page cache in order to allow the cleanup of any private data.
> Since it may be called by the memory reclaimer, it should not
> assume that the original address_space mapping still exists, and
> it should not block.
> 
> Hugh
Thanks for pointing this out! I think I can make do without this patch,
by storing the direct map state in some bit directly on the folio (in
yesterday's upstream guest_memfd call, we talked about using
->private, which guest_memfd isn't using for anything yet). Will do that
for the next iteration.

Best,
Patrick


Reply via email to