On Wed, Apr 29, 2026 at 11:35:36AM -0400, Zi Yan wrote:
>collapse_file() is capable of collapsing pagecache folios from writable
>files to PMD folios. Now enable clean pagecache folio collapse in addition
>to read-only pagecache folio collapse by removing the
>inode_is_open_for_write() check from file_thp_enabled() and only
>performing filemap_flush() if the file is read-only.
>
>This means userspace needs to explicitly flush the content of pagecache
>folios before khugepaged can collapse the folios, or use
>madvise(MADV_COLLAPSE), which does the flush in the retry. The reason is
>that blindly enabling collapse of dirty pagecache folios from writable
>files would make khugepaged flush these folios all the time, and causing
>system-level pagecache flushes is undesirable.
>
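
To make the userspace side of this concrete, here is a minimal sketch of
the "flush first, then collapse" sequence described above. It is only an
illustration, not part of the patch: flush_then_collapse() is a
hypothetical helper name, fsync() is just one way to write back the dirty
folios, and the fallback value 25 for MADV_COLLAPSE is the asm-generic
uapi value for libc headers that predate it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* asm-generic uapi value; assumption for old headers */
#endif

/*
 * Hypothetical helper: write back dirty pagecache for fd, then request a
 * synchronous collapse of the mapping at [addr, addr + len).
 *
 * Returns -1 if the flush failed (errno set by fsync()),
 *          0 if the collapse request succeeded,
 *          1 if the flush succeeded but the kernel declined the collapse
 *            (errno set by madvise(), e.g. EINVAL on kernels without
 *            MADV_COLLAPSE support).
 */
int flush_then_collapse(int fd, void *addr, size_t len)
{
	/* Clean the folios so they are no longer dirty when scanned. */
	if (fsync(fd) != 0)
		return -1;

	/* Best effort: the folios stay clean for khugepaged even if this fails. */
	if (madvise(addr, len, MADV_COLLAPSE) != 0)
		return 1;

	return 0;
}
```

A caller that only wants khugepaged to pick the range up later could stop
after the fsync() step; the madvise() call is there for the synchronous
case.
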
>To properly support dirty pagecache folio collapse, filemap_flush() needs
>to be avoided. Potentially, merging associated buffer instead of dropping
>it with filemap_release_folio() might be needed.
>
>NOTE: this breaks khugepaged selftests for writable file pagecache
>collapse, which is set to fail all the time. The next commit fixes it.
>
>Signed-off-by: Zi Yan <[email protected]>
>---
> mm/huge_memory.c | 2 +-
> mm/khugepaged.c  | 9 ++++++++-
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
>diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>index 9b3abb98a7e51..e1e9d59db6e70 100644
>--- a/mm/huge_memory.c
>+++ b/mm/huge_memory.c
>@@ -97,7 +97,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>       if (!mapping_pmd_folio_support(vma->vm_file->f_mapping))
>               return false;
> 
>-      return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>+      return S_ISREG(inode->i_mode);
> }
> 
> /* If returns true, we are unable to access the VMA's folios. */
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index 1ee15b48962a3..fb7ff643973cc 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -2345,7 +2345,14 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>                                * forcing writeback in loop.
>                                */

Nit: the comment above now looks stale. It still says:

"There won’t be new dirty pages."

That was true when file_thp_enabled() rejected writable-open files, but
not after this patch ;)

Otherwise, LGTM.
Reviewed-by: Lance Yang <[email protected]>

>                               xas_unlock_irq(&xas);
>-                              filemap_flush(mapping);
>+                              /*
>+                               * Only flush for read-only files. Writable
>+                               * files can have their folios dirty at any
>+                               * time; blindly flushing them would cause
>+                               * undesirable system-wide writeback.
>+                               */
>+                              if (!inode_is_open_for_write(mapping->host))
>+                                      filemap_flush(mapping);
>                               result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>                               goto xa_unlocked;
>                       } else if (folio_test_writeback(folio)) {
>-- 
>2.53.0
>
>
