From: Charan Teja Kalla <quic_chara...@quicinc.com>

Large folios occupy N consecutive entries in the swap cache instead of
using multi-index entries like the page cache.  However, if a large folio
is re-added to the LRU list, it can be migrated.  The migration code was
not aware of the difference between the swap cache and the page cache and
assumed that a single xas_store() would be sufficient.

This leaves potentially many stale pointers to the now-migrated folio in
the swap cache, which can lead to almost arbitrary data corruption in the
future.  This can also manifest as infinite loops with the RCU read lock
held.
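
For context, the N consecutive slots come from add_to_swap_cache(), which
stores one xarray entry per subpage rather than a single multi-index entry.
A minimal sketch of that loop (simplified from mm/swap_state.c; error
handling and shadow-entry bookkeeping omitted; "mapping" and "idx" stand in
for the swap address_space and the folio's first swap slot offset):

  XA_STATE_ORDER(xas, &mapping->i_pages, idx, folio_order(folio));
  long nr = folio_nr_pages(folio);
  long i;

  xas_lock_irq(&xas);
  xas_create_range(&xas);
  for (i = 0; i < nr; i++) {
          /* all N consecutive slots point at the same folio */
          xas_store(&xas, folio);
          xas_next(&xas);
  }
  xas_unlock_irq(&xas);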

[wi...@infradead.org: modifications to the changelog & tweaked the fix]
Fixes: 3417013e0d18 ("mm/migrate: Add folio_migrate_mapping()")
Link: https://lkml.kernel.org/r/20231214045841.961776-1-wi...@infradead.org
Signed-off-by: Charan Teja Kalla <quic_chara...@quicinc.com>
Signed-off-by: Matthew Wilcox (Oracle) <wi...@infradead.org>
Reported-by: Charan Teja Kalla <quic_chara...@quicinc.com>
Closes: https://lkml.kernel.org/r/1700569840-17327-1-git-send-email-quic_chara...@quicinc.com
Cc: David Hildenbrand <da...@redhat.com>
Cc: Johannes Weiner <han...@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
Cc: Naoya Horiguchi <n-horigu...@ah.jp.nec.com>
Cc: Shakeel Butt <shake...@google.com>
Cc: <sta...@vger.kernel.org>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>

do_swap_page() checks that the page returned by lookup_swap_cache() has the
PG_swapcache bit set, but a leftover stale pointer may be reused by a new
folio without the PG_swapcache bit set, and that leads to an infinite loop
in:

  +-> mmap_read_lock
    +-> __get_user_pages_locked
      +-> for-loop # taken once
        +-> __get_user_pages
          +-> retry-loop # constantly spinning
            +-> faultin_page # return 0 to trigger retry
              +-> handle_mm_fault
                +-> __handle_mm_fault
                  +-> handle_pte_fault
                    +-> do_swap_page
                      +-> lookup_swap_cache # returns non-NULL
                      +-> if (swapcache)
                        +-> if (!folio_test_swapcache || page_private(page) != entry.val)
                          +-> goto out_page
                            +-> return 0
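
The check in question, in do_swap_page() in mm/memory.c, looks approximately
like this (paraphrased, not the verbatim upstream code):

  if (swapcache) {
          /*
           * A stale swap cache slot that has been repurposed by a new
           * folio fails this check on every retry, so faultin_page()
           * keeps returning 0 and the fault never makes progress.
           */
          if (unlikely(!folio_test_swapcache(folio) ||
                       page_private(page) != entry.val))
                  goto out_page;
          /* ... */
  }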

(cherry picked from commit fc346d0a70a13d52fe1c4bc49516d83a42cd7c4c)
https://virtuozzo.atlassian.net/browse/PSBM-153264
Signed-off-by: Pavel Tikhomirov <ptikhomi...@virtuozzo.com>
---
 mm/migrate.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index d36d945cf716..d950f42c0708 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -387,6 +387,7 @@ int folio_migrate_mapping(struct address_space *mapping,
        int dirty;
        int expected_count = folio_expected_refs(mapping, folio) + extra_count;
        long nr = folio_nr_pages(folio);
+       long entries, i;
 
        if (!mapping) {
                /* Anonymous page without mapping */
@@ -424,8 +425,10 @@ int folio_migrate_mapping(struct address_space *mapping,
                        folio_set_swapcache(newfolio);
                        newfolio->private = folio_get_private(folio);
                }
+               entries = nr;
        } else {
                VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
+               entries = 1;
        }
 
        /* Move dirty while page refs frozen and newpage not yet exposed */
@@ -435,7 +438,11 @@ int folio_migrate_mapping(struct address_space *mapping,
                folio_set_dirty(newfolio);
        }
 
-       xas_store(&xas, newfolio);
+       /* Swap cache still stores N entries instead of a high-order entry */
+       for (i = 0; i < entries; i++) {
+               xas_store(&xas, newfolio);
+               xas_next(&xas);
+       }
 
        /*
         * Drop cache reference from old page by unfreezing
-- 
2.43.0
