From: Konstantin Khorenko <khore...@virtuozzo.com>

If we generate a lot of kmem (dentries and inodes in particular) we may hit
the cgroup kmem limit in GFP_NOFS context (e.g. in ext4_alloc_inode()) and
fail to free reclaimable inodes due to the NOFS context.
Detect reclaimable kmem on hitting the limit and allow the limit to be
bypassed - reclaim will happen on the next kmem allocation in GFP_KERNEL
context.

Honor the "vm.vfs_cache_min_ratio" sysctl and do not bypass in case the
amount of reclaimable kmem is not large enough.

https://jira.sw.ru/browse/PSBM-91566

Signed-off-by: Konstantin Khorenko <khore...@virtuozzo.com>

Rebased to vz8:
 - As the EINTR logic and bypass mark are gone from try_charge(), we should
   just force the allocation
 - Use memcg_page_state() instead of the obsolete
   mem_cgroup_read_stat2_fast()

(cherry-picked from 1bbcb753b7f965b35c68312b11dfaa4ca65b9ed3)
Signed-off-by: Andrey Zhadchenko <andrey.zhadche...@virtuozzo.com>

diff --git a/fs/super.c b/fs/super.c
index 9fda135..c0d97ea 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -50,7 +50,7 @@
 	"sb_internal",
 };
 
-static bool dcache_is_low(struct mem_cgroup *memcg)
+bool dcache_is_low(struct mem_cgroup *memcg)
 {
 	unsigned long anon, file, dcache;
 	int vfs_cache_min_ratio = READ_ONCE(sysctl_vfs_cache_min_ratio);
@@ -68,6 +68,7 @@ static bool dcache_is_low(struct mem_cgroup *memcg)
 
 	return dcache / vfs_cache_min_ratio < (anon + file + dcache) / 100;
 }
+EXPORT_SYMBOL(dcache_is_low);
 
 /*
  * One thing we have to be careful of with a per-sb shrinker is that we don't
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2d85414..05058ef 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2410,6 +2410,28 @@ void mem_cgroup_handle_over_high(void)
 	current->memcg_nr_pages_over_high = 0;
 }
 
+extern bool dcache_is_low(struct mem_cgroup *memcg);
+/*
+ * Do we have anything to reclaim in memcg kmem?
+ * Have to honor vfs_cache_min_ratio here because if dcache_is_low()
+ * we won't reclaim dcache at all in do_shrink_slab().
+ */
+static bool kmem_reclaim_is_low(struct mem_cgroup *memcg)
+{
+#define KMEM_RECLAIM_LOW_MARK	32
+
+	unsigned long dcache;
+	int vfs_cache_min_ratio = READ_ONCE(sysctl_vfs_cache_min_ratio);
+
+	if (vfs_cache_min_ratio <= 0) {
+		dcache = memcg_page_state(memcg, NR_SLAB_RECLAIMABLE);
+
+		return dcache < KMEM_RECLAIM_LOW_MARK;
+	}
+
+	return dcache_is_low(memcg);
+}
+
 static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, bool kmem_charge,
 		      unsigned int nr_pages, bool cache_charge)
 {
@@ -2543,6 +2565,16 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, bool kmem_charge
 		goto force;
 
 	/*
+	 * We might have [a lot of] reclaimable kmem which we cannot reclaim in
+	 * the current context, e.g. lots of inodes/dentries while trying to
+	 * allocate kmem for a new inode with GFP_NOFS.
+	 * Thus overcharge kmem now; it will be reclaimed on the next
+	 * allocation in the usual GFP_KERNEL context.
+	 */
+	if (kmem_limit && !kmem_reclaim_is_low(mem_over_limit))
+		goto force;
+
+	/*
 	 * keep retrying as long as the memcg oom killer is able to make
 	 * a forward progress or bypass the charge if the oom killer
 	 * couldn't make any progress.
-- 
1.8.3.1

_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel