From: Konstantin Khorenko <khore...@virtuozzo.com>

If we generate a lot of kmem (dentries and inodes in particular) we may hit
the cgroup kmem limit in GFP_NOFS context (e.g. in ext4_alloc_inode()) and
fail to free reclaimable inodes due to the NOFS context.
Detect reclaimable kmem on hitting the limit and allow the limit to be
bypassed - reclaim will happen on the next kmem allocation in GFP_KERNEL
context.

Honor the "vm.vfs_cache_min_ratio" sysctl and do not bypass in case the
amount of reclaimable kmem is not large enough.

https://jira.sw.ru/browse/PSBM-91566

Signed-off-by: Konstantin Khorenko <khore...@virtuozzo.com>

Rebased to vz8:
 - As the EINTR logic and bypass mark are gone from try_charge(), we should
   just force the allocation
 - Use memcg_page_state() instead of the obsolete
   mem_cgroup_read_stat2_fast()

(cherry-picked from 1bbcb753b7f965b35c68312b11dfaa4ca65b9ed3)
Signed-off-by: Andrey Zhadchenko <andrey.zhadche...@virtuozzo.com>

diff --git a/fs/super.c b/fs/super.c
index 9fda135..c0d97ea 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -50,7 +50,7 @@
 	"sb_internal",
 };
 
-static bool dcache_is_low(struct mem_cgroup *memcg)
+bool dcache_is_low(struct mem_cgroup *memcg)
 {
 	unsigned long anon, file, dcache;
 	int vfs_cache_min_ratio = READ_ONCE(sysctl_vfs_cache_min_ratio);
@@ -68,6 +68,7 @@ static bool dcache_is_low(struct mem_cgroup *memcg)
 
 	return dcache / vfs_cache_min_ratio < (anon + file + dcache) / 100;
 }
+EXPORT_SYMBOL(dcache_is_low);
 
 /*
  * One thing we have to be careful of with a per-sb shrinker is that we don't
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2d85414..05058ef 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2410,6 +2410,28 @@ void mem_cgroup_handle_over_high(void)
 	current->memcg_nr_pages_over_high = 0;
 }
 
+extern bool dcache_is_low(struct mem_cgroup *memcg);
+/*
+ * Do we have anything to reclaim in memcg kmem?
+ * Have to honor vfs_cache_min_ratio here because if dcache_is_low()
+ * we won't reclaim dcache at all in do_shrink_slab().
+ */
+static bool kmem_reclaim_is_low(struct mem_cgroup *memcg)
+{
+#define KMEM_RECLAIM_LOW_MARK	32
+
+	unsigned long dcache;
+	int vfs_cache_min_ratio = READ_ONCE(sysctl_vfs_cache_min_ratio);
+
+	if (vfs_cache_min_ratio <= 0) {
+		dcache = memcg_page_state(memcg, NR_SLAB_RECLAIMABLE);
+
+		return dcache < KMEM_RECLAIM_LOW_MARK;
+	}
+
+	return dcache_is_low(memcg);
+}
+
 static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, bool kmem_charge,
 		      unsigned int nr_pages, bool cache_charge)
 {
@@ -2543,6 +2565,16 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, bool kmem_charge
 		goto force;
 
 	/*
+	 * We might have [a lot of] reclaimable kmem which we cannot reclaim in
+	 * the current context, e.g. lots of inodes/dentries while trying to
+	 * allocate kmem for a new inode with GFP_NOFS.
+	 * Thus overcharge kmem now; it will be reclaimed on the next
+	 * allocation in the usual GFP_KERNEL context.
+	 */
+	if (kmem_limit && !kmem_reclaim_is_low(mem_over_limit))
+		goto force;
+
+	/*
 	 * keep retrying as long as the memcg oom killer is able to make
 	 * a forward progress or bypass the charge if the oom killer
 	 * couldn't make any progress.
-- 
1.8.3.1

_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel