From: Andrey Ryabinin <aryabi...@virtuozzo.com>

Add a new memcg file - memory.cache.limit_in_bytes. It is used to limit
page cache usage in a cgroup.

https://jira.sw.ru/browse/PSBM-77547
Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com>

khorenko@: use case:
Imagine a system service whose anon memory you don't want to limit
(in our case it's a vStorage cgroup which hosts CSes and MDSes: they can
consume memory in some range, we don't want to set a limit for the maximum
possible consumption - it would be too high - and we don't know the number
of CSes on the node, since the admin can add CSes dynamically; we also
don't want to increase/decrease the limit dynamically).

If the cgroup is "unlimited", it produces permanent memory pressure on the
node because it generates a lot of pagecache, and other cgroups on the node
are affected (even taking proportional fair reclaim into account).

=> the solution is to limit pagecache only, and that is what is
implemented here.

Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
(cherry picked from commit da9151c891819733762a178b4efd7e44766fb8b1)

Reworked: we no longer have the charge/cancel/commit/uncharge memcg API
(we only have charge/uncharge) => we have to track pages which were
charged as page cache => an additional flag was introduced, implemented
via the mm/page_ext.c subsystem (see mm/page_vzext.c).

See ms commits:
0d1c2072 ("mm: memcontrol: switch to native NR_FILE_PAGES and NR_SHMEM counters")
3fea5a49 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")

https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalit...@virtuozzo.com>

khorenko@:
v2:
 1. the hunk
    ===
     done_restock:
    +	if (cache_charge)
    +		page_counter_charge(&memcg->cache, batch);
    +
    ===
    is moved to a later commit ("mm/memcg: Use per-cpu stock charges for
    ->cache counter")
 2. the "cache" field in struct mem_cgroup has been moved out of the ifdef
 3. a copyright notice was added to include/linux/page_vzext.h
v3: define mem_cgroup_charge_cache() for the !CONFIG_MEMCG case

(cherry picked from commit 923c3f6d0c71499affd6fe2741aa7e2dcc565efa)
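For illustration only (this snippet is not part of the patch): a minimal
userspace sketch of the intended usage, assuming the v1 memory controller
is mounted at /sys/fs/cgroup/memory; the group name and the 16M limit are
arbitrary and reused from the test further below.

===
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>

#define GRP "/sys/fs/cgroup/memory/pagecache_limiter"

/* write a string into a cgroup control file */
static void write_file(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f || fputs(val, f) == EOF) {
		perror(path);
		exit(1);
	}
	fclose(f);
}

int main(void)
{
	char buf[64];
	FILE *f;

	mkdir(GRP, 0755);					/* create the cgroup */
	write_file(GRP "/memory.cache.limit_in_bytes", "16777216");	/* 16M */

	snprintf(buf, sizeof(buf), "%d", (int)getpid());
	write_file(GRP "/tasks", buf);				/* move this task into it */

	/* page cache generated by this task is now charged to the group */
	f = fopen(GRP "/memory.cache.usage_in_bytes", "r");
	if (f && fgets(buf, sizeof(buf), f))
		printf("cache usage: %s", buf);
	if (f)
		fclose(f);
	return 0;
}
===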
===+++ mm/memcg: reclaim memory.cache.limit_in_bytes from background

Reclaiming memory above memory.cache.limit_in_bytes always in direct
reclaim mode adds too much of a cost for vstorage. Instead of doing direct
reclaim, allow memory.cache.limit_in_bytes to be overflowed, but launch
the reclaim in a background task.

https://pmc.acronis.com/browse/VSTOR-24395
https://jira.sw.ru/browse/PSBM-94761
Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
(cherry picked from commit c7235680e58c0d7d792e8f47264ef233d2752b0b)

see ms 1a3e1f40 ("mm: memcontrol: decouple reference counting from page accounting")

https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalit...@virtuozzo.com>

===+++ mm/memcg: fix cache growth above cache.limit_in_bytes

Exceeding cache.limit_in_bytes schedules high_work_func(), which tries to
reclaim only 32 pages. If cache is generated fast enough, this allows the
cgroup to grow steadily above cache.limit_in_bytes because we don't
reclaim enough. Try to reclaim the exceeded amount of cache instead.

https://jira.sw.ru/browse/PSBM-106384
Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
(cherry picked from commit 098f6a9add74a10848494427046cb8087ceb27d1)

https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalit...@virtuozzo.com>
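To make the two notes above easier to follow, here is a condensed sketch
of the resulting behaviour (not literal patch code - simplified from the
mm/memcontrol.c hunks below; the helper names are illustrative only):

===
/*
 * Charge side: a cache charge may push the ->cache counter above its
 * limit; instead of reclaiming directly, only kick the worker.
 */
static void kick_cache_reclaim(struct mem_cgroup *memcg)
{
	if (page_counter_read(&memcg->cache) > memcg->cache.max &&
	    !work_pending(&memcg->high_work))
		schedule_work(&memcg->high_work);
}

/*
 * Worker side (runs from high_work): reclaim the whole amount by which
 * the cache counter exceeds its limit, without touching swap, rather
 * than a fixed 32-page batch.
 */
static void reclaim_cache_overuse(struct mem_cgroup *memcg, gfp_t gfp_mask)
{
	long cache_overused = page_counter_read(&memcg->cache) -
			      memcg->cache.max;

	if (cache_overused > 0)
		try_to_free_mem_cgroup_pages(memcg, cache_overused,
					     gfp_mask, false);
}
===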
===+++ mm/memcg: Use per-cpu stock charges for ->cache counter

Currently we use per-cpu stocks to do precharges of the ->memory and
->memsw counters. Do this for the ->kmem and ->cache counters as well, to
decrease contention on them.

https://jira.sw.ru/browse/PSBM-101300
Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
(cherry picked from commit e1ae7b88d380d24a6df7c9b34635346726de39e3)

Original title: mm/memcg: Use per-cpu stock charges for ->kmem and ->cache
counters #PSBM-101300

Reworked: the kmem part was dropped because this per-cpu charging
functionality appears to be covered by the ms commits below.

see ms:
bf4f0599 ("mm: memcg/slab: obj_cgroup API")
e1a366be ("mm: memcontrol: switch to rcu protection in drain_all_stock()")
1a3e1f40 ("mm: memcontrol: decouple reference counting from page accounting")

https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalit...@virtuozzo.com>

===+++ Reworked @amikhalitsyn:
1. Combined all fixups
     120d68a2a mm/memcg: Use per-cpu stock charges for ->cache counter
     3cc18f4f2 mm/memcg: fix cache growth above cache.limit_in_bytes
     83677c3a3 mm/memcg: reclaim memory.cache.limit_in_bytes from background
   to simplify porting of the feature in the future.
2. Added a new RO file "memory.cache.usage_in_bytes" which allows checking
   how much page cache has been charged.

See also:
18b2db3b03 ("mm: Convert page kmemcg type to a page memcg flag")

TODO for @amikhalitsyn: take a look at "enum page_memcg_data_flags". It is
worth trying to use it as storage for the "page is page cache" flag instead
of using external page extensions.

===================================
Simple test:

dd if=/dev/random of=testfile.bin bs=1M count=1000
mkdir /sys/fs/cgroup/memory/pagecache_limiter
tee /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.limit_in_bytes <<< $[2**24]
bash
echo $$ > /sys/fs/cgroup/memory/pagecache_limiter/tasks
cat /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.usage_in_bytes
time wc -l testfile.bin
cat /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.usage_in_bytes
echo 3 > /proc/sys/vm/drop_caches
cat /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.usage_in_bytes
===================================

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalit...@virtuozzo.com>
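A condensed view of the charge/uncharge bookkeeping the diff below
implements (simplified: error handling, statistics and batching are
omitted, and the helper names are illustrative only). A page charged as
page cache gets a vz page-extension flag set, so that uncharge knows to
drop the ->cache counter as well:

===
/* charge path, cf. __mem_cgroup_charge() below */
static int charge_cache_page(struct page *page, struct mem_cgroup *memcg,
			     gfp_t gfp, unsigned int nr_pages)
{
	/* charges both ->memory and ->cache when cache_charge is true */
	int ret = try_charge(memcg, gfp, nr_pages, true);

	if (ret)
		return ret;

	SetVzPagePageCache(page);	/* remember how the page was charged */
	commit_charge(page, memcg);
	return 0;
}

/* uncharge path, cf. uncharge_page()/uncharge_batch() below */
static void uncharge_cache_page(struct mem_cgroup *memcg, struct page *page,
				unsigned int nr_pages)
{
	if (PageVzPageCache(page)) {
		page_counter_uncharge(&memcg->cache, nr_pages);
		ClearVzPagePageCache(page);
	}
	page_counter_uncharge(&memcg->memory, nr_pages);
}
===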
---
 include/linux/memcontrol.h   |   9 ++
 include/linux/page_vzflags.h |  37 ++++++
 mm/filemap.c                 |   2 +-
 mm/memcontrol.c              | 249 ++++++++++++++++++++++++++++-------
 4 files changed, 250 insertions(+), 47 deletions(-)
 create mode 100644 include/linux/page_vzflags.h

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d56d77da80f9..7b07e3d01c14 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -255,6 +255,7 @@ struct mem_cgroup {
 	/* Legacy consumer-oriented counters */
 	struct page_counter kmem;		/* v1 only */
 	struct page_counter tcpmem;		/* v1 only */
+	struct page_counter cache;
 
 	/* Range enforcement for interrupt charges */
 	struct work_struct high_work;
@@ -716,6 +717,8 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
 }
 
 int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask);
+int mem_cgroup_charge_cache(struct page *page, struct mm_struct *mm,
+			    gfp_t gfp_mask);
 int mem_cgroup_swapin_charge_page(struct page *page, struct mm_struct *mm,
 				  gfp_t gfp, swp_entry_t entry);
 void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry);
@@ -1246,6 +1249,12 @@ static inline int mem_cgroup_charge(struct page *page, struct mm_struct *mm,
 	return 0;
 }
 
+static inline int mem_cgroup_charge_cache(struct page *page, struct mm_struct *mm,
+					  gfp_t gfp_mask)
+{
+	return 0;
+}
+
 static inline int mem_cgroup_swapin_charge_page(struct page *page,
 			struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
 {
diff --git a/include/linux/page_vzflags.h b/include/linux/page_vzflags.h
new file mode 100644
index 000000000000..d98e4ac619a7
--- /dev/null
+++ b/include/linux/page_vzflags.h
@@ -0,0 +1,37 @@
+/*
+ *  include/linux/page_vzflags.h
+ *
+ *  Copyright (c) 2021 Virtuozzo International GmbH. All rights reserved.
+ *
+ */
+
+#ifndef __LINUX_PAGE_VZFLAGS_H
+#define __LINUX_PAGE_VZFLAGS_H
+
+#include <linux/page_vzext.h>
+#include <linux/page-flags.h>
+
+enum vzpageflags {
+	PGVZ_pagecache,
+};
+
+#define TESTVZPAGEFLAG(uname, lname)					\
+static __always_inline int PageVz##uname(struct page *page)		\
+	{ return get_page_vzext(page) && test_bit(PGVZ_##lname, &get_page_vzext(page)->vzflags); }
+
+#define SETVZPAGEFLAG(uname, lname)					\
+static __always_inline void SetVzPage##uname(struct page *page)	\
+	{ if (get_page_vzext(page)) set_bit(PGVZ_##lname, &get_page_vzext(page)->vzflags); }
+
+#define CLEARVZPAGEFLAG(uname, lname)					\
+static __always_inline void ClearVzPage##uname(struct page *page)	\
+	{ if (get_page_vzext(page)) clear_bit(PGVZ_##lname, &get_page_vzext(page)->vzflags); }
+
+#define VZPAGEFLAG(uname, lname)					\
+	TESTVZPAGEFLAG(uname, lname)					\
+	SETVZPAGEFLAG(uname, lname)					\
+	CLEARVZPAGEFLAG(uname, lname)
+
+VZPAGEFLAG(PageCache, pagecache)
+
+#endif /* __LINUX_PAGE_VZFLAGS_H */
diff --git a/mm/filemap.c b/mm/filemap.c
index a5cedb2bce8b..34fb79766902 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -874,7 +874,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
 	page->index = offset;
 
 	if (!huge) {
-		error = mem_cgroup_charge(page, NULL, gfp);
+		error = mem_cgroup_charge_cache(page, NULL, gfp);
 		if (error)
 			goto error;
 		charged = true;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 995e41ab3227..89ead3df0b59 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -36,6 +36,7 @@
 #include <linux/vm_event_item.h>
 #include <linux/smp.h>
 #include <linux/page-flags.h>
+#include <linux/page_vzflags.h>
 #include <linux/backing-dev.h>
 #include <linux/bit_spinlock.h>
 #include <linux/rcupdate.h>
@@ -215,6 +216,7 @@ enum res_type {
 	_OOM_TYPE,
 	_KMEM,
 	_TCP,
+	_CACHE,
 };
 
 #define MEMFILE_PRIVATE(x, val)	((x) << 16 | (val))
@@ -2158,6 +2160,7 @@ struct memcg_stock_pcp {
 	struct obj_stock task_obj;
 	struct obj_stock irq_obj;
 
+	unsigned int cache_nr_pages;
 	struct work_struct work;
 	unsigned long flags;
 #define FLUSHING_CACHED_CHARGE	0
@@ -2227,7 +2230,8 @@ static inline void put_obj_stock(unsigned long flags)
  *
  * returns true if successful, false otherwise.
  */
-static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
+			  bool cache)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned long flags;
@@ -2239,9 +2243,16 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 	local_irq_save(flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
-	if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
-		stock->nr_pages -= nr_pages;
-		ret = true;
+	if (memcg == stock->cached) {
+		if (cache && stock->cache_nr_pages >= nr_pages) {
+			stock->cache_nr_pages -= nr_pages;
+			ret = true;
+		}
+
+		if (!cache && stock->nr_pages >= nr_pages) {
+			stock->nr_pages -= nr_pages;
+			ret = true;
+		}
 	}
 
 	local_irq_restore(flags);
@@ -2255,15 +2266,20 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 static void drain_stock(struct memcg_stock_pcp *stock)
 {
 	struct mem_cgroup *old = stock->cached;
+	unsigned long nr_pages = stock->nr_pages + stock->cache_nr_pages;
 
 	if (!old)
 		return;
 
-	if (stock->nr_pages) {
-		page_counter_uncharge(&old->memory, stock->nr_pages);
+	if (stock->cache_nr_pages)
+		page_counter_uncharge(&old->cache, stock->cache_nr_pages);
+
+	if (nr_pages) {
+		page_counter_uncharge(&old->memory, nr_pages);
 		if (do_memsw_account())
-			page_counter_uncharge(&old->memsw, stock->nr_pages);
+			page_counter_uncharge(&old->memsw, nr_pages);
 		stock->nr_pages = 0;
+		stock->cache_nr_pages = 0;
 	}
 
 	css_put(&old->css);
@@ -2295,10 +2311,12 @@ static void drain_local_stock(struct work_struct *dummy)
  * Cache charges(val) to local per_cpu area.
  * This will be consumed by consume_stock() function, later.
  */
-static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
+			 bool cache)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned long flags;
+	unsigned long stock_nr_pages;
 
 	local_irq_save(flags);
@@ -2308,9 +2326,14 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 		css_get(&memcg->css);
 		stock->cached = memcg;
 	}
-	stock->nr_pages += nr_pages;
 
-	if (stock->nr_pages > MEMCG_CHARGE_BATCH)
+	if (cache)
+		stock->cache_nr_pages += nr_pages;
+	else
+		stock->nr_pages += nr_pages;
+
+	stock_nr_pages = stock->nr_pages + stock->cache_nr_pages;
+	if (stock_nr_pages > MEMCG_CHARGE_BATCH)
 		drain_stock(stock);
 
 	local_irq_restore(flags);
@@ -2338,10 +2361,12 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
 		struct mem_cgroup *memcg;
 		bool flush = false;
+		unsigned long nr_pages = stock->nr_pages +
+					 stock->cache_nr_pages;
 
 		rcu_read_lock();
 		memcg = stock->cached;
-		if (memcg && stock->nr_pages &&
+		if (memcg && nr_pages &&
 		    mem_cgroup_is_descendant(memcg, root_memcg))
 			flush = true;
 		if (obj_stock_flush_required(stock, root_memcg))
@@ -2405,17 +2430,27 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
 	do {
 		unsigned long pflags;
+		long cache_overused;
 
-		if (page_counter_read(&memcg->memory) <=
-		    READ_ONCE(memcg->memory.high))
-			continue;
+		if (page_counter_read(&memcg->memory) >
+		    READ_ONCE(memcg->memory.high)) {
+			memcg_memory_event(memcg, MEMCG_HIGH);
+
+			psi_memstall_enter(&pflags);
+			nr_reclaimed += try_to_free_mem_cgroup_pages(memcg,
+					nr_pages, gfp_mask, true);
+			psi_memstall_leave(&pflags);
+		}
 
-		memcg_memory_event(memcg, MEMCG_HIGH);
+		cache_overused = page_counter_read(&memcg->cache) -
+				 memcg->cache.max;
 
-		psi_memstall_enter(&pflags);
-		nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
-							     gfp_mask, true);
-		psi_memstall_leave(&pflags);
+		if (cache_overused > 0) {
+			psi_memstall_enter(&pflags);
+			nr_reclaimed += try_to_free_mem_cgroup_pages(memcg,
+					cache_overused, gfp_mask, false);
+			psi_memstall_leave(&pflags);
+		}
 	} while ((memcg = parent_mem_cgroup(memcg)) &&
 		 !mem_cgroup_is_root(memcg));
@@ -2651,7 +2686,7 @@ void mem_cgroup_handle_over_high(void)
 }
 
 static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
-			    unsigned int nr_pages)
+			    unsigned int nr_pages, bool cache_charge)
 {
 	unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages);
 	int nr_retries = MAX_RECLAIM_RETRIES;
@@ -2664,8 +2699,8 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	unsigned long pflags;
 
 retry:
-	if (consume_stock(memcg, nr_pages))
-		return 0;
+	if (consume_stock(memcg, nr_pages, cache_charge))
+		goto done;
 
 	if (!do_memsw_account() ||
 	    page_counter_try_charge(&memcg->memsw, batch, &counter)) {
@@ -2790,13 +2825,19 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	page_counter_charge(&memcg->memory, nr_pages);
 	if (do_memsw_account())
 		page_counter_charge(&memcg->memsw, nr_pages);
+	if (cache_charge)
+		page_counter_charge(&memcg->cache, nr_pages);
 
 	return 0;
 
 done_restock:
+	if (cache_charge)
+		page_counter_charge(&memcg->cache, batch);
+
 	if (batch > nr_pages)
-		refill_stock(memcg, batch - nr_pages);
+		refill_stock(memcg, batch - nr_pages, cache_charge);
 
+done:
 	/*
 	 * If the hierarchy is above the normal consumption range, schedule
 	 * reclaim on returning to userland. We can perform reclaim here
@@ -2836,6 +2877,9 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 			current->memcg_nr_pages_over_high += batch;
 			set_notify_resume(current);
 			break;
+		} else if (page_counter_read(&memcg->cache) > memcg->cache.max) {
+			if (!work_pending(&memcg->high_work))
+				schedule_work(&memcg->high_work);
 		}
 	} while ((memcg = parent_mem_cgroup(memcg)));
 
@@ -2843,12 +2887,12 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 }
 
 static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
-			     unsigned int nr_pages)
+			     unsigned int nr_pages, bool cache_charge)
 {
 	if (mem_cgroup_is_root(memcg))
 		return 0;
 
-	return try_charge_memcg(memcg, gfp_mask, nr_pages);
+	return try_charge_memcg(memcg, gfp_mask, nr_pages, cache_charge);
 }
 
 #if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MMU)
@@ -3064,7 +3108,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
 
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		page_counter_uncharge(&memcg->kmem, nr_pages);
-	refill_stock(memcg, nr_pages);
+	refill_stock(memcg, nr_pages, false);
 
 	css_put(&memcg->css);
 }
@@ -3086,7 +3130,7 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
 
 	memcg = get_mem_cgroup_from_objcg(objcg);
 
-	ret = try_charge_memcg(memcg, gfp, nr_pages);
+	ret = try_charge_memcg(memcg, gfp, nr_pages, false);
 	if (ret)
 		goto out;
@@ -3384,7 +3428,7 @@ int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp,
 {
 	int ret = 0;
 
-	ret = try_charge(memcg, gfp, nr_pages);
+	ret = try_charge(memcg, gfp, nr_pages, false);
 	if (!ret)
 		page_counter_charge(&memcg->kmem, nr_pages);
 
@@ -3743,6 +3787,9 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
 	case _TCP:
 		counter = &memcg->tcpmem;
 		break;
+	case _CACHE:
+		counter = &memcg->cache;
+		break;
 	default:
 		BUG();
 	}
@@ -3905,6 +3952,43 @@ static int memcg_update_tcp_max(struct mem_cgroup *memcg, unsigned long max)
 	return ret;
 }
 
+static int memcg_update_cache_max(struct mem_cgroup *memcg,
+				  unsigned long limit)
+{
+	unsigned long nr_pages;
+	bool enlarge = false;
+	int ret;
+
+	do {
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			break;
+		}
+		mutex_lock(&memcg_max_mutex);
+
+		if (limit > memcg->cache.max)
+			enlarge = true;
+
+		ret = page_counter_set_max(&memcg->cache, limit);
+		mutex_unlock(&memcg_max_mutex);
+
+		if (!ret)
+			break;
+
+		nr_pages = max_t(long, 1, page_counter_read(&memcg->cache) - limit);
+		if (!try_to_free_mem_cgroup_pages(memcg, nr_pages,
+						  GFP_KERNEL, false)) {
+			ret = -EBUSY;
+			break;
+		}
+	} while (1);
+
+	if (!ret && enlarge)
+		memcg_oom_recover(memcg);
+
+	return ret;
+}
+
 /*
  * The user of this function is...
  * RES_LIMIT.
@@ -3943,6 +4027,9 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
 		case _TCP:
 			ret = memcg_update_tcp_max(memcg, nr_pages);
 			break;
+		case _CACHE:
+			ret = memcg_update_cache_max(memcg, nr_pages);
+			break;
 		}
 		break;
 	case RES_SOFT_LIMIT:
@@ -3972,6 +4059,9 @@ static ssize_t mem_cgroup_reset(struct kernfs_open_file *of, char *buf,
 	case _TCP:
 		counter = &memcg->tcpmem;
 		break;
+	case _CACHE:
+		counter = &memcg->cache;
+		break;
 	default:
 		BUG();
 	}
@@ -5594,6 +5684,17 @@ static struct cftype mem_cgroup_legacy_files[] = {
 	{
 		.name = "pressure_level",
 	},
+	{
+		.name = "cache.limit_in_bytes",
+		.private = MEMFILE_PRIVATE(_CACHE, RES_LIMIT),
+		.write = mem_cgroup_write,
+		.read_u64 = mem_cgroup_read_u64,
+	},
+	{
+		.name = "cache.usage_in_bytes",
+		.private = MEMFILE_PRIVATE(_CACHE, RES_USAGE),
+		.read_u64 = mem_cgroup_read_u64,
+	},
 #ifdef CONFIG_NUMA
 	{
 		.name = "numa_stat",
@@ -5907,11 +6008,13 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		page_counter_init(&memcg->swap, &parent->swap);
 		page_counter_init(&memcg->kmem, &parent->kmem);
 		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
+		page_counter_init(&memcg->cache, &parent->cache);
 	} else {
 		page_counter_init(&memcg->memory, NULL);
 		page_counter_init(&memcg->swap, NULL);
 		page_counter_init(&memcg->kmem, NULL);
 		page_counter_init(&memcg->tcpmem, NULL);
+		page_counter_init(&memcg->cache, NULL);
 
 		root_mem_cgroup = memcg;
 		return &memcg->css;
@@ -6032,6 +6135,7 @@ static void mem_cgroup_css_reset(struct cgroup_subsys_state *css)
 	page_counter_set_max(&memcg->swap, PAGE_COUNTER_MAX);
 	page_counter_set_max(&memcg->kmem, PAGE_COUNTER_MAX);
 	page_counter_set_max(&memcg->tcpmem, PAGE_COUNTER_MAX);
+	page_counter_set_max(&memcg->cache, PAGE_COUNTER_MAX);
 	page_counter_set_min(&memcg->memory, 0);
 	page_counter_set_low(&memcg->memory, 0);
 	page_counter_set_high(&memcg->memory, PAGE_COUNTER_MAX);
@@ -6103,7 +6207,8 @@ static int mem_cgroup_do_precharge(unsigned long count)
 	int ret;
 
 	/* Try a single bulk charge without reclaim first, kswapd may wake */
-	ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count);
+	ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count,
+			 false);
 	if (!ret) {
 		mc.precharge += count;
 		return ret;
@@ -6111,7 +6216,7 @@ static int mem_cgroup_do_precharge(unsigned long count)
 
 	/* Try charges one by one with reclaim, but do not retry */
 	while (count--) {
-		ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
+		ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1, false);
 		if (ret)
 			return ret;
 		mc.precharge++;
@@ -7333,18 +7438,30 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 }
 
 static int __mem_cgroup_charge(struct page *page, struct mem_cgroup *memcg,
-			       gfp_t gfp)
+			       gfp_t gfp, bool cache_charge)
 {
 	unsigned int nr_pages = thp_nr_pages(page);
 	int ret;
 
-	ret = try_charge(memcg, gfp, nr_pages);
+	ret = try_charge(memcg, gfp, nr_pages, cache_charge);
 	if (ret)
 		goto out;
 
 	css_get(&memcg->css);
 	commit_charge(page, memcg);
 
+	/*
+	 * Here we set an extended flag (see page_vzflags.h) on the
+	 * page which indicates that the page is charged as
+	 * a "page cache" page.
+	 *
+	 * We always clean up this flag on uncharging, which means
+	 * that while charging a page we shouldn't have this flag set.
+	 */
+	BUG_ON(PageVzPageCache(page));
+	if (cache_charge)
+		SetVzPagePageCache(page);
+
 	local_irq_disable();
 	mem_cgroup_charge_statistics(memcg, page, nr_pages);
 	memcg_check_events(memcg, page);
@@ -7353,6 +7470,22 @@ static int __mem_cgroup_charge(struct page *page, struct mem_cgroup *memcg,
 	return ret;
 }
 
+static int __mem_cgroup_charge_gen(struct page *page, struct mm_struct *mm,
+				   gfp_t gfp_mask, bool cache_charge)
+{
+	struct mem_cgroup *memcg;
+	int ret;
+
+	if (mem_cgroup_disabled())
+		return 0;
+
+	memcg = get_mem_cgroup_from_mm(mm);
+	ret = __mem_cgroup_charge(page, memcg, gfp_mask, cache_charge);
+	css_put(&memcg->css);
+
+	return ret;
+}
+
 /**
  * mem_cgroup_charge - charge a newly allocated page to a cgroup
  * @page: page to charge
@@ -7369,17 +7502,12 @@ static int __mem_cgroup_charge(struct page *page, struct mem_cgroup *memcg,
  */
 int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 {
-	struct mem_cgroup *memcg;
-	int ret;
-
-	if (mem_cgroup_disabled())
-		return 0;
-
-	memcg = get_mem_cgroup_from_mm(mm);
-	ret = __mem_cgroup_charge(page, memcg, gfp_mask);
-	css_put(&memcg->css);
+	return __mem_cgroup_charge_gen(page, mm, gfp_mask, false);
+}
 
-	return ret;
+int mem_cgroup_charge_cache(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
+{
+	return __mem_cgroup_charge_gen(page, mm, gfp_mask, true);
 }
 
 /**
@@ -7411,7 +7539,7 @@ int mem_cgroup_swapin_charge_page(struct page *page, struct mm_struct *mm,
 		memcg = get_mem_cgroup_from_mm(mm);
 	rcu_read_unlock();
 
-	ret = __mem_cgroup_charge(page, memcg, gfp);
+	ret = __mem_cgroup_charge(page, memcg, gfp, false);
 	css_put(&memcg->css);
 
 	return ret;
@@ -7455,6 +7583,7 @@ struct uncharge_gather {
 	unsigned long nr_memory;
 	unsigned long pgpgout;
 	unsigned long nr_kmem;
+	unsigned long nr_pgcache;
 	struct page *dummy_page;
 };
 
@@ -7473,6 +7602,9 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 			page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
 		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
 			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
+		if (ug->nr_pgcache)
+			page_counter_uncharge(&ug->memcg->cache, ug->nr_pgcache);
+
 		memcg_oom_recover(ug->memcg);
 	}
 
@@ -7535,6 +7667,16 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 		page->memcg_data = 0;
 		obj_cgroup_put(objcg);
 	} else {
+		if (PageVzPageCache(page)) {
+			ug->nr_pgcache += nr_pages;
+			/*
+			 * If we are here, it means that the page *will* be
+			 * uncharged anyway. We can safely clear the
+			 * "page is charged as a page cache" flag here.
+			 */
+			ClearVzPagePageCache(page);
+		}
+
 		/* LRU pages aren't accounted at the root level */
 		if (!mem_cgroup_is_root(memcg))
 			ug->nr_memory += nr_pages;
@@ -7633,6 +7775,21 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
 		page_counter_charge(&memcg->memsw, nr_pages);
 	}
 
+	/*
+	 * copy_page_vzflags() is called before mem_cgroup_migrate()
+	 * in migrate_page_states() (mm/migrate.c).
+	 *
+	 * Let's check that all is fine with the flags:
+	 * on the one hand, page cache pages are never
+	 * anonymous and never swap backed;
+	 * on the other hand, such a page must have the
+	 * PageVzPageCache(page) ext flag set.
+	 */
+	WARN_ON((!PageAnon(newpage) && !PageSwapBacked(newpage)) !=
+		PageVzPageCache(newpage));
+	if (PageVzPageCache(newpage))
+		page_counter_charge(&memcg->cache, nr_pages);
+
 	css_get(&memcg->css);
 	commit_charge(newpage, memcg);
 
@@ -7704,10 +7861,10 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 
 	mod_memcg_state(memcg, MEMCG_SOCK, nr_pages);
 
-	if (try_charge(memcg, gfp_mask, nr_pages) == 0)
+	if (try_charge(memcg, gfp_mask, nr_pages, false) == 0)
 		return true;
 
-	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
+	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages, false);
 	return false;
 }
 
@@ -7725,7 +7882,7 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 
 	mod_memcg_state(memcg, MEMCG_SOCK, -nr_pages);
 
-	refill_stock(memcg, nr_pages);
+	refill_stock(memcg, nr_pages, false);
 }
 
 static int __init cgroup_memory(char *s)
-- 
2.31.1

_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel