From: Andrey Ryabinin <aryabi...@virtuozzo.com>

Add a new memcg file - memory.cache.limit_in_bytes. It is used to limit
page cache usage in a cgroup.

https://jira.sw.ru/browse/PSBM-77547
Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com>

khorenko@: use case:
Imagine a system service whose anon memory you don't want to limit
(in our case it's a vStorage cgroup which hosts CSes and MDSes: they can
consume memory in some range, we don't want to set a limit for the maximum
possible consumption - it would be too high - and we don't know the number
of CSes on the node, since the admin can add CSes dynamically; we also
don't want to increase/decrease the limit dynamically).

If the cgroup is "unlimited", it produces permanent memory pressure on the
node because it generates a lot of pagecache, and other cgroups on the node
are affected (even taking proportional fair reclaim into account).

=> the solution is to limit pagecache only, and that is what is
implemented here.

Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
(cherry picked from commit da9151c891819733762a178b4efd7e44766fb8b1)

Reworked: we no longer have the charge/cancel/commit/uncharge memcg API
(we only have charge/uncharge) => we have to track pages which were
charged as page cache => an additional flag was introduced, implemented
via the mm/page_ext.c subsystem (see mm/page_vzext.c).

See ms commits:
0d1c2072 ("mm: memcontrol: switch to native NR_FILE_PAGES and NR_SHMEM counters")
3fea5a49 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")

https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalit...@virtuozzo.com>

khorenko@:
v2:
 1. the hunk
    ===
     done_restock:
    +	if (cache_charge)
    +		page_counter_charge(&memcg->cache, batch);
    +
    ===
    is moved to a later commit ("mm/memcg: Use per-cpu stock charges for
    ->cache counter")
 2. the "cache" field in struct mem_cgroup has been moved out of the ifdef
 3. a copyright notice was added to include/linux/page_vzext.h
v3: define mem_cgroup_charge_cache() for the !CONFIG_MEMCG case

(cherry picked from commit 923c3f6d0c71499affd6fe2741aa7e2dcc565efa)
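For illustration only (this snippet is not part of the patch): a minimal
userspace sketch of the intended usage, assuming the v1 memory controller
is mounted at /sys/fs/cgroup/memory; the group name and the 16M limit are
arbitrary and reused from the test further below.

===
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>

#define GRP "/sys/fs/cgroup/memory/pagecache_limiter"

/* write a string into a cgroup control file */
static void write_file(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f || fputs(val, f) == EOF) {
		perror(path);
		exit(1);
	}
	fclose(f);
}

int main(void)
{
	char buf[64];
	FILE *f;

	mkdir(GRP, 0755);					/* create the cgroup */
	write_file(GRP "/memory.cache.limit_in_bytes", "16777216");	/* 16M */

	snprintf(buf, sizeof(buf), "%d", (int)getpid());
	write_file(GRP "/tasks", buf);				/* move this task into it */

	/* page cache generated by this task is now charged to the group */
	f = fopen(GRP "/memory.cache.usage_in_bytes", "r");
	if (f && fgets(buf, sizeof(buf), f))
		printf("cache usage: %s", buf);
	if (f)
		fclose(f);
	return 0;
}
===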
===+++ mm/memcg: reclaim memory.cache.limit_in_bytes from background

Reclaiming memory above memory.cache.limit_in_bytes always in direct
reclaim mode adds too much of a cost for vstorage. Instead of doing direct
reclaim, allow memory.cache.limit_in_bytes to be overflowed, but launch
the reclaim in a background task.

https://pmc.acronis.com/browse/VSTOR-24395
https://jira.sw.ru/browse/PSBM-94761
Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
(cherry picked from commit c7235680e58c0d7d792e8f47264ef233d2752b0b)

see ms 1a3e1f40 ("mm: memcontrol: decouple reference counting from page accounting")

https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalit...@virtuozzo.com>

===+++ mm/memcg: fix cache growth above cache.limit_in_bytes

Exceeding cache.limit_in_bytes schedules high_work_func(), which tries to
reclaim only 32 pages. If cache is generated fast enough, this allows the
cgroup to grow steadily above cache.limit_in_bytes because we don't
reclaim enough. Try to reclaim the exceeded amount of cache instead.

https://jira.sw.ru/browse/PSBM-106384
Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
(cherry picked from commit 098f6a9add74a10848494427046cb8087ceb27d1)

https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalit...@virtuozzo.com>
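To make the two notes above easier to follow, here is a condensed sketch
of the resulting behaviour (not literal patch code - simplified from the
mm/memcontrol.c hunks below; the helper names are illustrative only):

===
/*
 * Charge side: a cache charge may push the ->cache counter above its
 * limit; instead of reclaiming directly, only kick the worker.
 */
static void kick_cache_reclaim(struct mem_cgroup *memcg)
{
	if (page_counter_read(&memcg->cache) > memcg->cache.max &&
	    !work_pending(&memcg->high_work))
		schedule_work(&memcg->high_work);
}

/*
 * Worker side (runs from high_work): reclaim the whole amount by which
 * the cache counter exceeds its limit, without touching swap, rather
 * than a fixed 32-page batch.
 */
static void reclaim_cache_overuse(struct mem_cgroup *memcg, gfp_t gfp_mask)
{
	long cache_overused = page_counter_read(&memcg->cache) -
			      memcg->cache.max;

	if (cache_overused > 0)
		try_to_free_mem_cgroup_pages(memcg, cache_overused,
					     gfp_mask, false);
}
===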
===+++ mm/memcg: Use per-cpu stock charges for ->cache counter

Currently we use per-cpu stocks to do precharges of the ->memory and
->memsw counters. Do this for the ->kmem and ->cache counters as well, to
decrease contention on them.

https://jira.sw.ru/browse/PSBM-101300
Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
(cherry picked from commit e1ae7b88d380d24a6df7c9b34635346726de39e3)

Original title: mm/memcg: Use per-cpu stock charges for ->kmem and ->cache
counters #PSBM-101300

Reworked: the kmem part was dropped because this per-cpu charging
functionality appears to be covered by the ms commits below.

see ms:
bf4f0599 ("mm: memcg/slab: obj_cgroup API")
e1a366be ("mm: memcontrol: switch to rcu protection in drain_all_stock()")
1a3e1f40 ("mm: memcontrol: decouple reference counting from page accounting")

https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalit...@virtuozzo.com>

===+++ Reworked @amikhalitsyn:
1. Combined all fixups
     120d68a2a mm/memcg: Use per-cpu stock charges for ->cache counter
     3cc18f4f2 mm/memcg: fix cache growth above cache.limit_in_bytes
     83677c3a3 mm/memcg: reclaim memory.cache.limit_in_bytes from background
   to simplify porting of the feature in the future.
2. Added a new RO file "memory.cache.usage_in_bytes" which allows checking
   how much page cache has been charged.

See also:
18b2db3b03 ("mm: Convert page kmemcg type to a page memcg flag")

TODO for @amikhalitsyn: take a look at "enum page_memcg_data_flags". It is
worth trying to use it as storage for the "page is page cache" flag instead
of using external page extensions.

===================================
Simple test:

dd if=/dev/random of=testfile.bin bs=1M count=1000
mkdir /sys/fs/cgroup/memory/pagecache_limiter
tee /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.limit_in_bytes <<< $[2**24]
bash
echo $$ > /sys/fs/cgroup/memory/pagecache_limiter/tasks
cat /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.usage_in_bytes
time wc -l testfile.bin
cat /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.usage_in_bytes
echo 3 > /proc/sys/vm/drop_caches
cat /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.usage_in_bytes
===================================

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalit...@virtuozzo.com>
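A condensed view of the charge/uncharge bookkeeping the diff below
implements (simplified: error handling, statistics and batching are
omitted, and the helper names are illustrative only). A page charged as
page cache gets a vz page-extension flag set, so that uncharge knows to
drop the ->cache counter as well:

===
/* charge path, cf. __mem_cgroup_charge() below */
static int charge_cache_page(struct page *page, struct mem_cgroup *memcg,
			     gfp_t gfp, unsigned int nr_pages)
{
	/* charges both ->memory and ->cache when cache_charge is true */
	int ret = try_charge(memcg, gfp, nr_pages, true);

	if (ret)
		return ret;

	SetVzPagePageCache(page);	/* remember how the page was charged */
	commit_charge(page, memcg);
	return 0;
}

/* uncharge path, cf. uncharge_page()/uncharge_batch() below */
static void uncharge_cache_page(struct mem_cgroup *memcg, struct page *page,
				unsigned int nr_pages)
{
	if (PageVzPageCache(page)) {
		page_counter_uncharge(&memcg->cache, nr_pages);
		ClearVzPagePageCache(page);
	}
	page_counter_uncharge(&memcg->memory, nr_pages);
}
===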
---
 include/linux/memcontrol.h   |   9 ++
 include/linux/page_vzflags.h |  37 ++++++
 mm/filemap.c                 |   2 +-
 mm/memcontrol.c              | 249 ++++++++++++++++++++++++++++-------
 4 files changed, 250 insertions(+), 47 deletions(-)
 create mode 100644 include/linux/page_vzflags.h

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d56d77da80f9..7b07e3d01c14 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -255,6 +255,7 @@ struct mem_cgroup {
 	/* Legacy consumer-oriented counters */
 	struct page_counter kmem;		/* v1 only */
 	struct page_counter tcpmem;		/* v1 only */
+	struct page_counter cache;
 
 	/* Range enforcement for interrupt charges */
 	struct work_struct high_work;
@@ -716,6 +717,8 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
 }
 
 int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask);
+int mem_cgroup_charge_cache(struct page *page, struct mm_struct *mm,
+			    gfp_t gfp_mask);
 int mem_cgroup_swapin_charge_page(struct page *page, struct mm_struct *mm,
 				  gfp_t gfp, swp_entry_t entry);
 void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry);
@@ -1246,6 +1249,12 @@ static inline int mem_cgroup_charge(struct page *page, struct mm_struct *mm,
 	return 0;
 }
 
+static inline int mem_cgroup_charge_cache(struct page *page, struct mm_struct *mm,
+					  gfp_t gfp_mask)
+{
+	return 0;
+}
+
 static inline int mem_cgroup_swapin_charge_page(struct page *page,
 			struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
 {
diff --git a/include/linux/page_vzflags.h b/include/linux/page_vzflags.h
new file mode 100644
index 000000000000..d98e4ac619a7
--- /dev/null
+++ b/include/linux/page_vzflags.h
@@ -0,0 +1,37 @@
+/*
+ *  include/linux/page_vzflags.h
+ *
+ *  Copyright (c) 2021 Virtuozzo International GmbH. All rights reserved.
+ *
+ */
+
+#ifndef __LINUX_PAGE_VZFLAGS_H
+#define __LINUX_PAGE_VZFLAGS_H
+
+#include <linux/page_vzext.h>
+#include <linux/page-flags.h>
+
+enum vzpageflags {
+	PGVZ_pagecache,
+};
+
+#define TESTVZPAGEFLAG(uname, lname)					\
+static __always_inline int PageVz##uname(struct page *page)		\
+	{ return get_page_vzext(page) && test_bit(PGVZ_##lname, &get_page_vzext(page)->vzflags); }
+
+#define SETVZPAGEFLAG(uname, lname)					\
+static __always_inline void SetVzPage##uname(struct page *page)	\
+	{ if (get_page_vzext(page)) set_bit(PGVZ_##lname, &get_page_vzext(page)->vzflags); }
+
+#define CLEARVZPAGEFLAG(uname, lname)					\
+static __always_inline void ClearVzPage##uname(struct page *page)	\
+	{ if (get_page_vzext(page)) clear_bit(PGVZ_##lname, &get_page_vzext(page)->vzflags); }
+
+#define VZPAGEFLAG(uname, lname)					\
+	TESTVZPAGEFLAG(uname, lname)					\
+	SETVZPAGEFLAG(uname, lname)					\
+	CLEARVZPAGEFLAG(uname, lname)
+
+VZPAGEFLAG(PageCache, pagecache)
+
+#endif /* __LINUX_PAGE_VZFLAGS_H */
diff --git a/mm/filemap.c b/mm/filemap.c
index a5cedb2bce8b..34fb79766902 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -874,7 +874,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
 	page->index = offset;
 
 	if (!huge) {
-		error = mem_cgroup_charge(page, NULL, gfp);
+		error = mem_cgroup_charge_cache(page, NULL, gfp);
 		if (error)
 			goto error;
 		charged = true;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 995e41ab3227..89ead3df0b59 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -36,6 +36,7 @@
 #include <linux/vm_event_item.h>
 #include <linux/smp.h>
 #include <linux/page-flags.h>
+#include <linux/page_vzflags.h>
 #include <linux/backing-dev.h>
 #include <linux/bit_spinlock.h>
 #include <linux/rcupdate.h>
@@ -215,6 +216,7 @@ enum res_type {
 	_OOM_TYPE,
 	_KMEM,
 	_TCP,
+	_CACHE,
 };
 
 #define MEMFILE_PRIVATE(x, val)	((x) << 16 | (val))
@@ -2158,6 +2160,7 @@ struct memcg_stock_pcp {
 	struct obj_stock task_obj;
 	struct obj_stock irq_obj;
 
+	unsigned int cache_nr_pages;
 	struct work_struct work;
 	unsigned long flags;
 #define FLUSHING_CACHED_CHARGE	0
@@ -2227,7 +2230,8 @@ static inline void put_obj_stock(unsigned long flags)
  *
  * returns true if successful, false otherwise.
  */
-static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
+			  bool cache)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned long flags;
@@ -2239,9 +2243,16 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 	local_irq_save(flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
-	if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
-		stock->nr_pages -= nr_pages;
-		ret = true;
+	if (memcg == stock->cached) {
+		if (cache && stock->cache_nr_pages >= nr_pages) {
+			stock->cache_nr_pages -= nr_pages;
+			ret = true;
+		}
+
+		if (!cache && stock->nr_pages >= nr_pages) {
+			stock->nr_pages -= nr_pages;
+			ret = true;
+		}
 	}
 
 	local_irq_restore(flags);
@@ -2255,15 +2266,20 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 static void drain_stock(struct memcg_stock_pcp *stock)
 {
 	struct mem_cgroup *old = stock->cached;
+	unsigned long nr_pages = stock->nr_pages + stock->cache_nr_pages;
 
 	if (!old)
 		return;
 
-	if (stock->nr_pages) {
-		page_counter_uncharge(&old->memory, stock->nr_pages);
+	if (stock->cache_nr_pages)
+		page_counter_uncharge(&old->cache, stock->cache_nr_pages);
+
+	if (nr_pages) {
+		page_counter_uncharge(&old->memory, nr_pages);
 		if (do_memsw_account())
-			page_counter_uncharge(&old->memsw, stock->nr_pages);
+			page_counter_uncharge(&old->memsw, nr_pages);
 		stock->nr_pages = 0;
+		stock->cache_nr_pages = 0;
 	}
 
 	css_put(&old->css);
@@ -2295,10 +2311,12 @@ static void drain_local_stock(struct work_struct *dummy)
  * Cache charges(val) to local per_cpu area.
  * This will be consumed by consume_stock() function, later.
  */
-static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
+			 bool cache)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned long flags;
+	unsigned long stock_nr_pages;
 
 	local_irq_save(flags);
@@ -2308,9 +2326,14 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 		css_get(&memcg->css);
 		stock->cached = memcg;
 	}
-	stock->nr_pages += nr_pages;
 
-	if (stock->nr_pages > MEMCG_CHARGE_BATCH)
+	if (cache)
+		stock->cache_nr_pages += nr_pages;
+	else
+		stock->nr_pages += nr_pages;
+
+	stock_nr_pages = stock->nr_pages + stock->cache_nr_pages;
+	if (stock_nr_pages > MEMCG_CHARGE_BATCH)
 		drain_stock(stock);
 
 	local_irq_restore(flags);
@@ -2338,10 +2361,12 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
 		struct mem_cgroup *memcg;
 		bool flush = false;
+		unsigned long nr_pages = stock->nr_pages +
+					 stock->cache_nr_pages;
 
 		rcu_read_lock();
 		memcg = stock->cached;
-		if (memcg && stock->nr_pages &&
+		if (memcg && nr_pages &&
 		    mem_cgroup_is_descendant(memcg, root_memcg))
 			flush = true;
 		if (obj_stock_flush_required(stock, root_memcg))
@@ -2405,17 +2430,27 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
 	do {
 		unsigned long pflags;
+		long cache_overused;
 
-		if (page_counter_read(&memcg->memory) <=
-		    READ_ONCE(memcg->memory.high))
-			continue;
+		if (page_counter_read(&memcg->memory) >
+		    READ_ONCE(memcg->memory.high)) {
+			memcg_memory_event(memcg, MEMCG_HIGH);
+
+			psi_memstall_enter(&pflags);
+			nr_reclaimed += try_to_free_mem_cgroup_pages(memcg,
+					nr_pages, gfp_mask, true);
+			psi_memstall_leave(&pflags);
+		}
 
-		memcg_memory_event(memcg, MEMCG_HIGH);
+		cache_overused = page_counter_read(&memcg->cache) -
+				 memcg->cache.max;
 
-		psi_memstall_enter(&pflags);
-		nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
-							     gfp_mask, true);
-		psi_memstall_leave(&pflags);
+		if (cache_overused > 0) {
+			psi_memstall_enter(&pflags);
+			nr_reclaimed += try_to_free_mem_cgroup_pages(memcg,
+					cache_overused, gfp_mask, false);
+			psi_memstall_leave(&pflags);
+		}
 	} while ((memcg = parent_mem_cgroup(memcg)) &&
 		 !mem_cgroup_is_root(memcg));
@@ -2651,7 +2686,7 @@ void mem_cgroup_handle_over_high(void)
 }
 
 static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
-			    unsigned int nr_pages)
+			    unsigned int nr_pages, bool cache_charge)
 {
 	unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages);
 	int nr_retries = MAX_RECLAIM_RETRIES;
@@ -2664,8 +2699,8 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	unsigned long pflags;
 
 retry:
-	if (consume_stock(memcg, nr_pages))
-		return 0;
+	if (consume_stock(memcg, nr_pages, cache_charge))
+		goto done;
 
 	if (!do_memsw_account() ||
 	    page_counter_try_charge(&memcg->memsw, batch, &counter)) {
@@ -2790,13 +2825,19 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	page_counter_charge(&memcg->memory, nr_pages);
 	if (do_memsw_account())
 		page_counter_charge(&memcg->memsw, nr_pages);
+	if (cache_charge)
+		page_counter_charge(&memcg->cache, nr_pages);
 
 	return 0;
 
 done_restock:
+	if (cache_charge)
+		page_counter_charge(&memcg->cache, batch);
+
 	if (batch > nr_pages)
-		refill_stock(memcg, batch - nr_pages);
+		refill_stock(memcg, batch - nr_pages, cache_charge);
 
+done:
 	/*
 	 * If the hierarchy is above the normal consumption range, schedule
 	 * reclaim on returning to userland. We can perform reclaim here
@@ -2836,6 +2877,9 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 			current->memcg_nr_pages_over_high += batch;
 			set_notify_resume(current);
 			break;
+		} else if (page_counter_read(&memcg->cache) > memcg->cache.max) {
+			if (!work_pending(&memcg->high_work))
+				schedule_work(&memcg->high_work);
 		}
 	} while ((memcg = parent_mem_cgroup(memcg)));
 
@@ -2843,12 +2887,12 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 }
 
 static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
-			     unsigned int nr_pages)
+			     unsigned int nr_pages, bool cache_charge)
 {
 	if (mem_cgroup_is_root(memcg))
 		return 0;
 
-	return try_charge_memcg(memcg, gfp_mask, nr_pages);
+	return try_charge_memcg(memcg, gfp_mask, nr_pages, cache_charge);
 }
 
 #if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MMU)
@@ -3064,7 +3108,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
 
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		page_counter_uncharge(&memcg->kmem, nr_pages);
-	refill_stock(memcg, nr_pages);
+	refill_stock(memcg, nr_pages, false);
 
 	css_put(&memcg->css);
 }
@@ -3086,7 +3130,7 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
 
 	memcg = get_mem_cgroup_from_objcg(objcg);
 
-	ret = try_charge_memcg(memcg, gfp, nr_pages);
+	ret = try_charge_memcg(memcg, gfp, nr_pages, false);
 	if (ret)
 		goto out;
@@ -3384,7 +3428,7 @@ int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp,
 {
 	int ret = 0;
 
-	ret = try_charge(memcg, gfp, nr_pages);
+	ret = try_charge(memcg, gfp, nr_pages, false);
 	if (!ret)
 		page_counter_charge(&memcg->kmem, nr_pages);
 
@@ -3743,6 +3787,9 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
 	case _TCP:
 		counter = &memcg->tcpmem;
 		break;
+	case _CACHE:
+		counter = &memcg->cache;
+		break;
 	default:
 		BUG();
 	}
@@ -3905,6 +3952,43 @@ static int memcg_update_tcp_max(struct mem_cgroup *memcg, unsigned long max)
 	return ret;
 }
 
+static int memcg_update_cache_max(struct mem_cgroup *memcg,
+				  unsigned long limit)
+{
+	unsigned long nr_pages;
+	bool enlarge = false;
+	int ret;
+
+	do {
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			break;
+		}
+		mutex_lock(&memcg_max_mutex);
+
+		if (limit > memcg->cache.max)
+			enlarge = true;
+
+		ret = page_counter_set_max(&memcg->cache, limit);
+		mutex_unlock(&memcg_max_mutex);
+
+		if (!ret)
+			break;
+
+		nr_pages = max_t(long, 1, page_counter_read(&memcg->cache) - limit);
+		if (!try_to_free_mem_cgroup_pages(memcg, nr_pages,
+						  GFP_KERNEL, false)) {
+			ret = -EBUSY;
+			break;
+		}
+	} while (1);
+
+	if (!ret && enlarge)
+		memcg_oom_recover(memcg);
+
+	return ret;
+}
+
 /*
  * The user of this function is...
  * RES_LIMIT.
@@ -3943,6 +4027,9 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
 		case _TCP:
 			ret = memcg_update_tcp_max(memcg, nr_pages);
 			break;
+		case _CACHE:
+			ret = memcg_update_cache_max(memcg, nr_pages);
+			break;
 		}
 		break;
 	case RES_SOFT_LIMIT:
@@ -3972,6 +4059,9 @@ static ssize_t mem_cgroup_reset(struct kernfs_open_file *of, char *buf,
 	case _TCP:
 		counter = &memcg->tcpmem;
 		break;
+	case _CACHE:
+		counter = &memcg->cache;
+		break;
 	default:
 		BUG();
 	}
@@ -5594,6 +5684,17 @@ static struct cftype mem_cgroup_legacy_files[] = {
 	{
 		.name = "pressure_level",
 	},
+	{
+		.name = "cache.limit_in_bytes",
+		.private = MEMFILE_PRIVATE(_CACHE, RES_LIMIT),
+		.write = mem_cgroup_write,
+		.read_u64 = mem_cgroup_read_u64,
+	},
+	{
+		.name = "cache.usage_in_bytes",
+		.private = MEMFILE_PRIVATE(_CACHE, RES_USAGE),
+		.read_u64 = mem_cgroup_read_u64,
+	},
 #ifdef CONFIG_NUMA
 	{
 		.name = "numa_stat",
@@ -5907,11 +6008,13 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		page_counter_init(&memcg->swap, &parent->swap);
 		page_counter_init(&memcg->kmem, &parent->kmem);
 		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
+		page_counter_init(&memcg->cache, &parent->cache);
 	} else {
 		page_counter_init(&memcg->memory, NULL);
 		page_counter_init(&memcg->swap, NULL);
 		page_counter_init(&memcg->kmem, NULL);
 		page_counter_init(&memcg->tcpmem, NULL);
+		page_counter_init(&memcg->cache, NULL);
 
 		root_mem_cgroup = memcg;
 		return &memcg->css;
@@ -6032,6 +6135,7 @@ static void mem_cgroup_css_reset(struct cgroup_subsys_state *css)
 	page_counter_set_max(&memcg->swap, PAGE_COUNTER_MAX);
 	page_counter_set_max(&memcg->kmem, PAGE_COUNTER_MAX);
 	page_counter_set_max(&memcg->tcpmem, PAGE_COUNTER_MAX);
+	page_counter_set_max(&memcg->cache, PAGE_COUNTER_MAX);
 	page_counter_set_min(&memcg->memory, 0);
 	page_counter_set_low(&memcg->memory, 0);
 	page_counter_set_high(&memcg->memory, PAGE_COUNTER_MAX);
@@ -6103,7 +6207,8 @@ static int mem_cgroup_do_precharge(unsigned long count)
 	int ret;
 
 	/* Try a single bulk charge without reclaim first, kswapd may wake */
-	ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count);
+	ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count,
+			 false);
 	if (!ret) {
 		mc.precharge += count;
 		return ret;
@@ -6111,7 +6216,7 @@ static int mem_cgroup_do_precharge(unsigned long count)
 
 	/* Try charges one by one with reclaim, but do not retry */
 	while (count--) {
-		ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
+		ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1, false);
 		if (ret)
 			return ret;
 		mc.precharge++;
@@ -7333,18 +7438,30 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 }
 
 static int __mem_cgroup_charge(struct page *page, struct mem_cgroup *memcg,
-			       gfp_t gfp)
+			       gfp_t gfp, bool cache_charge)
 {
 	unsigned int nr_pages = thp_nr_pages(page);
 	int ret;
 
-	ret = try_charge(memcg, gfp, nr_pages);
+	ret = try_charge(memcg, gfp, nr_pages, cache_charge);
 	if (ret)
 		goto out;
 
 	css_get(&memcg->css);
 	commit_charge(page, memcg);
 
+	/*
+	 * Here we set an extended flag (see page_vzflags.h) on the
+	 * page which indicates that the page is charged as
+	 * a "page cache" page.
+	 *
+	 * We always clean up this flag on uncharging, which means
+	 * that while charging a page we shouldn't have this flag set.
+	 */
+	BUG_ON(PageVzPageCache(page));
+	if (cache_charge)
+		SetVzPagePageCache(page);
+
 	local_irq_disable();
 	mem_cgroup_charge_statistics(memcg, page, nr_pages);
 	memcg_check_events(memcg, page);
@@ -7353,6 +7470,22 @@ static int __mem_cgroup_charge(struct page *page, struct mem_cgroup *memcg,
 	return ret;
 }
 
+static int __mem_cgroup_charge_gen(struct page *page, struct mm_struct *mm,
+				   gfp_t gfp_mask, bool cache_charge)
+{
+	struct mem_cgroup *memcg;
+	int ret;
+
+	if (mem_cgroup_disabled())
+		return 0;
+
+	memcg = get_mem_cgroup_from_mm(mm);
+	ret = __mem_cgroup_charge(page, memcg, gfp_mask, cache_charge);
+	css_put(&memcg->css);
+
+	return ret;
+}
+
 /**
  * mem_cgroup_charge - charge a newly allocated page to a cgroup
  * @page: page to charge
@@ -7369,17 +7502,12 @@ static int __mem_cgroup_charge(struct page *page, struct mem_cgroup *memcg,
  */
 int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 {
-	struct mem_cgroup *memcg;
-	int ret;
-
-	if (mem_cgroup_disabled())
-		return 0;
-
-	memcg = get_mem_cgroup_from_mm(mm);
-	ret = __mem_cgroup_charge(page, memcg, gfp_mask);
-	css_put(&memcg->css);
+	return __mem_cgroup_charge_gen(page, mm, gfp_mask, false);
+}
 
-	return ret;
+int mem_cgroup_charge_cache(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
+{
+	return __mem_cgroup_charge_gen(page, mm, gfp_mask, true);
 }
 
 /**
@@ -7411,7 +7539,7 @@ int mem_cgroup_swapin_charge_page(struct page *page, struct mm_struct *mm,
 		memcg = get_mem_cgroup_from_mm(mm);
 	rcu_read_unlock();
 
-	ret = __mem_cgroup_charge(page, memcg, gfp);
+	ret = __mem_cgroup_charge(page, memcg, gfp, false);
 	css_put(&memcg->css);
 
 	return ret;
@@ -7455,6 +7583,7 @@ struct uncharge_gather {
 	unsigned long nr_memory;
 	unsigned long pgpgout;
 	unsigned long nr_kmem;
+	unsigned long nr_pgcache;
 	struct page *dummy_page;
 };
 
@@ -7473,6 +7602,9 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 			page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
 		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
 			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
+		if (ug->nr_pgcache)
+			page_counter_uncharge(&ug->memcg->cache, ug->nr_pgcache);
+
 		memcg_oom_recover(ug->memcg);
 	}
 
@@ -7535,6 +7667,16 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 		page->memcg_data = 0;
 		obj_cgroup_put(objcg);
 	} else {
+		if (PageVzPageCache(page)) {
+			ug->nr_pgcache += nr_pages;
+			/*
+			 * If we are here, it means that the page *will* be
+			 * uncharged anyway. We can safely clear the
+			 * "page is charged as a page cache" flag here.
+			 */
+			ClearVzPagePageCache(page);
+		}
+
 		/* LRU pages aren't accounted at the root level */
 		if (!mem_cgroup_is_root(memcg))
 			ug->nr_memory += nr_pages;
@@ -7633,6 +7775,21 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
 		page_counter_charge(&memcg->memsw, nr_pages);
 	}
 
+	/*
+	 * copy_page_vzflags() is called before mem_cgroup_migrate()
+	 * in migrate_page_states() (mm/migrate.c).
+	 *
+	 * Let's check that all is fine with the flags:
+	 * on the one hand, page cache pages are never
+	 * anonymous and never swap backed;
+	 * on the other hand, such a page must have the
+	 * PageVzPageCache(page) ext flag set.
+	 */
+	WARN_ON((!PageAnon(newpage) && !PageSwapBacked(newpage)) !=
+		PageVzPageCache(newpage));
+	if (PageVzPageCache(newpage))
+		page_counter_charge(&memcg->cache, nr_pages);
+
 	css_get(&memcg->css);
 	commit_charge(newpage, memcg);
 
@@ -7704,10 +7861,10 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 
 	mod_memcg_state(memcg, MEMCG_SOCK, nr_pages);
 
-	if (try_charge(memcg, gfp_mask, nr_pages) == 0)
+	if (try_charge(memcg, gfp_mask, nr_pages, false) == 0)
 		return true;
 
-	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
+	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages, false);
 	return false;
 }
 
@@ -7725,7 +7882,7 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 
 	mod_memcg_state(memcg, MEMCG_SOCK, -nr_pages);
 
-	refill_stock(memcg, nr_pages);
+	refill_stock(memcg, nr_pages, false);
 }
 
 static int __init cgroup_memory(char *s)
-- 
2.31.1

_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel