On 22/03/2024 12:58, Pavel Tikhomirov wrote:
The interface is slightly reworked to be more v2-like:

- rename memory.cache.limit/usage_in_bytes -> memory.cache.max/current
- show "max" when the limit is uninitialized and allow writing it
- replace the memcg_max_mutex + page_counter_set_max combination with a
   simple xchg
- set the limit first and only then loop to enforce it if needed; no more
   "enforce before setting" logic
- retry reclaim a couple of times if it fails to enforce the limit and
   then just give up (memory_max_write triggers OOM in this case, but we
   probably do not want to trigger OOM because of a cache limit)
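
To make the renamed files concrete, here is a minimal userspace sketch
(the /sys/fs/cgroup/test path and the write_str helper are hypothetical,
just for illustration; it assumes a cgroup-v2 hierarchy with the cache
files exposed by this patch):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a string to a cgroup control file. */
static int write_str(const char *path, const char *val)
{
        int fd = open(path, O_WRONLY);

        if (fd < 0)
                return -1;
        if (write(fd, val, strlen(val)) < 0) {
                close(fd);
                return -1;
        }
        return close(fd);
}

int main(void)
{
        char buf[64];
        ssize_t n;
        int fd;

        /* Limit page cache to 256M; writing "max" removes the limit. */
        if (write_str("/sys/fs/cgroup/test/memory.cache.max", "268435456"))
                return 1;

        /* Read back the current page cache usage in bytes. */
        fd = open("/sys/fs/cgroup/test/memory.cache.current", O_RDONLY);
        if (fd < 0)
                return 1;
        n = read(fd, buf, sizeof(buf) - 1);
        if (n > 0) {
                buf[n] = '\0';
                printf("cache.current: %s", buf);
        }
        close(fd);
        return 0;
}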

https://virtuozzo.atlassian.net/browse/PSBM-154207
Signed-off-by: Pavel Tikhomirov <ptikhomi...@virtuozzo.com>

Feature: mm: Memory cgroup page cache limit
---
  mm/memcontrol.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 75 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9000dc00ed03..28a39b50cbe1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -8810,3 +8810,78 @@ static int __init mem_cgroup_swap_init(void)
  subsys_initcall(mem_cgroup_swap_init);
#endif /* CONFIG_SWAP */
+
+static int cache_max_show(struct seq_file *m, void *v)
+{
+       return seq_puts_memcg_tunable(m,
+               READ_ONCE(mem_cgroup_from_seq(m)->cache.max));
+}
+
+static ssize_t cache_max_write(struct kernfs_open_file *of,
+                              char *buf, size_t nbytes, loff_t off)
+{
+       struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+       unsigned int nr_reclaims = MAX_RECLAIM_RETRIES;
+       unsigned long max;
+       int err;
+
+       buf = strstrip(buf);
+       err = page_counter_memparse(buf, "max", &max);
+       if (err)
+               return err;
+
+       xchg(&memcg->cache.max, max);

For history:

Here is my understanding (from a discussion with Johannes Weiner <han...@cmpxchg.org>) of why we need xchg() in memory_max_write:

In page_counter_try_charge we do:

a) increase usage
-- implicit full memory barrier ---
b) check usage is within limit, else
c) revert usage

In page_counter_set_max we do:

A) check usage is within new limit, then
-- implicit full memory barrier ---
B) set new limit
-- implicit full memory barrier ---
C) check usage didn't grow under us, else
D) revert old limit

If the charging side at (b) does not see a concurrent limit change, then the limit-setting side at (C) is guaranteed to see the usage growth, and vice versa.

And after the switch to memory_max_write in the cgroup-v2 code path, (C) and (D) were replaced with:
C') check usage is within new limit, else
D') reclaim until is, or OOM if can't reach

So basically we need xchg() to sync with page_counter_try_charge.
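
To make the ordering above easier to see, here is a rough sketch of both
sides (simplified and paraphrased, not the exact page_counter code: struct
pc_sketch and the sketch_* names are stand-ins, kernel includes omitted):

struct pc_sketch {
        atomic_long_t usage;
        unsigned long max;
};

static bool sketch_try_charge(struct pc_sketch *c, unsigned long nr)
{
        long new_usage;

        /* (a) increase usage; the atomic RMW implies a full memory barrier */
        new_usage = atomic_long_add_return(nr, &c->usage);

        /* (b) check usage against the limit visible after that barrier */
        if (new_usage > READ_ONCE(c->max)) {
                /* (c) revert usage */
                atomic_long_sub(nr, &c->usage);
                return false;
        }

        return true;
}

static int sketch_set_max(struct pc_sketch *c, unsigned long max)
{
        for (;;) {
                unsigned long old;
                long usage;

                /* (A) usage must already fit under the new limit */
                usage = atomic_long_read(&c->usage);
                if (usage > max)
                        return -EBUSY;

                /* (B) publish the new limit; xchg implies full barriers */
                old = xchg(&c->max, max);

                /* (C) did usage grow while we were switching the limit? */
                if (atomic_long_read(&c->usage) <= usage)
                        return 0;

                /* (D) it did: restore the old limit and retry */
                WRITE_ONCE(c->max, old);
                cond_resched();
        }
}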

For memcg->cache we don't have uses of page_counter_try_charge, so it should be safe to omit xchg() here (same for oom.guarantee).

But such a not-strictly-required xchg() already exists for swap.max in mainstream code, probably just copy-pasted from memory_max_write. And as this is not a fast path, we can just leave the xchg() here and there unless we really need to remove it for some reason.

+
+       for (;;) {
+               unsigned long nr_cache = page_counter_read(&memcg->cache);
+
+               if (nr_cache <= max)
+                       break;
+
+               if (signal_pending(current))
+                       break;
+
+               if (!nr_reclaims)
+                       break;
+
+               if (!try_to_free_mem_cgroup_pages(memcg, nr_cache - max,
+                   GFP_KERNEL, false))
+                       nr_reclaims--;
+       }
+
+       memcg_wb_domain_size_changed(memcg);
+       return nbytes;
+}
+
+static u64 cache_current_read(struct cgroup_subsys_state *css,
+                              struct cftype *cft)
+{
+       struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+
+       return (u64)page_counter_read(&memcg->cache) * PAGE_SIZE;
+}
+
+static struct cftype cache_files[] = {
+       {
+               .name = "cache.max",
+               .flags = CFTYPE_NOT_ON_ROOT,
+               .seq_show = cache_max_show,
+               .write = cache_max_write,
+       },
+       {
+               .name = "cache.current",
+               .flags = CFTYPE_NOT_ON_ROOT,
+               .read_u64 = cache_current_read,
+       },
+       { }     /* terminate */
+};
+
+static int __init mem_cgroup_cache_init(void)
+{
+       if (mem_cgroup_disabled())
+               return 0;
+
+       WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, cache_files));
+       return 0;
+}
+subsys_initcall(mem_cgroup_cache_init);

--
Best regards, Tikhomirov Pavel
Senior Software Developer, Virtuozzo.