[Devel] [PATCH RH9 01/23] mm/memcg: enable memory.low for cgroup v1

2021-09-26 Thread Vasily Averin
This adds memory.low for cgroup v1. The semantics are the same as for cgroup v2. Signed-off-by: Andrey Ryabinin (cherry picked from commit 9ab48918d4587dc614595055e66e2e17af5c4a44) VvS: rebase to rh9 https://jira.sw.ru/browse/PSBM-133990 Signed-off-by: Vasily Averin --- mm/memcontrol.c | 10 ++

[Devel] [PATCH RH9 02/23] mm/memcg: enable memory.high for cgroup v1.

2021-09-26 Thread Vasily Averin
This adds the memory.high file to cgroup v1. The semantics are the same as for cgroup v2. Signed-off-by: Andrey Ryabinin (cherry picked from commit c432767aac6f77c2d8ffad588da89d0a66c29249) VvS: rebase to rh9 https://jira.sw.ru/browse/PSBM-133990 Signed-off-by: Vasily Averin --- mm/memcontrol.c | 9

[Devel] [PATCH RH9 03/23] mm/vmalloc: add v[mz]alloc_account helpers

2021-09-26 Thread Vasily Averin
These helpers are the same as v[mz]alloc, but the allocations are accounted to kmemcg. They will be used later. Signed-off-by: Vladimir Davydov (cherry picked from commit 634f4e15e07b80d1d02404284da9d5ebce7f9a69) VvS: rebase to rh9 https://jira.sw.ru/browse/PSBM-133990 Signed-off-by: Vasily Averin --- include/linux/vmalloc.h | 2 ++ mm/vma

[Devel] [PATCH RH9 04/23] ms/memcg: enable accounting for net_device and Tx/Rx queues

2021-09-26 Thread Vasily Averin
A container netadmin can create many fake net devices, then create a new net namespace and repeat this again and again. A net device can request the creation of up to 4096 Tx and Rx queues, forcing the kernel to allocate up to several tens of megabytes of memory per net device. It makes sense to account

[Devel] [PATCH RH9 05/23] ms/memcg: enable accounting for IP address and routing-related objects

2021-09-26 Thread Vasily Averin
A netadmin inside a container can use 'ip a a' and 'ip r a' to assign a large number of IPv4/IPv6 addresses and routing entries, forcing the kernel to allocate megabytes of unaccounted memory for long-lived per-netdevice kernel objects: 'struct in_ifaddr', 'struct inet6_ifaddr', 'struct fib6_no

[Devel] [PATCH RH9 06/23] ms/memcg: enable accounting for inet_bin_bucket cache

2021-09-26 Thread Vasily Averin
A net namespace can create up to 64K TCP and DCCP ports, forcing the kernel to allocate up to several megabytes of memory per netns for inet_bind_bucket objects. It makes sense to account for them to restrict the host's memory consumption from inside the memcg-limited container. Signed-off-by: Vasily

[Devel] [PATCH RH9 07/23] ms/memcg: enable accounting for VLAN group array

2021-09-26 Thread Vasily Averin
The VLAN array consumes up to 8 pages of memory per net device. It makes sense to account for them to restrict the host's memory consumption from inside the memcg-limited container. Signed-off-by: Vasily Averin Signed-off-by: David S. Miller (cherry picked from commit a89893dd7b08fa85bcf643ca742ab38

[Devel] [PATCH RH9 08/23] ms/memcg: ipv6/sit: account and don't WARN on ip_tunnel_prl structs allocation

2021-09-26 Thread Vasily Averin
Author: Andrey Ryabinin The size of the ip_tunnel_prl structs allocation is controllable from user space, so it is better to avoid spamming dmesg if the allocation fails. Also add __GFP_ACCOUNT, as this is a good candidate for per-memcg accounting. The allocation is temporary and limited to 4GB. Signed-

[Devel] [PATCH RH9 09/23] ms/memcg: enable accounting for scm_fp_list objects

2021-09-26 Thread Vasily Averin
Unix sockets allow sending file descriptors via SCM_RIGHTS messages. Each such send call forces the kernel to allocate up to 2 KB of memory for struct scm_fp_list. It makes sense to account for them to restrict the host's memory consumption from inside the memcg-limited container. Signed-off-by: Va

[Devel] [PATCH RH9 10/23] ms/memcg: enable accounting for pids in nested pid namespaces

2021-09-26 Thread Vasily Averin
Commit 5d097056c9a0 ("kmemcg: account certain kmem allocations to memcg") enabled memcg accounting for pids allocated from init_pid_ns.pid_cachep, but forgot to adjust the setting for nested pid namespaces. As a result, pid memory is not accounted exactly where it is really needed, inside memcg-li

[Devel] [PATCH RH9 11/23] ms/memcg: enable accounting for mnt_cache entries

2021-09-26 Thread Vasily Averin
Patch series "memcg accounting from OpenVZ", v7. OpenVZ has used memory accounting for 20+ years, since the v2.2.x Linux kernels. Initially we used our own accounting subsystem, then partially committed it upstream, and a few years ago switched to cgroups v1. Now we're rebasing again, revising our old patc

[Devel] [PATCH RH9 12/23] ms/memcg: enable accounting for pollfd and select bits arrays

2021-09-26 Thread Vasily Averin
A user can call the select/poll system calls with a large number of assigned file descriptors, forcing the kernel to allocate up to several pages of memory until these sleeping system calls end. These are long-lived, unaccounted per-task allocations. It makes sense to account for these allocations to

[Devel] [PATCH RH9 13/23] ms/memcg: enable accounting for file lock caches

2021-09-26 Thread Vasily Averin
A user can create file locks for each open file, forcing the kernel to allocate small but long-lived objects for every open file. It makes sense to account for these objects to limit the host's memory consumption from inside the memcg-limited container. Link: https://lkml.kernel.org/r/b009f4c7-f0ab-c

[Devel] [PATCH RH9 15/23] ms/memcg: enable accounting for new namesapces and struct nsproxy

2021-09-26 Thread Vasily Averin
A container admin can create new namespaces, forcing the kernel to allocate up to several pages of memory for the namespaces and their associated structures. Net and uts namespaces already have accounting enabled for such allocations. It makes sense to account for the rest to restrict the host's memory consump

[Devel] [PATCH RH9 14/23] ms/memcg: enable accounting for fasync_cache

2021-09-26 Thread Vasily Averin
fasync_struct is used by almost all character device drivers to set up the fasync queue, and for regular files by the file lease code. This structure is quite small but long-lived, and it can be assigned to any open file. It makes sense to account for its allocations to restrict the host's memor

[Devel] [PATCH RH9 16/23] ms/memcg: enable accounting of ipc resources

2021-09-26 Thread Vasily Averin
When a user creates IPC objects, the kernel is forced to allocate memory for these long-lived objects. It makes sense to account for them to restrict the host's memory consumption from inside the memcg-limited container. This patch enables accounting for IPC shared memory segments, messages, semaphores and

[Devel] [PATCH RH9 17/23] ms/memcg: enable accounting for signals

2021-09-26 Thread Vasily Averin
When a user sends a signal to another process, the kernel is forced to allocate memory for 'struct sigqueue' objects. The number of signals is limited by the RLIMIT_SIGPENDING resource limit, but even the default settings allow each user to consume up to several megabytes of memory. It makes sens

[Devel] [PATCH RH9 19/23] ms/memcg: enable accounting for ldt_struct objects

2021-09-26 Thread Vasily Averin
Each task can request its own LDT, forcing the kernel to allocate up to 64 KB of memory per mm. There are legitimate workloads with hundreds of processes and there can be hundreds of workloads running on large machines. The unaccounted memory can cause isolation issues between the workloads particularly

[Devel] [PATCH RH9 18/23] ms/memcg: enable accounting for posix_timers_cache slab

2021-09-26 Thread Vasily Averin
A program may create multiple interval timers using timer_create(). For each timer the kernel preallocates a "queued real-time signal". Consequently, the number of timers is limited by the RLIMIT_SIGPENDING resource limit. The allocated object is quite small, ~250 bytes, but even the default sign

[Devel] [PATCH RH9 20/23] ms/ipc: remove memcg accounting for sops objects in do_semtimedop()

2021-09-26 Thread Vasily Averin
Linus proposes reverting the accounting for sops objects in do_semtimedop() because it is really just a temporary buffer for a single semtimedop() system call. This object can consume up to 2 pages, the syscall is a sleeping one, its size and duration can be controlled by the user, and this allocation can be rep

[Devel] [PATCH RH9 21/23] memcg: Enable accounting for nft objects

2021-09-26 Thread Vasily Averin
This patch enables memcg accounting for nft objects. https://jira.sw.ru/browse/PSBM-128719 Signed-off-by: Vasily Averin (cherry picked from commit b6d838ad6519bac8fc003a3fdfe2ca8623492674) VvS: ported to rh9 https://jira.sw.ru/browse/PSBM-133990 Signed-off-by: Vasily Averin --- net/netfilter/nf_tabl

[Devel] [PATCH RH9 23/23] netfilter/x_tables: account entry offsets allocations

2021-09-26 Thread Vasily Averin
Entry offsets may consume a lot of kernel memory, so let's account for them. https://jira.sw.ru/browse/PSBM-54407 Signed-off-by: Andrey Ryabinin (cherry picked from vz7 commit 56c0d7d5cc4b ("netfilter/x_tables: account entry offsets allocations")) VZ 8 rebase part https://jira.sw.ru/browse/PSBM-12

[Devel] [PATCH RH9 22/23] memcg: charge kmem allocations accounted to UBC in PCS6 to memcg

2021-09-26 Thread Vasily Averin
First patch description: ms/kmemcg: account certain kmem allocations to memcg Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below: - threadinfo - task_struct - task_delay_i

[Devel] [PATCH RH7 v2] mm: Do not drop __GFP_NOFAIL in __add_to_page_cache_locked

2021-09-26 Thread Evgenii Shatokhin
https://jira.sw.ru/browse/PSBM-98265 It is suspected that the memory allocations described in this bug fail because 'mapping_gfp_constraint(mapping, gfp_mask)' drops __GFP_NOFAIL from gfp_mask. Restore the __GFP_NOFAIL bit for mem_cgroup_try_charge_cache(); this might help fix PSBM-98265. Changes in