[Devel] [PATCH RHEL9 COMMIT] cgroup: remove excess rcu_read_lock in cgroup marking

2021-10-13 Thread Konstantin Khorenko
The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh9-5.14.0-4.vz9.10.1 --> commit 90c3b81039abfa45f1037f86739a183374086912 Author: Pavel Tikhomirov Date: Wed Oct 13 14:13:18 2021 +0300 cgroup: remove excess rcu_r

[Devel] [PATCH RHEL9 COMMIT] ve/cgroup: Add ve_owner field to cgroup

2021-10-13 Thread Konstantin Khorenko
The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh9-5.14.0-4.vz9.10.1 --> commit 0aa8d5b66160fa389809e3561b4b05bf62364d4c Author: Valeriy Vdovin Date: Wed Oct 13 14:13:30 2021 +0300 ve/cgroup: Add ve_owner field

[Devel] [PATCH RHEL9 COMMIT] ve/cgroup: temporary ignore misc cgroup to let vzctl start container

2021-10-13 Thread Konstantin Khorenko
The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh9-5.14.0-4.vz9.10.1 --> commit 7f3e1ef61fc2a6dc3514dd9b64f595cc11388165 Author: Pavel Tikhomirov Date: Wed Oct 13 14:13:32 2021 +0300 ve/cgroup: temporary ignore

[Devel] [PATCH RHEL9 COMMIT] ve: use rcu_dereference for ve_ns in ve_get_init_css

2021-10-13 Thread Konstantin Khorenko
The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh9-5.14.0-4.vz9.10.1 --> commit fe5a9b228d84b824d82b7700cbbfd0322812ccbf Author: Kirill Tkhai Date: Wed Oct 13 14:13:32 2021 +0300 ve: use rcu_dereference for ve_

[Devel] [PATCH RHEL9 COMMIT] cgroup: split cgroup_get_ve_root1 into css and cgroup version

2021-10-13 Thread Konstantin Khorenko
The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh9-5.14.0-4.vz9.10.1 --> commit 898abc31593362995e322d2ae021ca9abfc353be Author: Pavel Tikhomirov Date: Wed Oct 13 14:13:31 2021 +0300 cgroup: split cgroup_get_ve

[Devel] [PATCH RHEL9 COMMIT] ve/cgroup: Skip non-virtualized roots in cgroup_{, un}mark_ve_roots()

2021-10-13 Thread Konstantin Khorenko
The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh9-5.14.0-4.vz9.10.1 --> commit 5697a105caf70bf9468025ca7192afebdbe80fbd Author: Valeriy Vdovin Date: Wed Oct 13 14:13:29 2021 +0300 ve/cgroup: Skip non-virtualiz

[Devel] [PATCH vz9 07/20] mm/page_alloc: add latency to the page_alloc tracepoint

2021-10-13 Thread Nikita Yushchenko
From: Andrey Ryabinin Add 'lat' field to the mm_page_alloc tracepoint. It shows allocation latency in microseconds (0.000001 second). Signed-off-by: Andrey Ryabinin Reviewed-by: Cyrill Gorcunov mm/page_alloc: fix latency in tracepoint. Since transition from jiffies to sched_cloc

[Devel] [PATCH vz9 03/20] /proc/<pid>/vz_latency: Show maximal allocation latency in the last 2min.

2021-10-13 Thread Nikita Yushchenko
From: Andrey Ryabinin Add to '/proc/<pid>/vz_latency' a column with the maximal latency the task has seen in the last 2 minutes. E.g.: cat /proc/1/vz_latency Type Total_lat Calls Max (2min) allocatomic:0 2940

[Devel] [PATCH vz9 00/20] part22

2021-10-13 Thread Nikita Yushchenko
Andrey Ryabinin (6): ve/page_alloc, kstat: account allocation latencies per-task and per-thread /proc/<pid>/vz_latency: Show maximal allocation latency in the last 2min. /proc/<pid>/vz_latency: Add scheduling stats /proc/vz/latency: distinguish atomic allocations in irq from in task atomi

[Devel] [PATCH vz9 13/20] ve/netfilter: Send iptables/netfilter kernel error messages to Containers

2021-10-13 Thread Nikita Yushchenko
From: Stanislav Kinsburskiy Rebasing and splitting netfilters subsystem (port 66-diff-ve-net-netfilter-combined). Part 1. https://jira.sw.ru/browse/PSBM-18322 Signed-off-by: Kirill Tkhai khorenko@: rebase to kernel-3.10.0-229.7.2.el7: * hunk for include/net/netfilter/xt_log.h has been dropp

[Devel] [PATCH vz9 20/20] net: export "net/*/neigh/*/*" sysctls for Container

2021-10-13 Thread Nikita Yushchenko
From: Vasily Averin Weave Kubernetes plugin requires tuning of /proc/sys/net/ipv4/neigh/weave/base_reachable_time in particular, so let's export neighbour sysctls as well. https://jira.sw.ru/browse/PSBM-92107 Signed-off-by: Konstantin Khorenko (cherry picked from vz7 commit 8499e3458f18 ("ne
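A minimal sketch of how a dotted sysctl name corresponds to the /proc/sys path quoted above. The helper name sysctl_to_proc_path is hypothetical and only for illustration; it assumes no path component (for example an interface name) itself contains a dot.

```python
from pathlib import PurePosixPath


def sysctl_to_proc_path(name: str) -> str:
    """Map a dotted sysctl name to its file under /proc/sys.

    Hypothetical helper: assumes no component contains a literal dot.
    """
    return str(PurePosixPath("/proc/sys", *name.split(".")))


if __name__ == "__main__":
    # The neighbour sysctl mentioned in the patch description.
    print(sysctl_to_proc_path("net.ipv4.neigh.weave.base_reachable_time"))
```

Inside a Container the resulting file is only visible and writable if the kernel exports that sysctl to the Container, which is what this patch is about.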

[Devel] [PATCH vz9 08/20] fence-watchdog: Print alive messages

2021-10-13 Thread Nikita Yushchenko
From: Pavel Tikhomirov We have a situation where a node worked for ~8 days and came to The Point where jiffies > fence_wdog_jiffies64 (whatever updated /sys/kernel/watchdog_timer stopped doing so), but the node lived after The Point for 17 more hours; this means that nobody called fence_wdog_check_t

[Devel] [PATCH vz9 05/20] /proc/vz/latency: distinguish atomic allocations in irq from in task atomics.

2021-10-13 Thread Nikita Yushchenko
From: Andrey Ryabinin Add to /proc/vz/latency 'alocirq' allocation type which shows allocation latencies done in irq contexts. 'alocatomic' now shows atomic allocations in task contexts. Also add 'Per-CPU alloc irq' which shows per-cpu 'alocirq' numbers. Example of new output: Version: 2.6 L

[Devel] [PATCH vz9 02/20] ve/page_alloc, kstat: account allocation latencies per-task and per-thread

2021-10-13 Thread Nikita Yushchenko
From: Andrey Ryabinin Vstorage wants per-process allocation latencies: - total accumulated latency (total time spent inside the kernel allocator) - total alloc attempts (so that average latency can be calculated) This adds /proc/<pid>/vz_latency file which outputs the numbers: Type
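A small sketch of the calculation the description alludes to: dividing the total accumulated latency by the total number of allocation attempts. The function name and the zero-attempts convention are assumptions for illustration, not part of the patch; the result is in whatever unit the total is reported in.

```python
def average_latency(total_latency: int, calls: int) -> float:
    """Mean latency per allocation attempt.

    Returns 0.0 when no attempts were recorded (an assumed convention),
    so a fresh task does not trigger a division by zero.
    """
    if calls == 0:
        return 0.0
    return total_latency / calls


if __name__ == "__main__":
    # Illustrative numbers only, not taken from a real system.
    print(average_latency(100, 4))
```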

[Devel] [PATCH vz9 15/20] ve/netfilter: Implement pernet net->ct.max / virtualize "nf_conntrack_max" sysctl

2021-10-13 Thread Nikita Yushchenko
From: Konstantin Khorenko Rebasing and splitting netfilters subsystem (port 66-diff-ve-net-netfilter-combined). Part 1. https://jira.sw.ru/browse/PSBM-18322 Signed-off-by: Kirill Tkhai (cherry picked from vz7 commit c34a99c00f9d ("ve/netfilter: Implement pernet net->ct.max / virtualize "nf_co

[Devel] [PATCH vz9 06/20] /proc/vz/latency: Show max latency in 2 min instead of 5sec.

2021-10-13 Thread Nikita Yushchenko
From: Andrey Ryabinin Historically the "Lat" column in /proc/vz/latency showed max latency in 5 seconds. Change it to max latency in the last 2 minutes, the same as in the /proc/<pid>/vz_latency Signed-off-by: Andrey Ryabinin (cherry-picked from vz7 commit 0c5707cfcc84 ("/proc/vz/latency: Show max

[Devel] [PATCH vz9 09/20] ve/device_cgroup: Introduce "devices.extra_list" cgroup file

2021-10-13 Thread Nikita Yushchenko
From: Konstantin Khorenko Recent versions of containerd (as a part of k3s-1.19.5) started to apply strict rules when parsing the contents of 'devices.list' files located in the devices cgroup. Namely, the access token is allowed to contain only those values [rwm], that are described in https://ww

[Devel] [PATCH vz9 04/20] /proc/<pid>/vz_latency: Add scheduling stats

2021-10-13 Thread Nikita Yushchenko
From: Andrey Ryabinin Add scheduling latencies to /proc/<pid>/vz_latency. They are the same as alloc latencies - total cumulative latency, number of schedule events, and latency maximum in the last 2 minutes. The sysctl kernel.sched_schedstats must be enabled to see these stats. https://jira.sw.ru/b

[Devel] [PATCH vz9 12/20] xfs: Allow to mount XFS in non-init userns

2021-10-13 Thread Nikita Yushchenko
From: Konstantin Khorenko At the moment XFS is not marked as ready to be mounted inside non-init userns (see xfs_fs_type), while we previously decided to allow using XFS inside a CT: b86b9049ba50 ("xfs: allow to mount xfs fs inside a Container") https://jira.sw.ru/browse/PSBM-72401 so let's

[Devel] [PATCH vz9 16/20] ve/netfilter: Check for permittions while looking for target and match

2021-10-13 Thread Nikita Yushchenko
From: Kirill Tkhai Patchset description: Port autoloading of netfilter modules functionality https://jira.sw.ru/browse/PSBM-28910 Signed-off-by: Kirill Tkhai Kirill Tkhai (4): kmod: Move check of VE permitions from __call_usermodehelper_exec() to upper functions kmod: Port autol

[Devel] [PATCH vz9 17/20] net: Primitives to enable conntrack allocation

2021-10-13 Thread Nikita Yushchenko
From: Stanislav Kinsburskiy Patchset description: Create conntrack structures only if they are really needed Allocate conntracks only after there is a rule which uses them. v2: Allow after there is a rule and never prohibit. khorenko@: the idea behind all of this: we want to provide the possi
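The "allow after there is a rule and never prohibit" behaviour from the v2 note can be pictured as a one-way latch. The sketch below is a userspace illustration under that reading only; the class and method names are hypothetical and do not correspond to the kernel primitives this patch adds.

```python
class ConntrackGate:
    """One-way latch: allocation stays disabled until the first rule that
    needs conntrack appears, and is never disabled again afterwards."""

    def __init__(self) -> None:
        self._allowed = False

    def rule_added(self) -> None:
        # The first rule that uses conntrack flips the latch on.
        self._allowed = True

    def rule_removed(self) -> None:
        # Deliberately a no-op: "never prohibit" once allowed.
        pass

    def may_allocate(self) -> bool:
        return self._allowed


if __name__ == "__main__":
    gate = ConntrackGate()
    print(gate.may_allocate())  # no rule yet
    gate.rule_added()
    gate.rule_removed()
    print(gate.may_allocate())  # stays allowed after removal
```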

[Devel] [PATCH vz9 01/20] core: Add glob_kstat, percpu kstat and account mm stat

2021-10-13 Thread Nikita Yushchenko
From: Kirill Tkhai Adds latency calculation for: kstat_glob.swap_in kstat_glob.page_in kstat_glob.alloc_lat And fail count in: kstat_glob.alloc_fails Also incorporates fixups patches: kstat: Make kstat_glob::swap_in percpu - core part ve/mm/kstat: Port diff-ve-kstat-disable-interrupt

[Devel] [PATCH vz9 11/20] net: export net/core/somaxconn sysctl for unprivileged users

2021-10-13 Thread Nikita Yushchenko
From: Jan Dakinevich Some unprivileged containers desire this sysctl. Apparently it is safe for userns root, and there is no reason to restrict them. https://jira.sw.ru/browse/PSBM-91032 Signed-off-by: Jan Dakinevich Reviewed-by: Vasily Averin (cherry picked from vz7 commit dccf5143b132 ("ne

[Devel] [PATCH vz9 10/20] ve/device_cgroup: Show all devices allowed in ct to fool docker

2021-10-13 Thread Nikita Yushchenko
From: Pavel Tikhomirov We've seen that docker 20+ not only writes "a *:* rwm" to privileged docker container device-cgroup (as pre-19 version did) but also checks the content after write, and docker expects that all devices are allowed for privileged docker container. In our VZCT we obviously ca

[Devel] [PATCH vz9 19/20] net: Mark conntrack users in nftables

2021-10-13 Thread Nikita Yushchenko
From: Kirill Tkhai Allow conntracks to be allocated in case these rules are inserted. https://jira.sw.ru/browse/PSBM-51050 Signed-off-by: Kirill Tkhai Reviewed-by: Andrei Vagin vz8 rebase notes: = (cherry picked from vz7 commit 60931ce1ffcf ("net: Mark conntrack users in n

[Devel] [PATCH vz9 18/20] net: Mark conntrack users in xtables

2021-10-13 Thread Nikita Yushchenko
From: Kirill Tkhai Allow conntracks to be allocated in case these rules are inserted. https://jira.sw.ru/browse/PSBM-51050 Signed-off-by: Kirill Tkhai Reviewed-by: Andrei Vagin +++ ve/net: Delete allow_conntrack_allocation() from nf_synproxy Since nf_conntrack_alloc() is not called ther

[Devel] [PATCH vz9 14/20] ve/netfilter: Implement pernet expect_max / virtualize "net.netfilter.nf_conntrack_expect_max" sysctl

2021-10-13 Thread Nikita Yushchenko
From: Konstantin Khorenko Rebasing and splitting netfilters subsystem (port 66-diff-ve-net-netfilter-combined). Part 1. https://jira.sw.ru/browse/PSBM-18322 * diff-ve-nf-make-nf_ct_expect_max-sysctl-virtual Author: Pavel Emelyanov Subject: [PATCH rh6] ve: Make nf_ct_expect_max "virtualized" Dat

[Devel] [PATCH RH9 1/2] kernel/cgroup: implement cgroup_get_e_ve_css

2021-10-13 Thread Kirill Tkhai
From: Andrey Zhadchenko Existing cgroup_get_e_css() is not suited for cgroup-v1 and will always return root cgroup css. Implement new cgroup_get_e_ve_css to return ve css. https://jira.sw.ru/browse/PSBM-131253 Signed-off-by: Andrey Zhadchenko Reviewed-by: Kirill Tkhai ===

[Devel] [PATCH RH9 0/2] part23 part2

2021-10-13 Thread Kirill Tkhai
--- Andrey Zhadchenko (2): kernel/cgroup: implement cgroup_get_e_ve_css mm/backing-dev: associate writeback with correct blkcg include/linux/cgroup.h |2 ++ kernel/cgroup/cgroup.c | 19 +++ mm/backing-dev.c | 22 -- 3 files change

[Devel] [PATCH RH9 2/2] mm/backing-dev: associate writeback with correct blkcg

2021-10-13 Thread Kirill Tkhai
From: Andrey Zhadchenko Use cgroup_get_e_ve_css to get correct blkcg_css for writeback instances. https://jira.sw.ru/browse/PSBM-131253 Signed-off-by: Andrey Zhadchenko Reviewed-by: Kirill Tkhai v2: khorenko@: introduce a wrapper for getting blkcg_css from memcg_css. ===