https://jira.sw.ru/browse/PSBM-133986
Kirill Tkhai (11):
sched: Add ve name to sched_show_task()
sched: disable dumping cfs info on sysrq trigger
kernel: Account nr_zombie and nr_dead
sched: Count rq::nr_sleeping and cfs_rq::nr_unint
sched: Account cfs_rq::nr_iowait
sched: Add primitiv
From: Kirill Tkhai
Fill mask using virtual cpus which are enumerated
from 0 to num_online_vcpus()-1.
This pairs with /proc/cpuinfo and sched_setaffinity().
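The numbering scheme above can be shown as a minimal user-space sketch. It assumes at most 64 vCPUs, so a single uint64_t stands in for the kernel's cpumask. `fill_vcpu_mask()` and `SKETCH_MAX_VCPUS` are hypothetical names, and `nr_vcpus` plays the role of num_online_vcpus(). The sketch covers only the enumeration: bits 0 .. nr_vcpus-1 are set, no matter which physical CPUs actually run the container.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: largest vCPU count this sketch handles (one 64-bit word). */
#define SKETCH_MAX_VCPUS 64

/*
 * Build an affinity mask exposing only virtual CPUs 0 .. nr_vcpus-1.
 * nr_vcpus stands in for num_online_vcpus() from the patch description;
 * a plain integer replaces the kernel cpumask (assumption).
 */
static uint64_t fill_vcpu_mask(unsigned int nr_vcpus)
{
	assert(nr_vcpus <= SKETCH_MAX_VCPUS);
	if (nr_vcpus == SKETCH_MAX_VCPUS)
		return UINT64_MAX;              /* avoid undefined 1 << 64 */
	return (UINT64_C(1) << nr_vcpus) - 1;   /* set bits 0 .. nr_vcpus-1 */
}
```

With nr_vcpus = 4 the mask is 0xF. A tool that counts bits in the affinity mask (as nproc presumably does) would then report 4 CPUs.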
https://jira.sw.ru/browse/PSBM-25367
Signed-off-by: Kirill Tkhai
Acked-by: Vladimir Davydov
An addition by khorenko@:
"nproc" utility wo
From: Kirill Tkhai
Extracted from "Initial patch".
Signed-off-by: Kirill Tkhai
https://jira.sw.ru/browse/PSBM-133986
(cherry picked from commit b097bc8d02100cc6c8fa55433fc7a059b04c37fb)
Signed-off-by: Alexander Mikhalitsyn
---
include/linux/sched.h | 3 +++
kernel/exit.c | 14 +
From: Kirill Tkhai
Extracted from "Initial patch".
Note: it would be better to move nr_unint
to struct task_group in the future.
Signed-off-by: Kirill Tkhai
Rebase to RHEL8.3 kernel-4.18.0-240.1.1.el8_3 notes:
khorenko@:
I've substituted task_contributes_to_load() with
tsk->sched_contrib
From: Kirill Tkhai
Extracted from "ve: initial patch".
rcu_read_unlock() move by Andrey Ryabinin.
Signed-off-by: OpenVZ Team
Signed-off-by: Andrey Ryabinin
Signed-off-by: Kirill Tkhai
https://jira.sw.ru/browse/PSBM-133986
See also
cc172ff301 ("sched/debug: Fix the alignment of the show-sta
From: Kirill Tkhai
Add the possibility to limit the CPUs used by a cgroup/container.
Signed-off-by: Vladimir Davydov
Signed-off-by: Kirill Tkhai
+++
sched: Allow configuring sched_vcpu_hotslice and sched_cpulimit_scale_cpufreq
Let's make our sysctls ported from vz8 really configurable.
These are
From: Kirill Tkhai
Extracted from "Initial patch".
Signed-off-by: Kirill Tkhai
https://jira.sw.ru/browse/PSBM-133986
unsigned long -> unsigned int
(cherry picked from commit 153ef777fd0e3be36ed2d4cb28de0449f7e14c41)
Signed-off-by: Alexander Mikhalitsyn
---
include/linux/sched/stat.h | 2 +
From: Konstantin Khorenko
Signed-off-by: OpenVZ Team
Extracted by Konstantin Khorenko
Signed-off-by: Kirill Tkhai
https://jira.sw.ru/browse/PSBM-133986
(cherry picked from commit b055e02b4378443778b38ef77712b803f9bcb19f)
Signed-off-by: Alexander Mikhalitsyn
---
kernel/sched/core.c | 3 +++
From: Kirill Tkhai
Extracted from "Initial patch".
Signed-off-by: Kirill Tkhai
+++
sched: fix cfs_rq::nr_iowait accounting
After recent RedHat (b6be9ae "rh7: import RHEL7 kernel-3.10.0-957.12.2.el7")
following sequence:
update_stats_dequeue()
dequeue_sleeper()
cfs_rq->nr_iowait++
From: Kirill Tkhai
Add CONFIG_CPULIMIT cpu cgroup files.
Signed-off-by: Kirill Tkhai
https://jira.sw.ru/browse/PSBM-133986
See
f4183717b ("sched/fair: Introduce the burstable CFS controller")
(cherry picked from commit 0e0e0bfbf884f8fb0347e8cb6ed27aa2bf991c91)
Signed-off-by: Alexander Mikhal
From: Kirill Tkhai
This results in soft lockups, because it writes too much data to the
console. At the same time, the information it shows is only useful for sched
debugging and can be obtained via /proc/sched_debug anyway. Besides, it
is disabled in PCS6. So disable it in vz7 as well.
https://jira.sw.ru
From: Kirill Tkhai
This is needed for CONFIG_CFS_CPULIMIT.
Signed-off-by: Kirill Tkhai
https://jira.sw.ru/browse/PSBM-133986
(cherry picked from commit dcff66d4a02c9fc56536b3e002b4dee3e9efd3fa)
Signed-off-by: Alexander Mikhalitsyn
---
kernel/sched/core.c | 20 +++-
kernel/s
From: Kirill Tkhai
Refactoring needed for CONFIG_CFS_CPULIMIT.
Signed-off-by: Kirill Tkhai
https://jira.sw.ru/browse/PSBM-133986
(cherry picked from commit f511f958d4294fe50e1b42661a5bd7012b61f82a)
Signed-off-by: Alexander Mikhalitsyn
---
kernel/sched/fair.c | 15 ++-
1 file cha
ms commit b7d6c3033323 ("block: fix use-after-free on cached
last_lookup partition") has changed the semantics of the
disk_map_sector_rcu() function: previously it only returned the appropriate
partition, but after the patch it additionally takes a reference on that
partition.
RedHat has backported that pat
https://jira.sw.ru/browse/PSBM-133986
Kirill Tkhai (1):
ve/net/ip_gre: containerize per-net devices
Konstantin Khorenko (4):
ve/net/cred: add ve_capable to check capabilities relative to the
current VE (v2)
ve/net/vxlan: enable support in a container
ve/uevent: Use own uevent_seqnum f
From: Vladimir Davydov
This works on PCS6, so we should allow it on Vz7 as well.
https://jira.sw.ru/browse/PSBM-43410
Signed-off-by: Vladimir Davydov
https://jira.sw.ru/browse/PSBM-133986
(cherry picked from commit 23eff01c369131f7e4cab1c37de1c253266d5039)
Signed-off-by: Alexander Mikhalitsyn
From: Vasily Averin
This patch enables dummy network devices support in a container.
https://jira.sw.ru/browse/PSBM-43329
khorenko@: JFYI: so far it is required only by an LTP test
(containers.netns_sysfs testcase).
Signed-off-by: Vasily Averin
https://jira.sw.ru/browse/PSBM-133986
(cherry p
From: Pavel Tikhomirov
For the device cgroup (whitelist-based case), we hide all devices that are
not in the whitelist from processes in that cgroup.
- Cut devices in /proc/partitions and /proc/devices if there is no read or
write permission for them.
- Cut devices in sys_ustat() if have no read permi
From: Konstantin Khorenko
vxlan is safe in CT as:
1) The UDP multicast socket used to connect to the outer world sits in the
creating net namespace, and this socket can only receive packets
forwarded/routed in the creating ns.
2) The vxlan device is owned by a second netns (which could be the same as
the first), like any other network device
From: Vladimir Davydov
ip_vti devices lack NETIF_F_VIRTUAL, so they can't be created inside a
container. The problem is that a device of this kind is created on net ns init if
the module is loaded; as a result, container start fails with EPERM.
We could allow ip_vti inside container (as well as other ne
From: Stanislav Kinsburskiy
Currently uevents are broadcast to all net namespaces present
in the system, which leads to a problem when C/R'ing systemd-based
containers (the netlink socket sees data from the node, and we can't dump
it until the data is read).
So let's send broadcast events to ne
From: Konstantin Khorenko
We want to allow a few operations in VE. Currently we use nsown_capable,
but it's wrong, because in this case we allow these operations in any
user namespace.
v2: take ve0->cred if the current ve isn't running
https://jira.sw.ru/browse/PSBM-39077
Signed-off-by: Andrew
From: Konstantin Khorenko
CT owner (and Virtuozzo tools like vzctl, prlctl) should be
able to manage 2nd level quota - quota inside the CT =>
allow management in top CT user namespace.
https://jira.sw.ru/browse/PSBM-40281
Signed-off-by: Konstantin Khorenko
https://jira.sw.ru/browse/PSBM-1339
From: Kirill Tkhai
Port patch diff-ve-net-ip_gre-containerize-per-net-devices from 2.6.32:
This patch adds IP GRE devices support in a container.
Done in the scope of
https://jira.sw.ru/browse/PSBM-24331
Signed-off-by: Kirill Tkhai
Signed-off-by: Stanislav Kinsbursky
https://jira.sw.ru/brow
From: Konstantin Khorenko
https://jira.sw.ru/browse/PSBM-17903
Signed-off-by: Kirill Tkhai
https://jira.sw.ru/browse/PSBM-133986
(cherry picked from commit 739915b241c792831b89f87bb6260da8a6a515e7)
Signed-off-by: Alexander Mikhalitsyn
---
include/linux/ve.h | 6 ++
kernel/ksysfs.c
From: Stanislav Kinsburskiy
Signed-off-by: Konstantin Khlebnikov
+++
ve/netlink: allow messages with family PF_BRIDGE type RTM_xxxNEIGH in CT
While reproducing the problem mentioned in patch 1, I found that
we need it to be able to configure the vxlan fdb (Forwarding Database entry).
https://jira.
The commit is pushed to "branch-rh7-3.10.0-1160.41.1.vz7.183.x-ovz" and will
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.41.1.vz7.183.4
-->
commit 2b8d9cf32c04e2bf3eab299234b76773f424db9e
Author: Konstantin Khorenko
Date: Thu Sep 23 18:15:28 2021 +0300
p
The commit is pushed to "branch-rh8-4.18.0-305.3.1.vz8.7.x-ovz" and will appear
at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-305.3.1.vz8.7.13
-->
commit ded8b4372053eb4abb997daf86dc3ac39f31465d
Author: Vasily Averin
Date: Thu Sep 23 19:26:48 2021 +0300
ms/memcg: proh
The commit is pushed to "branch-rh8-4.18.0-305.3.1.vz8.7.x-ovz" and will appear
at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-305.3.1.vz8.7.13
-->
commit 4b7647f44f3097a068b7693dc2d3b06b7943c644
Author: Kirill Tkhai
Date: Thu Sep 23 19:31:14 2021 +0300
dm: Add dm-trac
The commit is pushed to "branch-rh8-4.18.0-305.3.1.vz8.7.x-ovz" and will appear
at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-305.3.1.vz8.7.13
-->
commit b1e57da042a504286f0930db48b7441d39b63dc7
Author: Konstantin Khorenko
Date: Thu Sep 23 19:47:42 2021 +0300
configs:
The commit is pushed to "branch-rh8-4.18.0-305.3.1.vz8.7.x-ovz" and will appear
at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-305.3.1.vz8.7.13
-->
commit 20578202eb6dc164d8118cc25c3703331250baa2
Author: Kirill Tkhai
Date: Thu Sep 23 19:54:39 2021 +0300
push_backup: Re
From: Andrey Ryabinin
dmesg | grep Killed
Before:
Killed process 14892 (trinity-c271) total-vm:97920kB, anon-rss:2508kB,
file-rss:1060kB
After:
Killed process 14892 (trinity-c271) in ve 4 total-vm:97920kB,
anon-rss:2508kB, file-rss:1060kB
https://jira.sw.ru/browse/PSBM-40610
Si
From: Vladimir Davydov
An mm_struct may be pinned by a file. An example is a vhost-net device
created by qemu/kvm (see vhost_net_ioctl -> vhost_net_set_owner ->
vhost_dev_set_owner). If such a process gets OOM-killed, the reference to
its mm_struct will only be released from exit_task_work -> f
From: Vladimir Davydov
It will only spam dmesg on oom inside a container, so disable it by
default as it used to be in PCS6.
Signed-off-by: Vladimir Davydov
Reviewed-by: Kirill Tkhai
(cherry picked from vz8 commit 34a56ac2c3bd06751e83f66d9ac2e9f13cc2515a)
Signed-off-by: Andrey Zhadchenko
---
From: Josh Boyer
When CPUMASK_OFFSTACK was added in 2008, it was dependent upon
DEBUG_PER_CPU_MAPS being enabled, or an architecture could select it.
The debug dependency adds additional overhead that isn't required for
operation of the feature, and we need CPUMASK_OFFSTACK to increase the
NR_CPU
From: Kirill Tkhai
It's similar to the VZCTL_VE_CONFIGURE ioctl in PCS6.
Note: max_write_len is __NEW_UTS_LEN + 1, because I want to allow
echo ... > ve.os_release, which adds a trailing '\n' to the string
(see man echo for details).
The extra character is cut off in ve_os_release_write().
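The length rule in the note above can be sketched in user space, assuming __NEW_UTS_LEN is 64 (its value in Linux). `os_release_write()` and `OS_RELEASE_MAX_WRITE` are hypothetical names; this is not the real ve_os_release_write(). The note does not say what happens to a 65-byte write with no trailing newline. Here we assume it is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NEW_UTS_LEN 64                          /* __NEW_UTS_LEN in Linux */
#define OS_RELEASE_MAX_WRITE (NEW_UTS_LEN + 1)  /* room for echo's '\n' */

/*
 * Hypothetical helper: accept up to NEW_UTS_LEN + 1 bytes, drop one
 * trailing '\n' (the "extra symbol"), and store a NUL-terminated string.
 * Returns 0 on success, -1 if the value does not fit (assumed behavior).
 */
static int os_release_write(char dst[NEW_UTS_LEN + 1], const char *buf, size_t len)
{
	assert(dst != NULL && buf != NULL);
	if (len > OS_RELEASE_MAX_WRITE)
		return -1;
	if (len > 0 && buf[len - 1] == '\n')
		len--;                  /* cut the newline added by echo */
	if (len > NEW_UTS_LEN)
		return -1;              /* 65 bytes without a newline: too long */
	memcpy(dst, buf, len);
	dst[len] = '\0';
	return 0;
}
```

So `echo 5.14.0 > ve.os_release` delivers 7 bytes and stores "5.14.0". A full 64-character release string plus echo's newline still fits in the 65-byte limit.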
https://ji
From: Andrey Ryabinin
ptrace() hangs if we try to attach to a stopped task
in a freezing cgroup:
Tracee: Tracer:
static bool do_signal_stop(int signr)
__set_current_state(TASK_STOPPED);
freezable_schedule();
freezer_do_
From: Kirill Tkhai
FIXME: This patch must be rewritten via sched_entity::statistics::wait_max.
Signed-off-by: Vladimir Davydov
Signed-off-by: Kirill Tkhai
(cherry picked from vz8 commit 85d40e62bfe81cbaba338b7ecb5fc1377a3f2d7b)
Signed-off-by: Andrey Zhadchenko
---
kernel/ve/vzstat.c | 65 ++
From: Kirill Tkhai
Signed-off-by: Vladimir Davydov
Signed-off-by: Kirill Tkhai
+++
ve/vzstat.h: move some kstat definitions into new header
Move some definitions into kstat.h, so we can use them later
in other headers (sched.h).
https://jira.sw.ru/browse/PSBM-81395
Signed-off-by: Andrey Ryabinin
From: Vladimir Davydov
In fact this patch was in 084.9 already, but was committed in kernel-ve,
so this is a move from kernel-ve to kernel-patches.
From: Konstantin Khlebnikov
Date: Thu, 12 Sep 2013 15:25:31 +0400
Subject: [vzlin-dev] [PATCH rh6] ve: allow taskstats netlink in netns
This patch
From: Kirill Tkhai
This will be used by vzstat.
Signed-off-by: Kirill Tkhai
(cherry picked from vz8 commit 7f795f34eebf25d87c638a27ba0cb68d307f72de)
Signed-off-by: Andrey Zhadchenko
---
include/linux/swap.h | 9 +
mm/swap_state.c | 8 ++--
2 files changed, 11 insertions(+),
From: Dmitry Safonov
The host admin may be confused by a warning in dmesg that contains only
"comm", which may be anything a user in a container chooses.
Add the ve name to this warning.
https://jira.sw.ru/browse/PSBM-49818
Signed-off-by: Dmitry Safonov
Signed-off-by: Stanislav Kinsburskiy
(cherry picked f
046f7c4bcc5a ("ve/mm: print OOM info to VE log") will be sent
later as it depends on part 3
Andrew Vagin (1):
ve/fs: add ve_capable to check capabilities relative to the current VE
Andrey Ryabinin (3):
ve/mm,oom: print information about ve of killed task
proc,memcg: use memcg limits for sh
From: Cyrill Gorcunov
To create fanotify objects, one has to be the sysadmin of a container.
The main potential problem is an unlimited number of marks and an unlimited
queue, but since it uses the kmem cgroup to obtain objects, this should be
controllable via memory cgroup settings.
https://jira.sw.ru/browse/PSBM-41409
From: Kirill Tkhai
Needed for our modules (vzstat).
FIXME: maybe reduce to EXPORT_SYMBOL_GPL().
Signed-off-by: Kirill Tkhai
(cherry picked from vz8 commit 88e9f6757a92d2687aae51cc63ab8d6c4e1d60cf)
Signed-off-by: Andrey Zhadchenko
---
kernel/fork.c | 1 +
1 file changed, 1 insertion(+)
diff
From: Stanislav Kinsburskiy
Directories should be created via proc_net_mkdir(),
files via proc_net_create() or proc_net_create_data() to become visible in a
container.
Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Stanislav Kinsburskiy
khorenko@: TODO: helpers are to be renamed.
Rebase
From: Evgenii Shatokhin
Allowing privileged processes in containers to set leases on
arbitrary files seems to do no harm. Let us make CAP_LEASE work there.
https://jira.sw.ru/browse/PSBM-46199
Signed-off-by: Evgenii Shatokhin
Acked-by: Cyrill Gorcunov
(cherry picked from vz8 commit
From: Stanislav Kinsburskiy
Signed-off-by: Stanislav Kinsburskiy
Rebased to vz9:
- ve_capable is moved to vfs_mknod because of ms
commit a3c751a50fe6 ("vfs: allow unprivileged whiteout creation")
(cherry picked from commit vz8 ea5765973b0087b555d608622b4ad6a676395b23)
Signed-off-by: Andrey
From: Stanislav Kinsburskiy
Disable non-virtualized file systems in containers.
This patch consists of two logical parts:
1) Filter out non-containerized filesystems from the output of
"/proc/filesystems".
2) Forbid access to the fs structure if the current VE is not the super VE
and the filesystem is not containerized.
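Part 1) can be pictured as a filter over the registered filesystem list. The sketch below is a hypothetical user-space model. `FS_CONTAINERIZED`, `struct fs_type` and `list_filesystems()` are illustrative names, not the kernel's. It also omits the "nodev" column of the real /proc/filesystems and does not model part 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define FS_CONTAINERIZED 0x1   /* hypothetical fs_flags bit */

struct fs_type {               /* simplified stand-in for file_system_type */
	const char *name;
	int fs_flags;
};

/*
 * Render a /proc/filesystems-like listing (one name per line) into buf.
 * Outside the super VE (in_super_ve == false), filesystems without
 * FS_CONTAINERIZED are skipped. Returns the number of entries written.
 */
static int list_filesystems(const struct fs_type *fs, size_t n,
			    bool in_super_ve, char *buf, size_t buflen)
{
	int shown = 0;
	size_t off = 0;

	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if (!in_super_ve && !(fs[i].fs_flags & FS_CONTAINERIZED))
			continue;                   /* hidden from containers */
		int w = snprintf(buf + off, buflen - off, "%s\n", fs[i].name);
		if (w < 0 || (size_t)w >= buflen - off)
			break;                      /* out of buffer space */
		off += (size_t)w;
		shown++;
	}
	return shown;
}
```

The flag assignments in a usage example such as `{ "proc", FS_CONTAINERIZED }, { "xfs", 0 }` are illustrative only. In that case a container would see "proc" but not "xfs", while the host (super VE) sees both.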
From: Pavel Tikhomirov
Docker swarm tries to set up IPVS via netlink, so let it do so.
It can be done through setsockopt, as each IPVS_CMD_* has an
IP_VS_SO_SET_* analog.
https://jira.sw.ru/browse/PSBM-63883
Signed-off-by: Pavel Tikhomirov
Reviewed-by: Andrew Vagin
(cherry picked from vz8 c
From: Kirill Tkhai
Signed-off-by: Vladimir Davydov
Signed-off-by: Kirill Tkhai
+++
vzstat: account cpu total time properly in mm performance stats
/proc/vz/mmperf occasionally accounts/shows wall total time in both
"Wall_tot_time" and "CPU_tot_time" columns; fix this.
Fixes: c0a20dd32be6 ("
From: Stanislav Kinsburskiy
This is needed to make iotop work in a container:
https://jira.sw.ru/browse/PSBM-56171
Signed-off-by: Stanislav Kinsburskiy
(cherry picked from vz8 commit 288ad5c98b3bb97ba30c427989902f4173748441)
Signed-off-by: Andrey Zhadchenko
---
kernel/taskstats.c | 2 +-
1 fi
From: Vladimir Davydov
Author: Konstantin Khorenko
Email: khore...@parallels.com
Subject: lockdep: taint kernel on circular locking complains
Date: Mon, 23 Sep 2013 20:04:29 +0400
Currently our QA infrastructure cannot efficiently grep kernel logs
and analyze them, thus lockdep complaints are lost
From: Stanislav Kinsburskiy
This patch is a part of vz7 commit 4e8e69eb16b1 ("fs/ve: add new
FS_VE_MOUNT flag to allow mount in container init userns")
Some filesystems are allowed to be mounted only in the init userns in the
mainstream/rh kernel. Some of those we would still like to mount in
Contai
From: Vladimir Davydov
In the berserker mode we kill a bunch of tasks that are as bad as the
selected victim. We assume two tasks to be equally bad if they consume
the same permille of memory. With such a strict check, it might turn out
that oom berserker won't kill any tasks in case a fork bomb
From: Stanislav Kinsburskiy
Will be used to make per-net sysfs files visible in CT.
Signed-off-by: Stanislav Kinsburskiy
Rebase to RHEL8 beta kernel notes:
more helpers added: proc_net_{seq*,single*,net*}
TODO: rename helpers - proc_net_create_net() does not look good.
Signed-off-by: Konsta
From: Kirill Tkhai
This is needed for vzstat.
Signed-off-by: Kirill Tkhai
(cherry picked from vz8 commit a8d3db1b7cd6f785fcc23255e5243fdfd905fcb3)
Signed-off-by: Andrey Zhadchenko
---
mm/mmzone.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/mmzone.c b/mm/mmzone.c
index eb89d6e..9c4
From: Vladimir Davydov
Feature: oom: berserker mode
The logic behind the OOM berserker is the same as in PCS6: if processes
are killed by oom killer too often (< sysctl vm.oom_relaxation, 1 sec by
default), we increase "rage" (min -10, max 20) and kill 1 << "rage"
youngest worst processes if "ra
From: Stanislav Kinsburskiy
We want to allow a few operations in VE. Currently we use nsown_capable,
but it's wrong, because in this case we allow these operations in any
user namespace.
https://jira.sw.ru/browse/PSBM-39077
Signed-off-by: Andrew Vagin
Signed-off-by: Stanislav Kinsburskiy
(ch
From: Kirill Tkhai
FIXME: This patch must be rewritten via sched_entity::statistics::wait_max.
Signed-off-by: Vladimir Davydov
Signed-off-by: Kirill Tkhai
(cherry picked from vz8 commit c0fb61430afceb6564a1b0792d51835b70eab9e1)
Signed-off-by: Andrey Zhadchenko
---
include/linux/ve.h | 3 +
From: Vladimir Davydov
It is possible to disable oom killer inside a memory cgroup by writing 1
to memory.oom_control. If a process inside such a cgroup hits the memory
limit and is unable to reclaim anything, it will wait until more memory
becomes available.
This operation shouldn't be allowed
From: Andrew Vagin
We want to allow a few operations in VE. Currently we use nsown_capable,
but it's wrong, because in this case we allow these operations in any
user namespace.
https://jira.sw.ru/browse/PSBM-39077
Signed-off-by: Andrew Vagin
Signed-off-by: Stanislav Kinsburskiy
khorenko@:
r
From: Andrey Ryabinin
Use the limits of the task's memcg to show /proc/<pid>/oom_score.
Note: in vz7 we had different behavior. It showed 'oom_score'
based on the 've->memcg' limits of the process reading oom_score.
Now we look at the memcg of the process and don't care about the
current one. It seems more correct behavio
From: Kirill Tkhai
Extracted from "Initial patch".
Signed-off-by: Kirill Tkhai
Signed-off-by: Stanislav Kinsburskiy
+++
ve/fs/autofs: Allow autofs to be used inside a container
It turned out that autofs is used at least for NFS/CIFS and binfmt_misc.
Let's use new FS_VE_MOUNT flag to only al
From: Vladimir Davydov
Feature: mm: OOM guarantee
This patch description:
OOM guarantee works exactly like low limit, but for OOM, i.e. tasks
inside cgroups above the limit are killed first.
Read/write via memory.oom_guarantee.
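The ordering rule above ("tasks inside cgroups above the limit are killed first") can be sketched as a victim-selection pass. Everything here is a hypothetical model, not the patch's code: `struct cg`, `struct task`, `pick_victim()` and the numeric `badness` score are stand-ins. Cgroups above their oom_guarantee are strictly preferred, and badness only breaks ties within the same class.

```c
#include <assert.h>
#include <stddef.h>

struct cg {                        /* hypothetical memory cgroup view */
	unsigned long usage;           /* current usage, pages */
	unsigned long oom_guarantee;   /* memory.oom_guarantee, pages */
};

struct task {                      /* hypothetical OOM candidate */
	const struct cg *cg;
	unsigned long badness;         /* higher means a better victim */
};

static int above_guarantee(const struct cg *cg)
{
	return cg->usage > cg->oom_guarantee;
}

/*
 * Return the index of the chosen victim, or -1 if n == 0.
 * Tasks in cgroups above their guarantee always win over protected ones.
 */
static int pick_victim(const struct task *t, size_t n)
{
	int best = -1;

	assert(t != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (best < 0) {
			best = (int)i;
			continue;
		}
		int a = above_guarantee(t[i].cg);
		int b = above_guarantee(t[best].cg);
		if (a > b || (a == b && t[i].badness > t[best].badness))
			best = (int)i;
	}
	return best;
}
```

In this model, a very "bad" task in a cgroup still within its guarantee survives as long as any task in an over-guarantee cgroup remains. That mirrors how a low limit shields a cgroup from reclaim.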
Signed-off-by: Vladimir Davydov
Signed-off-by: Andrey Ryabinin