On Thu, 7 Nov 2024 22:31:21 GMT, Sergey Chernyshev <schernys...@openjdk.org> wrote:
>> Cgroup V1 subsustem fails to initialize mounted controllers properly in >> certain cases, that may lead to controllers left undetected/inactive. We >> observed the behavior in CloudFoundry deployments, it affects also host >> systems. >> >> The relevant /proc/self/mountinfo line is >> >> >> 2207 2196 0:43 >> /system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c >> /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:25 - >> cgroup cgroup rw,cpu,cpuacct >> >> >> /proc/self/cgroup: >> >> >> 11:cpu,cpuacct:/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c >> >> >> Here, Java runs inside containerized process that is being moved cgroups due >> to load balancing. >> >> Let's examine the condition at line 64 here >> https://github.com/openjdk/jdk/blob/55a7cf14453b6cd1de91362927b2fa63cba400a1/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp#L59-L72 >> It is always FALSE and the branch is never taken. The issue was spotted >> earlier by @jerboaa in >> [JDK-8288019](https://bugs.openjdk.org/browse/JDK-8288019). >> >> The original logic was intended to find the common prefix of `_root`and >> `cgroup_path` and concatenate the remaining suffix to the `_mount_point` >> (lines 67-68). That could lead to the following results: >> >> Example input >> >> _root = "/a" >> cgroup_path = "/a/b" >> _mount_point = "/sys/fs/cgroup/cpu,cpuacct" >> >> >> result _path >> >> "/sys/fs/cgroup/cpu,cpuacct/b" >> >> >> Here, cgroup_path comes from /proc/self/cgroup 3rd column. The man page >> (https://man7.org/linux/man-pages/man7/cgroups.7.html#NOTES) for control >> groups states: >> >> >> ... >> /proc/pid/cgroup (since Linux 2.6.24) >> This file describes control groups to which the process >> with the corresponding PID belongs. The displayed >> information differs for cgroups version 1 and version 2 >> hierarchies. >> For each cgroup hierarchy of which the process is a >> member, there is one entry containing three colon- >> separated fields: >> >> hierarchy-ID:controller-list:cgroup-path >> >> For example: >> >> 5:cpuacct,cpu,cpuset:/daemons >> ... >> [3] This field contains the pathname of the control group >> in the hierarchy to which the process belongs. This >> pathname is relative to the mount point of the >> hierarchy. >> >> >> This explicitly states the "pathname is relative t... > > Sergey Chernyshev has updated the pull request with a new target base due to > a merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains four additional > commits since the last revision: > > - Merge branch 'master' into JDK-8343191 > - patch reimplemented > - fix the logic that skips duplicate controller's mount points > - 8343191: Cgroup v1 subsystem fails to set subsystem path We really need to consider adding container tests for this use-case. Perhaps we can require root perms for it. The tricky part would be adding appropriate synchronization for the cgroup move of the shell process and subsequent `java` runs. src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 54: > 52: ss.print_raw(_mount_point); > 53: if (strcmp(_root, "/") == 0) { > 54: // host processes / containers w/private cgroup namespace Suggestion: // host processes and containers with cgroupns=private src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 62: > 60: // containers only, warn if doesn't match > 61: log_warning(os, container)("Cgroup v1 controller (%s) mounting root > [%s] doesn't match cgroup [%s]", > 62: _mount_point, _root, cgroup_path); Why this warning? It appears it would make more sense to produce this warning when `cgroup_path` contains `../` and falling back to the `mount_path` for the subsystem path which indicates we have a `cgroupns=private` deployment on CG v1, but would likely get away with it since `memory.limit_in_bytes` will be present at the mount root. If `cgroup_path` doesn't contain `../` we should append the `cgroup_path` to the `_mount_point` similar to what we do for cg v2. In the cloudflare case we'd end up with a subsystem path of `/sys/fs/cgroup/cpu,cpuacct/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c`. Since the `cgroup_path != _root` we trigger path adjustment increasing the chance to detect any lower limit in any of the paths down to the mount point. src/java.base/linux/classes/jdk/internal/platform/cgroupv1/CgroupV1SubsystemController.java line 46: > 44: } > 45: > 46: public void setPath(String cgroupPath) { This should behave the same as Hotspot and also append the cgroup path to the mount point. Then let [JDK-8336881](https://bugs.openjdk.org/browse/JDK-8336881) kick in to reduce it down to the mount point (if necessary). ------------- PR Review: https://git.openjdk.org/jdk/pull/21808#pullrequestreview-2430488512 PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1838641975 PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1838656361 PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1838666453