The cgroup v1 subsystem fails to initialize mounted controllers properly in 
certain cases, which may leave controllers undetected or inactive. We observed 
this behavior in CloudFoundry deployments; it also affects host systems.

The relevant /proc/self/mountinfo line is


2207 2196 0:43 /system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:25 - cgroup cgroup rw,cpu,cpuacct


/proc/self/cgroup:


11:cpu,cpuacct:/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c


Here, Java runs inside a containerized process that is being moved between 
cgroups due to load balancing.
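
To make the fields of the mountinfo line above easier to follow, here is a minimal Java sketch that splits such a line into the parts relevant to this issue (root, mount point, the per-mount `ro` flag, filesystem type and super options). The class `MountInfoSketch` and its field names are hypothetical and exist only for illustration; this is not the parser used by the JDK.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: MountInfoSketch is a hypothetical helper, not a JDK class.
final class MountInfoSketch {
    final String root;               // 4th field: root of the mount within the filesystem
    final String mountPoint;         // 5th field: mount point as seen by the process
    final boolean readOnly;          // true if "ro" appears in the 6th field (mount options)
    final String fsType;             // 1st field after the " - " separator
    final List<String> superOptions; // 3rd field after the separator, e.g. rw,cpu,cpuacct

    private MountInfoSketch(String root, String mountPoint, boolean readOnly,
                            String fsType, List<String> superOptions) {
        this.root = root;
        this.mountPoint = mountPoint;
        this.readOnly = readOnly;
        this.fsType = fsType;
        this.superOptions = superOptions;
    }

    static MountInfoSketch parse(String line) {
        // Optional fields (such as master:25) end at the " - " separator.
        int sep = line.indexOf(" - ");
        if (sep < 0) {
            throw new IllegalArgumentException("missing ' - ' separator");
        }
        String[] pre = line.substring(0, sep).trim().split("\\s+");
        String[] post = line.substring(sep + 3).trim().split("\\s+");
        if (pre.length < 6 || post.length < 3) {
            throw new IllegalArgumentException("too few fields");
        }
        boolean ro = Arrays.asList(pre[5].split(",")).contains("ro");
        return new MountInfoSketch(pre[3], pre[4], ro, post[0],
                                   Arrays.asList(post[2].split(",")));
    }
}
```

Applied to the line above, `parse` yields the `.../good/...` path as the root, `/sys/fs/cgroup/cpu,cpuacct` as the mount point, and a set read-only flag.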

Let's examine the condition at line 64 here:
https://github.com/openjdk/jdk/blob/55a7cf14453b6cd1de91362927b2fa63cba400a1/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp#L59-L72
It is always false, so the branch is never taken. The issue was spotted earlier 
by @jerboaa in [JDK-8288019](https://bugs.openjdk.org/browse/JDK-8288019).

The original logic was intended to find the common prefix of `_root` and 
`cgroup_path` and append the remaining suffix to `_mount_point` (lines 
67-68). That could lead to the following result: 

Example input

_root = "/a"
cgroup_path = "/a/b"
_mount_point = "/sys/fs/cgroup/cpu,cpuacct"


result _path

"/sys/fs/cgroup/cpu,cpuacct/b"
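
The intended behavior described above can be sketched in Java as follows. This is a simplified illustration, not the HotSpot code: it only handles the case where `_root` is a whole-component prefix of `cgroup_path`, and it returns `null` when there is no such prefix, which is an assumption made for this sketch. The class and method names are hypothetical.

```java
// Illustrative only: PrefixPathSketch is a hypothetical helper, not JDK code.
final class PrefixPathSketch {
    // Strips root from the front of cgroupPath and appends the remainder
    // to mountPoint. Returns null when root is not a prefix of cgroupPath
    // (an assumption of this sketch, chosen to make that case visible).
    static String intendedPath(String mountPoint, String root, String cgroupPath) {
        if (root.equals("/")) {
            return cgroupPath.equals("/") ? mountPoint : mountPoint + cgroupPath;
        }
        if (cgroupPath.equals(root)) {
            return mountPoint;
        }
        // Require a '/' after the prefix so "/a" does not match "/ab".
        if (cgroupPath.startsWith(root + "/")) {
            return mountPoint + cgroupPath.substring(root.length());
        }
        return null;
    }
}
```

With the example input, `intendedPath("/sys/fs/cgroup/cpu,cpuacct", "/a", "/a/b")` gives `/sys/fs/cgroup/cpu,cpuacct/b`. With the `.../good/...` root and `.../bad/...` cgroup path from the report, neither path is a prefix of the other, so the sketch returns `null`.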


Here, `cgroup_path` comes from the third column of `/proc/self/cgroup`. The man 
page (https://man7.org/linux/man-pages/man7/cgroups.7.html#NOTES) for control 
groups states:


...
       /proc/pid/cgroup (since Linux 2.6.24)
              This file describes control groups to which the process
              with the corresponding PID belongs.  The displayed
              information differs for cgroups version 1 and version 2
              hierarchies.
              For each cgroup hierarchy of which the process is a
              member, there is one entry containing three colon-
              separated fields:

                  hierarchy-ID:controller-list:cgroup-path

              For example:

                  5:cpuacct,cpu,cpuset:/daemons
...
              [3]  This field contains the pathname of the control group
                   in the hierarchy to which the process belongs. This
                   pathname is relative to the mount point of the
                   hierarchy.


This explicitly states that the "pathname is relative to the mount point of the 
hierarchy". Hence, the correct result would have been


/sys/fs/cgroup/cpu,cpuacct/a/b
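
As a rough illustration of the man page wording, the following Java sketch splits a `/proc/self/cgroup` entry into its three colon-separated fields and resolves the third field relative to a mount point. `CgroupLineSketch` and its methods are hypothetical names used only here; the JDK's own parsing differs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: CgroupLineSketch is a hypothetical helper, not a JDK class.
final class CgroupLineSketch {
    final String hierarchyId;
    final List<String> controllers;
    final String cgroupPath;

    private CgroupLineSketch(String hierarchyId, List<String> controllers, String cgroupPath) {
        this.hierarchyId = hierarchyId;
        this.controllers = controllers;
        this.cgroupPath = cgroupPath;
    }

    // Parses "hierarchy-ID:controller-list:cgroup-path".
    static CgroupLineSketch parse(String line) {
        // Limit of 3 keeps any ':' inside the path in the last field.
        String[] f = line.split(":", 3);
        if (f.length != 3) {
            throw new IllegalArgumentException("expected three fields");
        }
        List<String> controllers = f[1].isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(f[1].split(","));
        return new CgroupLineSketch(f[0], controllers, f[2]);
    }

    // Treats cgroupPath as relative to the hierarchy's mount point.
    static String underMountPoint(String mountPoint, String cgroupPath) {
        return cgroupPath.equals("/") ? mountPoint : mountPoint + cgroupPath;
    }
}
```

For example, `underMountPoint("/sys/fs/cgroup/cpu,cpuacct", "/a/b")` gives `/sys/fs/cgroup/cpu,cpuacct/a/b`, matching the result above.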


However, if Java runs in a container, `/proc/self/cgroup` and 
`/proc/self/mountinfo` are mapped (read-only) from the host, because Docker 
uses `--cgroupns=host` by default on cgroup v1 hosts. In that case `_root` and 
`cgroup_path` belong to the host and do not exist in the container, so Java 
must fall back to the `_mount_point` of the corresponding cgroup controller.

When `--cgroupns=private` is used, `_root` and `cgroup_path` are always equal 
to `/`.

On hosts, `cgroup_path` should always be appended to the mount point, no 
matter how it compares to `_root`.

The patch uses the result of `is_containerized()` to select the correct path. 
It also proposes changing the semantics of `is_read_only()` so that it returns 
the combined read-only flag for all mounted controllers. Currently the only 
use of the `_read_only` flag is to determine whether the V1 subsystem 
`is_containerized()`. The `_read_only` flags are available in advance, before 
any CgroupV1SubsystemController objects are initialized.
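
The selection rule described above might look roughly like the following Java sketch. It assumes that the "combined" flag means every mounted controller is read-only (a logical AND), and that this flag stands in for `is_containerized()`; both are assumptions of this illustration, and the names `SubsystemPathSketch`, `allReadOnly` and `selectPath` are hypothetical. The actual patch may be structured differently.

```java
// Illustrative only: SubsystemPathSketch is a hypothetical helper, not JDK code.
final class SubsystemPathSketch {
    // Assumption: the combined flag is true only if every mounted
    // controller was mounted read-only.
    static boolean allReadOnly(boolean... controllerReadOnly) {
        if (controllerReadOnly.length == 0) {
            return false;
        }
        for (boolean ro : controllerReadOnly) {
            if (!ro) {
                return false;
            }
        }
        return true;
    }

    // In a container, fall back to the mount point; on a host, append the
    // cgroup path from /proc/self/cgroup to the mount point.
    static String selectPath(boolean containerized, String mountPoint, String cgroupPath) {
        if (containerized || cgroupPath.equals("/")) {
            return mountPoint;
        }
        return mountPoint + cgroupPath;
    }
}
```

With the values from the report and a containerized process, `selectPath` returns `/sys/fs/cgroup/cpu,cpuacct` regardless of the `.../bad/...` path reported in `/proc/self/cgroup`.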

The Java side is updated to follow the same logic.

-------------

Commit messages:
 - 8343191: Cgroup v1 subsystem fails to set subsystem path

Changes: https://git.openjdk.org/jdk/pull/21808/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21808&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8343191
  Stats: 229 lines in 10 files changed: 157 ins; 30 del; 42 mod
  Patch: https://git.openjdk.org/jdk/pull/21808.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21808/head:pull/21808

PR: https://git.openjdk.org/jdk/pull/21808
