On Thu, 12 Dec 2024 01:11:33 GMT, Sergey Chernyshev <schernys...@openjdk.org> 
wrote:

>> Cgroup V1 subsustem fails to initialize mounted controllers properly in 
>> certain cases, that may lead to controllers left undetected/inactive. We 
>> observed the behavior in CloudFoundry deployments, it affects also host 
>> systems.
>> 
>> The relevant /proc/self/mountinfo line is
>> 
>> 
>> 2207 2196 0:43 
>> /system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c 
>> /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:25 - 
>> cgroup cgroup rw,cpu,cpuacct
>> 
>> 
>> /proc/self/cgroup:
>> 
>> 
>> 11:cpu,cpuacct:/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c
>> 
>> 
>> Here, Java runs inside containerized process that is being moved cgroups due 
>> to load balancing.
>> 
>> Let's examine the condition at line 64 here 
>> https://github.com/openjdk/jdk/blob/55a7cf14453b6cd1de91362927b2fa63cba400a1/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp#L59-L72
>> It is always FALSE and the branch is never taken. The issue was spotted 
>> earlier by @jerboaa in 
>> [JDK-8288019](https://bugs.openjdk.org/browse/JDK-8288019). 
>> 
>> The original logic was intended to find the common prefix of `_root`and 
>> `cgroup_path` and concatenate the remaining suffix to the `_mount_point` 
>> (lines 67-68). That could lead to the following results: 
>> 
>> Example input
>> 
>> _root = "/a"
>> cgroup_path = "/a/b"
>> _mount_point = "/sys/fs/cgroup/cpu,cpuacct"
>> 
>> 
>> result _path
>> 
>> "/sys/fs/cgroup/cpu,cpuacct/b"
>> 
>> 
>> Here, cgroup_path comes from /proc/self/cgroup 3rd column. The man page 
>> (https://man7.org/linux/man-pages/man7/cgroups.7.html#NOTES) for control 
>> groups states:
>> 
>> 
>> ...
>>        /proc/pid/cgroup (since Linux 2.6.24)
>>               This file describes control groups to which the process
>>               with the corresponding PID belongs.  The displayed
>>               information differs for cgroups version 1 and version 2
>>               hierarchies.
>>               For each cgroup hierarchy of which the process is a
>>               member, there is one entry containing three colon-
>>               separated fields:
>> 
>>                   hierarchy-ID:controller-list:cgroup-path
>> 
>>               For example:
>> 
>>                   5:cpuacct,cpu,cpuset:/daemons
>> ...
>>               [3]  This field contains the pathname of the control group
>>                    in the hierarchy to which the process belongs. This
>>                    pathname is relative to the mount point of the
>>                    hierarchy.
>> 
>> 
>> This explicitly states the "pathname is relative t...
>
> Sergey Chernyshev has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   added details in the log message

`CgroupV1Controller::set_subsystem_path` needs high level comment update to 
describe the logic happening.

Testing:
>>And after the patch this would become this, right?
>>```
>>/sys/fs/cgroup/cpu,cpuacct/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c
>>/sys/fs/cgroup/cpu,cpuacct/
>>```
>
>It depends on whether it was a subgroup in the initial path. If 
>bad/2f57368b-0eda-4e52-64d8-af5c is the subgroup, the reduction will be
>
>```
>/sys/fs/cgroup/cpu,cpuacct/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c
>/sys/fs/cgroup/cpu,cpuacct/bad/2f57368b-0eda-4e52-64d8-af5c
>/sys/fs/cgroup/cpu,cpuacct/bad
>/sys/fs/cgroup/cpu,cpuacct/
>```

The above case, doesn't seem to be reflected by any gtest test case (or 
others), please add those.

src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 75:

> 73:               break;
> 74:             }
> 75:             suffix = strchr(suffix+1, '/');

Style:

Suggestion:

            suffix = strchr(suffix + 1, '/');

src/java.base/linux/classes/jdk/internal/platform/cgroupv1/CgroupV1SubsystemController.java
 line 63:

> 61:                         Path cgp = Path.of(cgroupPath);
> 62:                         int nameCount = cgp.getNameCount();
> 63:                         for (int i=0; i<nameCount; ++i) {

Style:

Suggestion:

                        for (int i=0; i < nameCount; i++) {

src/java.base/linux/classes/jdk/internal/platform/cgroupv1/CgroupV1SubsystemController.java
 line 73:

> 71:                                 break;
> 72:                             }
> 73:                             cgp = (cgp.getNameCount() > 1) ? 
> cgp.subpath(1, cgp.getNameCount()) : Path.of("");

Suggestion:

                            cgp = (nameCount > 1) ? cgp.subpath(1, nameCount) : 
Path.of("");

test/hotspot/jtreg/containers/docker/TestMemoryWithSubgroups.java line 60:

> 58: 
> 59:             Common.prepareWhiteBox();
> 60:             DockerTestUtils.buildJdkContainerImage(imageName);

This can be moved out of the condition. It's there in both cases.

test/hotspot/jtreg/containers/docker/TestMemoryWithSubgroups.java line 85:

> 83:         } else {
> 84:             System.out.println("Metrics are from neither cgroup v1 nor 
> v2, skipped for now.");
> 85:         }

Suggestion:

        throw new RuntimeException("cgroup version not known? was: " + 
metrics.getProvider());

test/hotspot/jtreg/containers/docker/TestMemoryWithSubgroups.java line 98:

> 96:         opts.addDockerOpts("--privileged")
> 97:             .addDockerOpts("--cgroupns=" + (privateNamespace ? "private" 
> : "host"))
> 98:             .addDockerOpts("--memory", "1g");

Please extract this "larger" memory limit out of this function. The expectation 
is to have:


Container memory limit => X
Some lower limit in a moved path => Y

Expect that the lower limit of Y is being detected.


For example:


testMemoryLimitSubgroupV1("1g", "100m", "104857600", false);

test/hotspot/jtreg/containers/docker/TestMemoryWithSubgroups.java line 118:

> 116:         opts.addDockerOpts("--privileged")
> 117:             .addDockerOpts("--cgroupns=" + (privateNamespace ? "private" 
> : "host"))
> 118:             .addDockerOpts("--memory", "1g");

Same here with the `1g` container memory limit. Move it to a parameter.

test/jdk/jdk/internal/platform/docker/TestDockerMemoryMetricsSubgroup.java line 
72:

> 70:         } else {
> 71:             System.out.println("Metrics are from neither cgroup v1 nor 
> v2, skipped for now.");
> 72:         }

Please `throw` here.

-------------

Changes requested by sgehwolf (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21808#pullrequestreview-2620718790
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1958032435
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1958043532
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1958048577
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1958073462
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1958075672
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1958059439
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1958061783
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1958077572

Reply via email to