On Wed, 2 Oct 2024 16:50:05 GMT, Severin Gehwolf <sgehw...@openjdk.org> wrote:
>> The change of [JDK-8327114](https://bugs.openjdk.org/browse/JDK-8327114) >> also increased test coverage. In particular, the `TestJcmdWithSideCar.java` >> test got enhanced to cover these cases (prior to >> [JDK-8327114](https://bugs.openjdk.org/browse/JDK-8327114) only case 1 was >> tested): >> >> 1. Shared volumes between attachee and attacher and shared pid namespace >> 2. Shared volumes between attachee and attacher and shared pid namespace, >> both running with elevated privileges >> 3. Shared pid namespace between attachee and attacher only >> 4. Shared pid namespace between attachee and attacher, both running with >> elevated privileges >> >> The OpenJDK attach code is able to handle cases 1 through 3 which pass, but >> the last case, `4`, hasn't been implemented yet when running as regular user >> and directing the container runtime to map the container user to that user >> as well. Thus, the test fails. For now I propose to disable the 4th test >> case. It can get re-enabled once the product code got updated to account for >> this case (tracked in https://bugs.openjdk.org/browse/JDK-8341349). >> >> While at it, I've discovered that the `EventGeneratorLoop` running container >> always runs for a fixed time: `30` seconds. That's irrespective of the >> attacher containers being done. Therefore, two iterations of the loop >> spawning this container running the fixed set of time will at least run `60` >> seconds. In my test runs using `-summary:time` it was `70` seconds: >> >> >> Passed: containers/docker/TestJcmdWithSideCar.java >> build: 2.08 seconds >> compile: 2.068 seconds >> build: 4.842 seconds >> compile: 4.841 seconds >> driver: 70.776 seconds >> Test results: passed: 1 >> >> >> I don't think this is needed. We can destroy the process running >> `EventGeneratorLoop` and then wait for it to exit rather than sitting there >> and waiting until it is finished (even though we no longer do anything with >> it). This improvement has been implemented in >> https://github.com/openjdk/jdk/commit/5b2f646c73b747f6fff364347031074d24e49822 >> (separate commit). After this, the total runtime of the test reduces to >> about `22` seconds: >> >> >> Passed: containers/docker/TestJcmdWithSideCar.java >> build: 2.169 seconds >> compile: 2.157 seconds >> build: 4.964 seconds >> compile: 4.963 seconds >> driver: 22.928 seconds >> Test results: passed: 1 >> >> >> The actual test skip of the 4th test case is: >> https://github.com/openjdk/jdk/commit/95a7cc05f00e94190af583b9f9dbc659c7be879f >> >> Thoughts? Could somebody please run th... > > Severin Gehwolf has updated the pull request incrementally with one > additional commit since the last revision: > > Remove the attachee container if it exists Right, this turned into a rabbit hole of its own.. đ `make test TEST="jtreg:test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java" JTREG="JAVA_OPTIONS=-Djdk.test.container.command=docker"` is still failing: Full child process STDOUT was saved to docker-stdout-139602.log [2024-10-02T17:02:06.397293068Z] Waiting for completion for process 139602 [2024-10-02T17:02:06.397362093Z] Waiting for completion finished for process 139602 [COMMAND] docker ps [2024-10-02T17:02:07.771881444Z] Gathering output for process 139723 [ELAPSED: 19 ms] [STDERR] [STDOUT] CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 1db392c08e7c jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd "/jdk/bin/java -XX:+âŠ" 5 seconds ago Up 5 seconds test-container-main Full child process STDOUT was saved to docker-stdout-139723.log [2024-10-02T17:02:07.790740367Z] Waiting for completion for process 139723 [2024-10-02T17:02:07.790915466Z] Waiting for completion finished for process 139723 [COMMAND] docker rm test-container-main [2024-10-02T17:02:07.792750591Z] Gathering output for process 139734 [ELAPSED: 15 ms] [STDERR] Error response from daemon: You cannot remove a running container 1db392c08e7cbac3a256597c2ee693fdbbeab3a814cb229c83ac38c0c805f42d. Stop the container before attempting removal or force remove I _think_ the underlying problem is that killing the process from the Java test is not properly propagated to the `EventGeneratorLoop` process running as a Docker container. Among others, I found https://www.kaggle.com/code/residentmario/best-practices-for-propagating-signals-on-docker talking about it. So, in the Docker case the container continues running for the whole 30 seconds even though we called `p.destroy()`. `docker rm` for the next test case will fail because the container is still running. So, I started out by trying this: diff --git test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java index 5132f14d788..4726284cff1 100644 --- test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java +++ test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java @@ -251,8 +251,9 @@ public Process start(final boolean elevated) throws Exception { OutputAnalyzer out = DockerTestUtils.execute(Container.ENGINE_COMMAND, "ps") .shouldHaveExitValue(0); if (out.contains(MAIN_CONTAINER_NAME)) { - DockerTestUtils.execute(Container.ENGINE_COMMAND, "rm", MAIN_CONTAINER_NAME) + DockerTestUtils.execute(Container.ENGINE_COMMAND, "stop", "--time=1", "--signal=SIGKILL", MAIN_CONTAINER_NAME) .shouldHaveExitValue(0); } // start "main" container (the observee) DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop"); ..only to be met by: [main-container-process] docker: Error response from daemon: Conflict. The container name "/test-container-main" is already in use by container "f2754e97e6eecd6b0538e8a8c3d7c12214d16f9f32a9e2fe313c1d9f3258732c". You have to remove (or rename) that container to be able to reuse that name. [main-container-process] See 'docker run --help'. java.lang.RuntimeException: Timed out while waiting for main() to start at TestJcmdWithSideCar$MainContainer.waitForMainMethodStart(TestJcmdWithSideCar.java:293) at TestJcmdWithSideCar.main(TestJcmdWithSideCar.java:110) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:573) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) at java.base/java.lang.Thread.run(Thread.java:1576) Because we start containers with `--rm`, they seem to be asynchronously removed _after_ `stop` has finished. Finally, if we wait a little bit after the `stop` command, `make test TEST="jtreg:test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java" JTREG="JAVA_OPTIONS=-Djdk.test.container.command=docker"` works again. diff --git test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java index 5132f14d788..4726284cff1 100644 --- test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java +++ test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java @@ -251,8 +251,9 @@ public Process start(final boolean elevated) throws Exception { OutputAnalyzer out = DockerTestUtils.execute(Container.ENGINE_COMMAND, "ps") .shouldHaveExitValue(0); if (out.contains(MAIN_CONTAINER_NAME)) { - DockerTestUtils.execute(Container.ENGINE_COMMAND, "rm", MAIN_CONTAINER_NAME) + DockerTestUtils.execute(Container.ENGINE_COMMAND, "stop", "--time=1", "--signal=SIGKILL", MAIN_CONTAINER_NAME) .shouldHaveExitValue(0); + Thread.sleep(1_000L); } // start "main" container (the observee) DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop"); ------------- PR Comment: https://git.openjdk.org/jdk/pull/21289#issuecomment-2389263714