On Wed, 2 Oct 2024 16:50:05 GMT, Severin Gehwolf <sgehw...@openjdk.org> wrote:

>> The change of [JDK-8327114](https://bugs.openjdk.org/browse/JDK-8327114) 
>> also increased test coverage. In particular, the `TestJcmdWithSideCar.java` 
>> test got enhanced to cover these cases (prior to 
>> [JDK-8327114](https://bugs.openjdk.org/browse/JDK-8327114) only case 1 was 
>> tested):
>> 
>> 1. Shared volumes between attachee and attacher and shared pid namespace
>> 2. Shared volumes between attachee and attacher and shared pid namespace, 
>> both running with elevated privileges
>> 3. Shared pid namespace between attachee and attacher only
>> 4. Shared pid namespace between attachee and attacher, both running with 
>> elevated privileges
>> 
>> The OpenJDK attach code is able to handle cases 1 through 3 which pass, but 
>> the last case, `4`, hasn't been implemented yet when running as regular user 
>> and directing the container runtime to map the container user to that user 
>> as well. Thus, the test fails. For now I propose to disable the 4th test 
>> case. It can get re-enabled once the product code got updated to account for 
>> this case (tracked in https://bugs.openjdk.org/browse/JDK-8341349).
>> 
>> While at it, I've discovered that the `EventGeneratorLoop` running container 
>> always runs for a fixed time: `30` seconds. That's irrespective of the 
>> attacher containers being done. Therefore, two iterations of the loop 
>> spawning this container running the fixed set of time will at least run `60` 
>> seconds. In my test runs using `-summary:time` it was `70` seconds:
>> 
>> 
>> Passed: containers/docker/TestJcmdWithSideCar.java
>>   build: 2.08 seconds
>>   compile: 2.068 seconds
>>   build: 4.842 seconds
>>   compile: 4.841 seconds
>>   driver: 70.776 seconds
>> Test results: passed: 1
>> 
>> 
>> I don't think this is needed. We can destroy the process running 
>> `EventGeneratorLoop` and then wait for it to exit rather than sitting there 
>> and waiting until it is finished (even though we no longer do anything with 
>> it). This improvement has been implemented in 
>> https://github.com/openjdk/jdk/commit/5b2f646c73b747f6fff364347031074d24e49822
>>  (separate commit). After this, the total runtime of the test reduces to 
>> about `22` seconds:
>> 
>> 
>> Passed: containers/docker/TestJcmdWithSideCar.java
>>   build: 2.169 seconds
>>   compile: 2.157 seconds
>>   build: 4.964 seconds
>>   compile: 4.963 seconds
>>   driver: 22.928 seconds
>> Test results: passed: 1
>> 
>> 
>> The actual test skip of the 4th test case is: 
>> https://github.com/openjdk/jdk/commit/95a7cc05f00e94190af583b9f9dbc659c7be879f
>> 
>> Thoughts? Could somebody please run th...
>
> Severin Gehwolf has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Remove the attachee container if it exists

Right, this turned into a rabbit hole of its own.. 😁 

`make test 
TEST="jtreg:test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java" 
JTREG="JAVA_OPTIONS=-Djdk.test.container.command=docker"` is still failing:


Full child process STDOUT was saved to docker-stdout-139602.log
[2024-10-02T17:02:06.397293068Z] Waiting for completion for process 139602
[2024-10-02T17:02:06.397362093Z] Waiting for completion finished for process 
139602
[COMMAND]
docker ps 
[2024-10-02T17:02:07.771881444Z] Gathering output for process 139723
[ELAPSED: 19 ms]
[STDERR]

[STDOUT]
CONTAINER ID   IMAGE                                                            
  COMMAND                  CREATED         STATUS         PORTS     NAMES
1db392c08e7c   jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd 
  "/jdk/bin/java -XX:+
"   5 seconds ago   Up 5 seconds             
test-container-main

Full child process STDOUT was saved to docker-stdout-139723.log
[2024-10-02T17:02:07.790740367Z] Waiting for completion for process 139723
[2024-10-02T17:02:07.790915466Z] Waiting for completion finished for process 
139723
[COMMAND]
docker rm test-container-main 
[2024-10-02T17:02:07.792750591Z] Gathering output for process 139734
[ELAPSED: 15 ms]
[STDERR]
Error response from daemon: You cannot remove a running container 
1db392c08e7cbac3a256597c2ee693fdbbeab3a814cb229c83ac38c0c805f42d. Stop the 
container before attempting removal or force remove


I _think_ the underlying problem is that killing the process from the Java test 
is not properly propagated to the `EventGeneratorLoop` process running as a 
Docker container. Among others, I found 
https://www.kaggle.com/code/residentmario/best-practices-for-propagating-signals-on-docker
 talking about it. So, in the Docker case the container continues running for 
the whole 30 seconds even though we called `p.destroy()`. `docker rm` for the 
next test case will fail because the container is still running. So, I started 
out by trying this:


diff --git test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java 
test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
index 5132f14d788..4726284cff1 100644
--- test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
+++ test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
@@ -251,8 +251,9 @@ public Process start(final boolean elevated) throws 
Exception {
             OutputAnalyzer out = 
DockerTestUtils.execute(Container.ENGINE_COMMAND, "ps")
                 .shouldHaveExitValue(0);
             if (out.contains(MAIN_CONTAINER_NAME)) {
-                DockerTestUtils.execute(Container.ENGINE_COMMAND, "rm", 
MAIN_CONTAINER_NAME)
+                DockerTestUtils.execute(Container.ENGINE_COMMAND, "stop", 
"--time=1", "--signal=SIGKILL", MAIN_CONTAINER_NAME)
                    .shouldHaveExitValue(0);
             }
             // start "main" container (the observee)
             DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop");


..only to be met by:


[main-container-process] docker: Error response from daemon: Conflict. The 
container name "/test-container-main" is already in use by container 
"f2754e97e6eecd6b0538e8a8c3d7c12214d16f9f32a9e2fe313c1d9f3258732c". You have to 
remove (or rename) that container to be able to reuse that name.
[main-container-process] See 'docker run --help'.
java.lang.RuntimeException: Timed out while waiting for main() to start
        at 
TestJcmdWithSideCar$MainContainer.waitForMainMethodStart(TestJcmdWithSideCar.java:293)
        at TestJcmdWithSideCar.main(TestJcmdWithSideCar.java:110)
        at 
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:573)
        at 
com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
        at java.base/java.lang.Thread.run(Thread.java:1576)


Because we start containers with `--rm`, they seem to be asynchronously removed 
_after_ `stop` has finished. Finally, if we wait a little bit after the `stop` 
command, `make test 
TEST="jtreg:test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java" 
JTREG="JAVA_OPTIONS=-Djdk.test.container.command=docker"` works again.


diff --git test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java 
test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
index 5132f14d788..4726284cff1 100644
--- test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
+++ test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
@@ -251,8 +251,9 @@ public Process start(final boolean elevated) throws 
Exception {
             OutputAnalyzer out = 
DockerTestUtils.execute(Container.ENGINE_COMMAND, "ps")
                 .shouldHaveExitValue(0);
             if (out.contains(MAIN_CONTAINER_NAME)) {
-                DockerTestUtils.execute(Container.ENGINE_COMMAND, "rm", 
MAIN_CONTAINER_NAME)
+                DockerTestUtils.execute(Container.ENGINE_COMMAND, "stop", 
"--time=1", "--signal=SIGKILL", MAIN_CONTAINER_NAME)
                    .shouldHaveExitValue(0);
+                Thread.sleep(1_000L);
             }
             // start "main" container (the observee)
             DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop");

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21289#issuecomment-2389263714

Reply via email to