On Thu, 3 Oct 2024 19:13:46 GMT, Sebastian Lövdahl <d...@openjdk.org> wrote:

> The fix is twofold.
> 
> 1. Stop the main container after an iteration is done. The main container is 
> started with its runtime defined as 120 seconds, which means that each 
> iteration takes 120 seconds. In reality, one iteration takes a few seconds 
> while 115 seconds is spent waiting on the main container exiting.
> 
> 2. Change the name of the main container to be unique per iteration. 
> Containers are started with `--rm`, which means they are removed after 
> exiting. However, the removal is done asynchronously _after_ the `stop` 
> command has returned. This means that the second iteration may get an error 
> if the same container name is used if the removal was not done before the 
> container is started in the next iteration.
> 
> On my machine, this cuts down the test runtime using Podman from 4m 13s to 
> 17s. Using Docker, the runtime goes from 4m 15s to 41s.
> 
> Podman only runs half the test cases (since JDK-8341310) which explain some 
> of the difference. But there is also something strange going on in the Docker 
> case; every `docker stop` call takes 10 seconds, and I have not been able to 
> figure out what exactly causes it.
> 
> Doing a manual `kill [container Java process PID]` gracefully terminates the 
> Java process and container, but `docker stop` never does. Instead, it blocks 
> for 10 seconds before abruptly killing the process using `SIGKILL`. I 
> confirmed this with a simplified case and both
> `strace -e 'trace=!all' docker run --init eclipse-temurin:23 java ..` and 
> `strace -e 'trace=!all' docker run eclipse-temurin:23 java ..`, no signals 
> were ever visible when calling either `docker stop` or `docker kill`.
> 
> https://www.docker.com/blog/docker-best-practices-choosing-between-run-cmd-and-entrypoint/
>  and "What is PID 1 and why does it matter?" talks about why 
> [`--init`](https://docs.docker.com/reference/cli/docker/container/run/#init) 
> is supposed to help.

Hi, I think this looks good.
Having main container name be unique is good.  Do you just add "elevated-" + 
elevated here, it would be great to add even a random ID in there as well, as 
we can have multiple tests running on the same machine at times...

The other problem ( https://bugs.openjdk.org/browse/JDK-8341518 ) seems to be 
EventGeneratorLoop terminating too early, we have some logs where it's gone 
after 10 or 12 seconds, not 120.  It should maybe check a wall-clock time to 
ensure it lives long enough.  I can do that separately unless you think it 
makes sense to include it here, I don't think it will conflict.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21331#issuecomment-2396725847

Reply via email to