On Mon, 7 Oct 2024 19:37:51 GMT, Sebastian Lövdahl <d...@openjdk.org> wrote:

>> The fix is twofold.
>> 
>> 1. Stop the main container after an iteration is done. The main container is 
>> started with its runtime defined as 120 seconds, which means that each 
>> iteration takes 120 seconds. In reality, one iteration takes a few seconds 
>> while 115 seconds is spent waiting on the main container exiting.
>> 
>> 2. Change the name of the main container to be unique per iteration. 
>> Containers are started with `--rm`, which means they are removed after 
>> exiting. However, the removal is done asynchronously _after_ the `stop` 
>> command has returned. This means that the second iteration may get an error 
>> if the same container name is used if the removal was not done before the 
>> container is started in the next iteration.
>> 
>> On my machine, this cuts down the test runtime using Podman from 4m 13s to 
>> 17s. Using Docker, the runtime goes from 4m 15s to 41s.
>> 
>> Podman only runs half the test cases (since JDK-8341310) which explain some 
>> of the difference. But there is also something strange going on in the 
>> Docker case; every `docker stop` call takes 10 seconds, and I have not been 
>> able to figure out what exactly causes it.
>> 
>> Doing a manual `kill [container Java process PID]` gracefully terminates the 
>> Java process and container, but `docker stop` never does. Instead, it blocks 
>> for 10 seconds before abruptly killing the process using `SIGKILL`. I 
>> confirmed this with a simplified case and both
>> `strace -e 'trace=!all' docker run --init eclipse-temurin:23 java ..` and 
>> `strace -e 'trace=!all' docker run eclipse-temurin:23 java ..`, no signals 
>> were ever visible when calling either `docker stop` or `docker kill`.
>> 
>> https://www.docker.com/blog/docker-best-practices-choosing-between-run-cmd-and-entrypoint/
>>  and "What is PID 1 and why does it matter?" talks about why 
>> [`--init`](https://docs.docker.com/reference/cli/docker/container/run/#init) 
>> is supposed to help.
>
> Sebastian Lövdahl has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Have EventGeneratorLoop end after a more predictable duration

test/hotspot/jtreg/containers/docker/EventGeneratorLoop.java line 58:

> 56:             ev.msg = "Hello";
> 57:             ev.count = count++;
> 58:             ev.commit();

@kevinjwalls what do you think? I chose `System.nanoTime()` instead of 
`System.currentTimeMillis()` to get an even more predictable duration. 

Before this change, `EventGeneratorLoop` was guaranteed to post the same number 
of events as the input argument. It no longer is, but AFAICT no current test 
uses the events it posts either.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21331#discussion_r1790777201

Reply via email to