On Thu, 19 Sep 2024 14:03:45 GMT, Kevin Rushforth <k...@openjdk.org> wrote:
> This PR fixes a hang on exit which can happen if QuantumToolkit.dispose hangs > when called from the QuantumToolkit shutdown hook. A shutdown hook should > never run indefinitely, so the fix is to call dispose from a background > thread and wait for up to 5 seconds for it to finish normally, and then throw > an Error so that the application will be able to exit anyway. > > The QuantumToolkit registers a "Glass/Prism Shutdown Hook" that calls the > QuantumToolkit.dispose method to gracefully shutdown the toolkit and > renderer. If dispose hangs or deadlocks, this will prevent the application > from exiting, requiring the process to be killed. This has been observed on > macOS, due to [JDK-8238505](https://bugs.openjdk.org/browse/JDK-8238505), > which is is an intermittent deadlock that occurs during shutdown. We ought to > eventually fix that intermittent deadlock, but even in the presence of a > deadlock or hang, we don't want the shutdown hook to hang. > > This can happen at any time, but is especially a problem during an automated > test run, which will then hang until the Jenkins job is either killed or its > global timer (currently set to 12 hours) kills the job. Even in that case, > processes can be left lying around which can affect subsequent runs. > > I discovered this while testing > [JDK-8328629](https://bugs.openjdk.org/browse/JDK-8328629), which adds a > default Unit 5 timeout, and without fixing the deadlock, the timeout will be > unable to abort the hanging test. > > ### Testing instructions > > I tested this in connection with PR #1575 / > [JDK-8328629](https://bugs.openjdk.org/browse/JDK-8328629) using my > [test-timeout-and-shutdown-hang](https://github.com/kevinrushforth/jfx/tree/test-timeout-and-shutdown-hang) > branch. That branch has the fixes from both this PR and PR #1575 as well as > tests that will hang indefinitely without the fixes and fail due to the > expected timeout with the fixes. modules/javafx.graphics/src/main/java/com/sun/javafx/tk/quantum/QuantumToolkit.java line 263: > 261: @Override public void run() { > 262: // Run dispose in a background thread and wait for up to > 263: // 5 seconds for it to finish. If it doesn't, then throw > an is 5 seconds a good timeout value? could it be shorter? is it possible for the normal shutdown on a real system take more than 5 seconds? modules/javafx.graphics/src/main/java/com/sun/javafx/tk/quantum/QuantumToolkit.java line 276: > 274: try { > 275: if (!disposeLatch.await(5, TimeUnit.SECONDS)) { > 276: throw new InternalError("dispose timed out"); could this happen on a real system? if so, should the message be more clear and user-friendly? also, if it can be encountered in the field, it might be better to declare a constant and add a javadoc to it explaining what it is. what do you think? ------------- PR Review Comment: https://git.openjdk.org/jfx/pull/1574#discussion_r1767694260 PR Review Comment: https://git.openjdk.org/jfx/pull/1574#discussion_r1767695208