On Thu, 19 Sep 2024 14:03:45 GMT, Kevin Rushforth <k...@openjdk.org> wrote:

> This PR fixes a hang on exit which can happen if QuantumToolkit.dispose hangs 
> when called from the QuantumToolkit shutdown hook. A shutdown hook should 
> never run indefinitely, so the fix is to call dispose from a background 
> thread and wait for up to 5 seconds for it to finish normally, and then throw 
> an Error so that the application will be able to exit anyway.
> 
> The QuantumToolkit registers a "Glass/Prism Shutdown Hook" that calls the 
> QuantumToolkit.dispose method to gracefully shutdown the toolkit and 
> renderer. If dispose hangs or deadlocks, this will prevent the application 
> from exiting, requiring the process to be killed. This has been observed on 
> macOS, due to [JDK-8238505](https://bugs.openjdk.org/browse/JDK-8238505), 
> which is is an intermittent deadlock that occurs during shutdown. We ought to 
> eventually fix that intermittent deadlock, but even in the presence of a 
> deadlock or hang, we don't want the shutdown hook to hang.
> 
> This can happen at any time, but is especially a problem during an automated 
> test run, which will then hang until the Jenkins job is either killed or its 
> global timer (currently set to 12 hours) kills the job. Even in that case, 
> processes can be left lying around which can affect subsequent runs.
> 
> I discovered this while testing 
> [JDK-8328629](https://bugs.openjdk.org/browse/JDK-8328629), which adds a 
> default Unit 5 timeout, and without fixing the deadlock, the timeout will be 
> unable to abort the hanging test.
> 
> ### Testing instructions
> 
> I tested this in connection with PR #1575 / 
> [JDK-8328629](https://bugs.openjdk.org/browse/JDK-8328629) using my 
> [test-timeout-and-shutdown-hang](https://github.com/kevinrushforth/jfx/tree/test-timeout-and-shutdown-hang)
>  branch. That branch has the fixes from both this PR and PR #1575 as well as 
> tests that will hang indefinitely without the fixes and fail due to the 
> expected timeout with the fixes.

modules/javafx.graphics/src/main/java/com/sun/javafx/tk/quantum/QuantumToolkit.java
 line 263:

> 261:             @Override public void run() {
> 262:                 // Run dispose in a background thread and wait for up to
> 263:                 // 5 seconds for it to finish. If it doesn't, then throw 
> an

is 5 seconds a good timeout value?

could it be shorter?
is it possible for the normal shutdown on a real system take more than 5 
seconds?

modules/javafx.graphics/src/main/java/com/sun/javafx/tk/quantum/QuantumToolkit.java
 line 276:

> 274:                 try {
> 275:                     if (!disposeLatch.await(5, TimeUnit.SECONDS)) {
> 276:                         throw new InternalError("dispose timed out");

could this happen on a real system?  if so, should the message be more clear 
and user-friendly?

also, if it can be encountered in the field, it might be better to declare a 
constant and add a javadoc to it explaining what it is.

what do you think?

-------------

PR Review Comment: https://git.openjdk.org/jfx/pull/1574#discussion_r1767694260
PR Review Comment: https://git.openjdk.org/jfx/pull/1574#discussion_r1767695208

Reply via email to