On Thu, 15 Aug 2024 18:22:54 GMT, Viktor Klang <vkl...@openjdk.org> wrote:
> My suspicion is that Condition::await() throws before having successfully > reacquired the lock, and this exception is swallowed because Lock::unlock() > then throws when invoke with an IllegalMonitorStateException as the current > thread was not reestablished as the owner. > > So this PR is intended to harden the reacquisition of await-ing methods in > AQS and AQLS so that instead of throwing they spin-loop trying to reacquire > the lockāat least a thread dump would show where the Thread is stuck trying > to reacquire. > > There's also the option of attempting to log a JFR event on the first > reacquisition failure, but that might need to be done with a pre-created > event as the cause of the failing acquisition may be an OOME. > > I also needed to harden the TestDisposerRace itself, as it currently tries to > provoke OOMEs but then proceeds to want to allocate objects used for the test > itself. I'm not super-happy about the changes there, but they should be safer > than before. > > The first RuntimeException/Error encountered will be rethrown upon success to > facilitate debugging. @DougLea @AlanBateman I'm currently trying to provoke the test to fail with this new implementation. @DougLea @AlanBateman This seems to fix TestDisposerRace.java Ready for review Closing this PR and opening a new one, to see if that helps. Gah, so it had nothing to do with the commit message. There was, for some reason, a tab in the JBS title which was copy-pasted in as the PR title. src/java.base/share/classes/java/util/concurrent/locks/AbstractQueuedLongSynchronizer.java line 286: > 284: */ > 285: private final void reacquire(Node node, long arg) { > 286: for (long nanos = 1L;;) { @AlanBateman @DougLea If we're concerned about SOE due to adding another stackframe by invoking reacquire we might consider ForceInline, but I was weary about doing so. src/java.base/share/classes/java/util/concurrent/locks/AbstractQueuedLongSynchronizer.java line 290: > 288: acquire(node, arg, false, false, false, 0L); > 289: break; > 290: } catch (Error | RuntimeException ex) { @DougLea @AlanBateman The first question is if this set of exception are the way to go, or we should go with Throwable src/java.base/share/classes/java/util/concurrent/locks/AbstractQueuedLongSynchronizer.java line 294: > 292: if (first) { > 293: first = false; > 294: // Log JFR event? @DougLea @AlanBateman The second question is whether we should declare a JFR event for this use-case or not. src/java.base/share/classes/java/util/concurrent/locks/AbstractQueuedLongSynchronizer.java line 312: > 310: } > 311: } > 312: } @DougLea @AlanBateman So I've added such that the first exception/error is rethrown after subsequent successful reacquisition. See above. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2292008244 PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2293131396 PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2296564889 PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2325963172 PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2326008569 PR Review Comment: https://git.openjdk.org/jdk/pull/20602#discussion_r1719777654 PR Review Comment: https://git.openjdk.org/jdk/pull/20602#discussion_r1718867228 PR Review Comment: https://git.openjdk.org/jdk/pull/20602#discussion_r1718867694 PR Review Comment: https://git.openjdk.org/jdk/pull/20602#discussion_r1731486424