chia7712 commented on pull request #8657:
URL: https://github.com/apache/kafka/pull/8657#issuecomment-670293057
> Could you provide a bit more detail?
The new deadlock you mentioned is caused by PR. This PR introduces extra
lock (```ReentrantLock```) to ```DelayedJoin``` (By contrast, previous
```DelayedJoin``` has a group lock only). The thread has to get two locks
(group lock and ReentrantLock) to complete ```DelayedJoin``` and so deadlock
happens when two locks are held by different thread. Hence, my approach is to
remove the extra lock introduced by this PR and so deadlock will be gone.
The changes to ```DelayedJoin``` in this PR is to resolve deadlock happening
on ```DelayedJoin#onComplete```. ```onComplete``` has to get other group lock
to resolve delayed request but it is executed within a group lock. In order to
resolve deadlock, this PR tries to move ```onComplete``` out of group lock.
However, after thinking about this a bit more, why ```onComplete``` is executed
within a lock? It is thread-safe already and it is not always executed within a
lock (if it is executed on timeout). Hence, the implementation of my approach
is to refactor ```DelayedOperation``` to move ```onComplete``` out of
unnecessary lock. The refactor includes following changes.
1. ```forceComplete``` should be executed by ```DelayedOperation``` other
than subclasss
1. ```forceComplete``` is executed after releasing lock
```scala
private[server] def maybeTryComplete(): Boolean = {
lock.lock
val timeToComplete = try tryComplete() finally lock.unlock
timeToComplete && forceComplete()
}
```
3. ```forceComplete``` is renamed to ```canComplete```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]