chia7712 commented on a change in pull request #8657:
URL: https://github.com/apache/kafka/pull/8657#discussion_r479679930
##########
File path: core/src/main/scala/kafka/server/DelayedOperation.scala
##########
@@ -100,41 +99,22 @@ abstract class DelayedOperation(override val delayMs: Long,
def tryComplete(): Boolean
/**
- * Thread-safe variant of tryComplete() that attempts completion only if the
lock can be acquired
- * without blocking.
*
- * If threadA acquires the lock and performs the check for completion before
completion criteria is met
- * and threadB satisfies the completion criteria, but fails to acquire the
lock because threadA has not
- * yet released the lock, we need to ensure that completion is attempted
again without blocking threadA
- * or threadB. `tryCompletePending` is set by threadB when it fails to
acquire the lock and at least one
- * of threadA or threadB will attempt completion of the operation if this
flag is set. This ensures that
- * every invocation of `maybeTryComplete` is followed by at least one
invocation of `tryComplete` until
- * the operation is actually completed.
+ * There is a long story about using "lock" or "tryLock".
+ *
+ * 1) using lock - There was a lot of cases that a thread holds a group lock
and then it tries to hold more group
+ * locks to complete delayed requests. Unfortunately, the scenario causes
deadlock and so we had introduced the
+ * "tryLock" to avoid deadlock.
+ *
+ * 2) using tryLock - However, the "tryLock" causes another issue that the
delayed requests may be into
+ * oblivion if the thread, which should complete the delayed requests, fails
to get the lock.
+ *
+ * Now, we go back to use "lock" and make sure the thread which tries to
complete delayed requests does NOT hold lock.
+ * The approach is that ReplicaManager collects all actions, which are used
to complete delayed requests, in a queue.
+ * KafkaApis.handle() and the expiration thread for certain delayed
operations (e.g. DelayedJoin) pick up and then
+ * execute an action when no lock is held.
Review comment:
It is dangerous to complete delayed actions by ```DelayedOperation``` as
most methods of ```DelayedOperation``` are executed with locking. We, now,
depend on ```KafkaApi. handler``` to complete the delayed action produced by
someone who can't complete delayed action due to locking. For example,
```DelayedJoin.onComplete``` can produce an new action and the new action can't
be completed by ```DelayedJoin``` itself due to group locking.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]