Re: [External] : Re: [POTENTIAL BUG] Potential FIFO violation in BlockingQueue under high contention and suggestion for fair mode in ArrayBlockingQueue and LinkedBlockingQueue

김민주 Tue, 10 Sep 2024 21:33:27 -0700

Hi Archie and Viktor,

I apologize for the delay in my response.




After further consideration, I believe my understanding aligns more closely 
with Interpretation B.

To clarify my understanding of Interpretation B, threads waiting on a Condition 
(linked through ConditionNode in ConditionObject) receive signals in the order 
in which they started waiting, but receiving a signal doesn't guarantee 
immediate reacquisition of the ReentrantLock. Instead, as far as I understand, 
the signal simply enqueues the thread at the end of the AQS 
queue(ReentrantLock).



As a result, threads that received a signal from ConditionObject do not have 
priority over the threads already waiting in the AQS queue and are simply added 
to the end of that queue.

From what I understand, the difference between NonFairSync and FairSync lies in 
whether a thread can acquire the lock when there are other threads already 
waiting in the AQS queue, rather than anything related to the threads signaled 
from a Condition.



Additionally, as I understand it, the enqueue() and doSignal() methods are 
declared final in AQS, and both Syncimplementations use the AQS methods 
directly.



Therefore, I wanted to highlight that this isn't necessarily an issue with AQS 
or FairSync. Instead, I suspect that the behavior we're seeing could arise from 
the fact that threads signaled in a BlockingQueue implementation may not 
execute immediately, which is a characteristic of the implementation.



However, there may be aspects I’m misunderstanding, and I would appreciate any 
corrections or clarifications if that’s the case.



Thank you very much for your time and understanding.

Best regards,

Kim Minju


> 2024. 9. 8. 오후 11:39, Archie Cobbs <archie.co...@gmail.com> 작성:
> 
> Hi Kim,
> 
> Thanks for the details.
> 
> So to summarize:
> Kim is saying that "Interpretation B" is how it actually works.
> Viktor is saying that "Interpretation A" is how it actually works.
> Do I have that right?
> 
> -Archie
> 
> P.S. Viktor: my apologies for misspelling your name before
> 
> 
> On Sat, Sep 7, 2024 at 8:04 PM 김민주 <miiiinj...@gmail.com 
> <mailto:miiiinj...@gmail.com>> wrote:
>> Hi Arhice,
>> 
>> 
>> 
>> First of all, I want to apologize if I may not have explained things clearly 
>> since English is not my first language. I’m really sorry about that.
>> 
>> 
>> 
>> Even so, I deeply appreciate your response and taking the time to reply.
>> 
>> 
>> 
>> First, I would like to confirm whether my understanding is correct.
>> 
>> From what I know, ReentrantLock is based on AQS, and internally, threads are 
>> queued by being linked as Node.
>> 
>> When ReentrantLock.newCondition is called, a ConditionObject is created. 
>> When Condition::await is called, the thread waits based on a ConditionNode, 
>> and the AQS head is replaced with AQS::signalNext, allowing the lock to be 
>> released. At this point, I understand that the node disappears from the AQS 
>> queue and starts waiting within the ConditionNode of the ConditionObject.
>> 
>> As a result, I understand that the waiting places for ReentrantLock and 
>> Condition are different.
>> 
>> To summarize, when Condition::await() is called, the node that was 
>> previously waiting in AQS receives the lock, disappears from the AQS queue, 
>> and starts waiting within the ConditionNode of the ConditionObject.
>> 
>> Additionally, from what I understand, in Condition::doSignal, the first 
>> ConditionNode is removed, and then AQS::enqueue adds it to the tail of AQS.
>> 
>> public class ConditionObject implements Condition, java.io.Serializable {
>> 
>> private void doSignal(ConditionNode first, boolean all) {
>>     while (first != null) {
>>         ConditionNode next = first.nextWaiter;
>>         if ((firstWaiter = next) == null)
>>             lastWaiter = null;
>>         if ((first.getAndUnsetStatus(COND) & COND) != 0) {
>>             enqueue(first);
>>             if (!all)
>>                 break;
>>         }
>>         first = next;
>>     }
>> }
>> 
>> When enqueue is called:
>> 
>> public abstract class AbstractQueuedSynchronizer
>>     extends AbstractOwnableSynchronizer
>> 
>>     /*
>>      * Tail of the wait queue. After initialization, modified only via 
>> casTail.
>>      */
>>     private transient volatile Node tail;
>> 
>>       final void enqueue(Node node) {
>>          if (node != null) {
>>              for (;;) {
>>                  Node t = tail;
>>                  node.setPrevRelaxed(t);        // avoid unnecessary fence
>>                  if (t == null)                 // initialize
>>                      tryInitializeHead();
>>                  else if (casTail(t, node)) {
>>                      t.next = node;
>>                      if (t.status < 0)          // wake up to clean link
>>                          LockSupport.unpark(node.waiter);
>>                      break;
>>                  }
>>              }
>>          }
>> }
>> 
>> From my understanding, this attaches the node to the tail of AQS.
>> 
>> To elaborate further on the situation:
>> 
>> Threads T1 to T10 are waiting on Condition::await() because the queue is 
>> full.
>> 
>> (At this point, T1 to T10 are linked through ConditionNode.) [The AQS queue 
>> is empty, while ConditionNode holds T1 to T10.]
>> 
>> T11 calls take() and holds the lock via lock.lockInterruptibly().
>> 
>> (Since no threads are waiting in the AQS queue, T11 will acquire the lock 
>> immediately.)
>> 
>> [Now, AQS holds T11 at its head, and ConditionNode holds T1 to T10.]
>> 
>> T12 calls queue.put() and enters the wait queue for 
>> lock.lockInterruptibly(). (Since T11 is holding the lock with take(), T12 
>> will be queued behind it in AQS.)
>> 
>> [Now, AQS holds T11 and T12, while ConditionNode holds T1 to T10.]
>> 
>> T11 reduces the count and sends a signal, then releases the lock.
>> 
>> T1 receives the signal and moves to the lock queue. Since ReentrantLock is 
>> in fair mode,
>> 
>> (When T11 sends the signal, T1, the first thread linked in ConditionNode, 
>> will be enqueued via AQS::enqueue. Now, AQS holds T11, T12, and T1, while 
>> ConditionNode holds T2 to T10.)
>> 
>> T11 releases the lock and wakes up T12.
>> 
>> [Now, AQS holds T12 and T1, while ConditionNode holds T2 to T10.]
>> 
>> T12 acquires the lock and proceeds to enqueue in ArrayBlockingQueue without 
>> being blocked by while(count==length).
>> 
>> T12 releases the lock, and the next node in AQS is unparked.
>> 
>> [Now, AQS holds T1, while ConditionNode holds T2 to T10.]
>> 
>> T1, having reacquired the lock after Condition::await(), fails to exit the 
>> while loop and waits again.
>> 
>> [Now, ConditionNode holds T1 and T2 to T10.]
>> 
>> 
>> 
>> This is how I currently understand the situation.
>> 
>> If there are any mistakes in my understanding, I would greatly appreciate 
>> your clarification.
>> 
>> Best Regards,
>> 
>>  Kim Minju
>> 
>> 
>>> 2024. 9. 8. 오전 3:34, Archie Cobbs <archie.co...@gmail.com 
>>> <mailto:archie.co...@gmail.com>> 작성:
>>> 
>>> Hi Kim,
>>> 
>>> On Sat, Sep 7, 2024 at 10:36 AM 김민주 <miiiinj...@gmail.com 
>>> <mailto:miiiinj...@gmail.com>> wrote:
>>>> Here's a clearer outline of the scenario:
>>>> 
>>>> Threads T1 to T10 are waiting on Condition::await() because the queue is 
>>>> full.
>>>> T11 calls take() and holds the lock via lock.lockInterruptibly().
>>>> T12 calls queue.put() and enters the wait queue for 
>>>> lock.lockInterruptibly(). (As I understand, the wait queue for 
>>>> ReentrantLock and Condition are separate.)
>>>> T11 reduces the count and sends a signal, then releases the lock.
>>>> T1 receives the signal and moves to the lock queue. Since the 
>>>> ReentrantLock is in fair mode, T12 (which was already waiting) gets 
>>>> priority, and T1 will acquire the lock later.
>>>> T12 acquires the lock and successfully enqueues.
>>> From one reading of the Javadoc, your step #5 ("T12 gets priority") is not 
>>> supposed to happen that way. Instead, one of T1 through T10 should be 
>>> guaranteed to acquire the lock.
>>> 
>>> Here it is again (from ReentrantLock.newCondition()):
>>> 
>>>> The ordering of lock reacquisition for threads returning from waiting 
>>>> methods is the same as for threads initially acquiring the lock, which is 
>>>> in the default case not specified, but for fair locks favors those threads 
>>>> that have been waiting the longest.
>>> 
>>> 
>>> But part of the problem here is that this documentation is ambiguous.
>>> 
>>> The ambiguity is: are ALL threads trying to acquire the lock, whether on an 
>>> initial attempt or after a condition wakeup, ordered for fairness together 
>>> in one big pool? → In this case step #5 can't happen. Call this 
>>> Interpretation A.
>>> 
>>> Or is this merely saying that threads waiting on a condition reacquire the 
>>> lock based on when they started waiting on the condition, but there are no 
>>> ordering guarantees between those threads and any other unrelated threads 
>>> trying to acquire the lock? → In this case step #5 can happen. Call this 
>>> Interpretation B.
>>> 
>>> So I think we first need to clarify which interpretation is correct here, A 
>>> or B.
>>> 
>>> On that point, Victor said this:
>>> 
>>>> I've re-read ReentrantLock and AQS, and from my understanding on the logic 
>>>> the Condition's place in the wait queue should be maintained, which means 
>>>> that T3 shouldn't be able to "barge".
>>> 
>>> 
>>> So it sounds like Victor is confirming interpretation A. Victor do you 
>>> agree?
>>> 
>>> If so, then it seems like we need to do two things:
>>> 
>>> 1. File a Jira ticket to clarify the Javadoc, e.g. to say something like 
>>> this:
>>> 
>>>> The ordering of lock reacquisition for threads returning from waiting 
>>>> methods is the same as for threads initially acquiring the lock, which is 
>>>> in the default case not specified, but for fair locks favors those threads 
>>>> that have been waiting the longest. In the latter case, the ordering 
>>>> consideration includes all threads attempting to acquire the lock, 
>>>> regardless of whether or not they were previously blocked on the condition.
>>> 
>>> 
>>> 2. Understand why Kim's updated test case is still failing (it must be 
>>> either a bug in the test or a bug in ArrayBlockingQueue).
>>> 
>>> -Archie
>>> 
>>> --
>>> Archie L. Cobbs
>> 
> 
> 
> --
> Archie L. Cobbs

Re: [External] : Re: [POTENTIAL BUG] Potential FIFO violation in BlockingQueue under high contention and suggestion for fair mode in ArrayBlockingQueue and LinkedBlockingQueue

Reply via email to