[GitHub] [kafka] C0urante commented on a change in pull request #11524: KAFKA-13469: Block for in-flight record delivery before end-of-life source task offset commit

GitBox Tue, 23 Nov 2021 08:20:38 -0800


C0urante commented on a change in pull request #11524:
URL: https://github.com/apache/kafka/pull/11524#discussion_r755298524




##########
File path: 
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/SubmittedRecords.java
##########
@@ -132,6 +144,27 @@ public CommittableOffsets committableOffsets() {
         return new CommittableOffsets(offsets, totalCommittableMessages, totalUncommittableMessages, records.size(), largestDequeSize, largestDequePartition);
     }
 
+    /**
+     * Wait for all currently in-flight messages to be acknowledged, up to the requested timeout.
+     * @param timeout the maximum time to wait
+     * @param timeUnit the time unit of the timeout argument
+     * @return whether all in-flight messages were acknowledged before the timeout elapsed
+     */
+    public boolean awaitAllMessages(long timeout, TimeUnit timeUnit) {
+        // Create a new message drain latch as a local variable to avoid SpotBugs warnings about inconsistent synchronization
+        // on an instance variable when invoking CountDownLatch::await outside a synchronized block
+        CountDownLatch messageDrainLatch;
+        synchronized (this) {
+            messageDrainLatch = new CountDownLatch(numUnackedMessages.get());
+            this.messageDrainLatch = messageDrainLatch;
+        }

Review comment:
       Thanks Randall. I agree that the synchronization here is, even if 
necessary, inelegant, and I hope that we can improve things. But I'm worried 
that the proposal here may be prone to a race condition.
   
   Imagine we restructure the code with your suggestions and the result is this:
   ```java
   
   class SubmittedRecords {
   
       private final AtomicReference<CountDownLatch> messageDrainLatch = new AtomicReference<>();
   
       private boolean awaitAllMessages(long timeout, TimeUnit timeUnit) {
           // (2)
           CountDownLatch messageDrainLatch = this.messageDrainLatch.updateAndGet(
                   existing -> new CountDownLatch(numUnackedMessages.get()));
           try {
               return messageDrainLatch.await(timeout, timeUnit);
           } catch (InterruptedException e) {
               return false;
           }
       }
   
       private void messageAcked() {
           // (1)
           numUnackedMessages.decrementAndGet();
           // (3)
           CountDownLatch messageDrainLatch = this.messageDrainLatch.get();
           if (messageDrainLatch != null) {
               messageDrainLatch.countDown();
           }
       }
   }
   ```
   
   Isn't it still possible for the lines marked `(1)`, `(2)`, and `(3)` to execute in that order? If so, `awaitAllMessages` could return early: the `CountDownLatch` created at `(2)` would be sized from the value of `numUnackedMessages` that `(1)` had already decremented, and would then be counted down a second time for the same message at `(3)`.
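
To make the interleaving concrete, here is a minimal single-threaded sketch that replays steps `(1)`, `(2)`, and `(3)` in that order and measures how many still-unacknowledged messages the new latch fails to wait for. The class name `LatchRaceSketch` and the method `latchShortfall` are hypothetical and exist only for this illustration; they are not part of `SubmittedRecords`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; not code from the pull request.
public class LatchRaceSketch {

    /**
     * Replays (1), (2), (3) in order, starting with inFlight (>= 1)
     * unacknowledged messages. Returns how many unacknowledged messages
     * remain that the latch will NOT wait for (0 means no shortfall).
     */
    static long latchShortfall(int inFlight) {
        AtomicInteger numUnackedMessages = new AtomicInteger(inFlight);
        AtomicReference<CountDownLatch> messageDrainLatch = new AtomicReference<>();

        // (1) messageAcked() decrements the counter for one acknowledged message
        numUnackedMessages.decrementAndGet();

        // (2) awaitAllMessages() sizes a new latch from the already-decremented counter
        CountDownLatch latch = messageDrainLatch.updateAndGet(
                existing -> new CountDownLatch(numUnackedMessages.get()));

        // (3) messageAcked() resumes and counts the new latch down for the same message
        CountDownLatch seen = messageDrainLatch.get();
        if (seen != null) {
            seen.countDown();
        }

        // Messages still in flight minus acknowledgements the latch still expects
        return numUnackedMessages.get() - latch.getCount();
    }

    public static void main(String[] args) {
        System.out.println(latchShortfall(2)); // prints 1: latch is already open, one message still in flight
        System.out.println(latchShortfall(3)); // prints 1: latch expects one ack, two messages still in flight
        System.out.println(latchShortfall(1)); // prints 0: nothing left in flight, so returning is correct
    }
}
```

Whenever at least one other message is still in flight, the latch ends up one acknowledgement short, which is the early return described above.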
   
   FWIW, I originally used a `volatile int` for the `numUnackedMessages` field, 
but got a SpotBugs warning about incrementing a volatile field being a 
non-atomic operation for lines like `numUnackedMessages++;` in 
`SubmittedRecords::submit`. If we synchronize every access to that field, it 
shouldn't matter that increments/decrements are non-atomic, and we can consider 
adding an exemption to `spotbugs-exclude.xml`.
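
As a rough sketch of that alternative, assuming every read and write of the counter and the latch field happens while holding the instance lock, the counter can be a plain `int`. The class `SynchronizedDrainSketch` below is hypothetical and heavily simplified (no per-partition deques, no offsets), and restoring the interrupt flag is my own assumption rather than something taken from the PR.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for SubmittedRecords; illustration only.
public class SynchronizedDrainSketch {

    private int numUnackedMessages = 0;        // guarded by "this"
    private CountDownLatch messageDrainLatch;  // guarded by "this"

    public synchronized void submit() {
        // Non-atomic increment is safe here because the lock is held
        numUnackedMessages++;
    }

    public synchronized void messageAcked() {
        // Decrement and count-down happen under one lock, so a latch can never
        // be created between them from a half-updated counter
        numUnackedMessages--;
        if (messageDrainLatch != null) {
            messageDrainLatch.countDown();
        }
    }

    public boolean awaitAllMessages(long timeout, TimeUnit timeUnit) {
        // Local copy so that await() can run outside the synchronized block
        CountDownLatch latch;
        synchronized (this) {
            latch = new CountDownLatch(numUnackedMessages);
            messageDrainLatch = latch;
        }
        try {
            return latch.await(timeout, timeUnit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // assumption: preserve interrupt status
            return false;
        }
    }
}
```

Because `messageAcked` is atomic with respect to the latch creation, the latch is sized either before the acknowledgement (and then counted down for it) or after it (and then not), never both.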




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
