A. Sophie Blee-Goldman created KAFKA-12523:
----------------------------------------------

             Summary: Need to improve handling of TimeoutException when 
committing offsets
                 Key: KAFKA-12523
                 URL: https://issues.apache.org/jira/browse/KAFKA-12523
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 2.8.0
            Reporter: A. Sophie Blee-Goldman
             Fix For: 2.8.0


Right now, in TaskManager#commitOffsetsOrTransaction if we  catch a 
TimeoutException then under ALOS we just rethrow it while in EOS we rethrow it 
as TaskCorruptedException. The problem is that commitOffsetsOrTransaction can 
be invoked from several places:
# Commit within StreamThread main processing loop (either user requested or 
commit interval has elapsed: this is presumably the case we had in mind when 
deciding how to handle the TimeoutException in commitOffsetsOrTransaction , no 
problem here
# Clean shutdown of application: a bit weird to throw a TaskCorruptedException 
in this case, but it’ll just end up being caught and forcing a closeDirty, so 
again no problem here
# From TaskManager#handleRevocation: in this case, it’s possible we hit a 
TimeoutException on a task that’s actually being revoked. This exception will 
be saved and rethrown from poll, so under EOS we would catch a 
TaskCorruptedException and then try to revive this task that we actually no 
longer own. Pretty sure this will cause an NPE in the TaskManager. Under ALOS, 
the rethrown TimeoutException will be bubbled up through poll again, but unlike 
TaskCorruptedException we actually don’t catch TimeoutException anywhere in the 
StreamThread loop. This will trigger the uncaught exception handler
# From TaskManager#handleTaskCorrupted:  this method is itself invoked from 
within the catch TaskCorruptedException block of the StreamThread’s runLoop. If 
we throw TaskCorruptedException again then I believe we won’t even catch this 
in the safety net catch Throwable block of the runLoop -- it’ll just be thrown 
directly up through run(). 





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to