[GitHub] [kafka] gharris1727 commented on pull request #13361: KAFKA-14401: Resume WorkThread if Connector/Tasks reading offsets get stuck when underneath WorkThread dies

via GitHub Wed, 22 Mar 2023 11:17:10 -0700


gharris1727 commented on PR #13361:
URL: https://github.com/apache/kafka/pull/13361#issuecomment-1480048185


   > Since the KafkaBasedLog mainly consumes/produces the Kafka topics, we are
   > already retrying the exceptions mentioned above. However, any other runtime
   > exception here IMO should be treated as fatal and should be treated
   > accordingly. Two options seem obvious:
   >
   > We can possibly retry the exception some static number of times with
   > exponential back-off and then fail the WorkThread. The exception must be
   > propagated back to the Herder and fail the Worker.
   
   I don't think that we should add retries when we already know that the 
exceptions that would be caught here are non-retriable. Additionally, it may be 
unsafe or incorrect to retry on an arbitrary exception, and may produce 
unexpected behavior in the consumer or in the worker.
   
   @rohits64 I also think that the PR as-is does not address the latter point 
that @mukkachaitanya made. We need to propagate these failures to the 
asynchronous callers, and not just let this thread die (with or without 
retries). As-is, this PR does not address the problem in the title.
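
To illustrate what propagating the failure to asynchronous callers could look like, here is a minimal sketch. It is not the actual KafkaBasedLog code: the class `FailFastLogSketch`, its `readToEnd()` method, and the `pollOnce` hook are hypothetical names invented for this example. The assumption is that callers hold a future that the work thread completes, so when the thread dies it can fail every pending future instead of leaving callers waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a log with a background work thread (not Kafka's API). */
class FailFastLogSketch {
    private final Object lock = new Object();
    private final List<CompletableFuture<Void>> pending = new ArrayList<>();
    private final Runnable pollOnce;   // assumed hook: one unit of work, may throw
    private final Thread thread;
    private Throwable fatal;           // guarded by lock; set once the thread has died

    FailFastLogSketch(Runnable pollOnce) {
        this.pollOnce = pollOnce;
        this.thread = new Thread(this::run, "work-thread-sketch");
        this.thread.setDaemon(true);
    }

    void start() {
        thread.start();
    }

    /** Asynchronous request; the returned future fails if the work thread has died. */
    CompletableFuture<Void> readToEnd() {
        CompletableFuture<Void> future = new CompletableFuture<>();
        synchronized (lock) {
            if (fatal != null) {
                // Thread is already dead: fail immediately rather than queueing forever.
                future.completeExceptionally(fatal);
            } else {
                pending.add(future);
                lock.notifyAll();
            }
        }
        return future;
    }

    private void run() {
        try {
            while (true) {
                List<CompletableFuture<Void>> batch;
                synchronized (lock) {
                    while (pending.isEmpty()) {
                        lock.wait();
                    }
                    batch = new ArrayList<>(pending);
                }
                pollOnce.run(); // a non-retriable exception escapes to the catch below
                synchronized (lock) {
                    pending.removeAll(batch);
                }
                batch.forEach(f -> f.complete(null));
            }
        } catch (Throwable t) {
            // No retry: record the failure and hand it to every waiting caller.
            List<CompletableFuture<Void>> stranded;
            synchronized (lock) {
                fatal = t;
                stranded = new ArrayList<>(pending);
                pending.clear();
            }
            stranded.forEach(f -> f.completeExceptionally(t));
        }
    }

    public static void main(String[] args) throws Exception {
        FailFastLogSketch log = new FailFastLogSketch(() -> {
            throw new IllegalStateException("simulated non-retriable failure");
        });
        log.start();
        try {
            log.readToEnd().get(5, TimeUnit.SECONDS);
        } catch (ExecutionException e) {
            System.out.println("caller saw: " + e.getCause());
        }
    }
}
```

In this sketch a caller that arrives after the thread has died is failed immediately as well, so no request can stall. How such a failure should then reach the Herder and fail the Worker is outside what the sketch covers.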
   
   Thanks for looking into this, it's certainly not good for these failures to 
silently stall the worker indefinitely!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
