gharris1727 commented on PR #13361: URL: https://github.com/apache/kafka/pull/13361#issuecomment-1480048185
> Since the KafkaBasedLog mainly consumes/produces the Kafka topics, we are already retrying the exceptions mentioned above. However, any other runtime exception here IMO should be treated as fatal and should be treated accordingly. Two options seem obvious:
> We can possibly retry the exception some static number of times with exponential back-off and then fail the WorkThread.
> The exception must be propagated back to the Herder and fail the Worker.

I don't think that we should add retries when we already know that the exceptions that would be caught here are non-retriable. Additionally, it may be unsafe or incorrect to retry on an arbitrary exception, and may produce unexpected behavior in the consumer or in the worker.

@rohits64 I also think that the PR as-is does not address the latter point that @mukkachaitanya made. We need to propagate these failures to the asynchronous callers, and not just let this thread die (with or without retries). As-is, this PR does not address the problem in the title.

Thanks for looking into this; it's certainly not good for these failures to silently stall the worker indefinitely!
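Here is a minimal sketch of the two points above: don't retry an arbitrary runtime exception, and surface it to every asynchronous caller instead of letting the work thread die silently. It assumes a single background thread that drains a queue of requests, each paired with a `CompletableFuture`. `BackgroundLog`, `submit` and `fatalError` are hypothetical names for illustration only. They are not the actual `KafkaBasedLog` API, and this is not the fix the PR needs to implement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical sketch, NOT Kafka's KafkaBasedLog: a single work thread runs
 * queued requests and, on an unexpected runtime exception, fails every
 * pending and future caller instead of dying silently.
 */
class BackgroundLog {
    // One unit of work plus the future its asynchronous caller waits on.
    private record Request(Runnable action, CompletableFuture<Void> result) {}

    private final LinkedBlockingQueue<Request> queue = new LinkedBlockingQueue<>();
    private final Thread workThread = new Thread(this::run, "background-log-work-thread");
    // Non-null once the work thread has stopped; the log is then permanently failed.
    private volatile RuntimeException fatalError;

    void start() {
        workThread.setDaemon(true);
        workThread.start();
    }

    /** Lets an owner (e.g. something like the Herder) check whether the log has failed. */
    RuntimeException fatalError() {
        return fatalError;
    }

    // synchronized with failAll() so no request can slip into the queue after it is drained.
    synchronized CompletableFuture<Void> submit(Runnable action) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        if (fatalError != null) {
            // Fail fast: nothing would ever complete this request.
            result.completeExceptionally(fatalError);
        } else {
            queue.add(new Request(action, result));
        }
        return result;
    }

    // Records the fatal error and fails every request still waiting in the queue.
    private synchronized void failAll(RuntimeException error) {
        fatalError = error;
        List<Request> pending = new ArrayList<>();
        queue.drainTo(pending);
        for (Request request : pending) {
            request.result().completeExceptionally(error);
        }
    }

    private void run() {
        try {
            while (true) {
                Request request = queue.take();
                try {
                    request.action().run();
                    request.result().complete(null);
                } catch (RuntimeException e) {
                    // Treated as fatal: no retry, because retrying an arbitrary
                    // exception may be unsafe. Record it first, then fail this caller.
                    failAll(e);
                    request.result().completeExceptionally(e);
                    return;
                }
            }
        } catch (InterruptedException e) {
            // Shutdown path: don't leave queued callers hanging either.
            failAll(new IllegalStateException("work thread interrupted", e));
            Thread.currentThread().interrupt();
        }
    }
}
```

The key design choice is that a caller blocked on `submit(...).get()` receives an `ExecutionException` wrapping the original cause instead of waiting indefinitely. Later submissions also fail immediately. The owning component can then decide how to fail the worker.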