[ https://issues.apache.org/jira/browse/KAFKA-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17949363#comment-17949363 ]
Matthias J. Sax commented on KAFKA-17019:
-----------------------------------------

{quote}I take a look into producer code to contribute this issue.
{quote}
Nice! Thank you.
{quote}the {{TimeoutException}} *is* the root cause
{quote}
A timeout only happens when we try to do something but cannot complete it within the time limit. So there must be a root cause for why we could not complete it. I guess the only case in which the timeout itself is the root cause is when we send a request and don't get any response at all, i.e., an actual request timeout.
{quote}But in that case, hasn’t the {{ProducerBatch}} already raised the real exception, so nothing is actually “missing”?
{quote}
This depends. If the error is not retriable, yes: the error is re-thrown directly into the application. However, if the error is retriable, the producer retries internally (and only logs the error), and eventually might give up if some higher-level timeout expires. For example, I believe `ProducerBatch` could return a "not enough replicas" exception, which we would retry internally until `delivery.timeout.ms` eventually expires.
{quote}Or do you think the exception raised inside {{ProducerBatch}} should instead be set as the root cause of the subsequent {{TimeoutException}}?
{quote}
Yes, that is the idea. I don't know all scenarios off the top of my head, and I guess we need to take it on a case-by-case basis. But I believe most `TimeoutException` cases should have some actual root cause.

> Producer TimeoutException should include root cause
> ---------------------------------------------------
>
>                 Key: KAFKA-17019
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17019
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients, producer
>            Reporter: Matthias J. Sax
>            Priority: Major
>
> With KAFKA-16965 we added a "root cause" to some `TimeoutException` thrown by
> the producer. However, it's only a partial solution to address a specific
> issue.
> We should consider adding the "root cause" for _all_ `TimeoutException` cases
> and unify/clean up the code to get a holistic solution to the problem.
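
To illustrate the idea from the comment, here is a minimal, self-contained sketch of a retry loop that remembers the last retriable error and attaches it as the cause when the overall timeout expires. It does not use the Kafka client: {{RetriableError}}, {{SendTimeoutException}}, and {{retryUntilTimeout}} are hypothetical stand-ins invented for this example, not the producer's actual classes or code paths.

```java
import java.util.function.Supplier;

public class TimeoutRootCauseSketch {

    /** Hypothetical stand-in for a retriable error such as "not enough replicas". */
    static class RetriableError extends RuntimeException {
        RetriableError(String message) { super(message); }
    }

    /** Hypothetical stand-in for a timeout exception that carries a root cause. */
    static class SendTimeoutException extends RuntimeException {
        SendTimeoutException(String message, Throwable cause) { super(message, cause); }
    }

    /**
     * Retries the attempt until it succeeds or timeoutMs has elapsed.
     * Retriable errors are remembered instead of being dropped; any other
     * exception propagates to the caller immediately.
     */
    static <T> T retryUntilTimeout(Supplier<T> attempt, long timeoutMs) {
        long start = System.nanoTime();
        long timeoutNanos = timeoutMs * 1_000_000L;
        RetriableError lastError = null;
        while (true) {
            try {
                return attempt.get();
            } catch (RetriableError e) {
                lastError = e; // keep the most recent retriable error
            }
            if (System.nanoTime() - start >= timeoutNanos) {
                break;
            }
            try {
                Thread.sleep(5); // small fixed backoff between attempts
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        // The last retriable error becomes the cause of the timeout.
        throw new SendTimeoutException("Gave up after " + timeoutMs + " ms", lastError);
    }

    public static void main(String[] args) {
        try {
            retryUntilTimeout(() -> { throw new RetriableError("not enough replicas"); }, 50);
        } catch (SendTimeoutException e) {
            System.out.println(e.getMessage() + ", caused by: " + e.getCause().getMessage());
        }
    }
}
```

With this shape, an application that catches the timeout can inspect {{getCause()}} to see what was actually failing, rather than only learning that time ran out.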