[ 
https://issues.apache.org/jira/browse/KAFKA-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17949363#comment-17949363
 ] 

Matthias J. Sax commented on KAFKA-17019:
-----------------------------------------

{quote}I take a look into producer code to contribute this issue. 
{quote}
Nice! Thank you.
{quote}the {{TimeoutException}} *is* the root cause
{quote}
A timeout only happens when we try to do something but cannot complete it 
within the allotted time. So there must be a root cause for why we could not 
complete what we tried to do. I guess the only case where the timeout itself is 
the root cause is when we send a request and don't get any response at all, 
i.e., an actual request timeout.
{quote}But in that case, hasn’t the {{ProducerBatch}} already raised the real 
exception, so nothing is actually “missing”?
{quote}
This depends. If the error is not retriable, yes. The error would be directly 
re-thrown into the application. However, if the error is retriable, the 
producer would, well, retry internally (and only log the error), and eventually 
might give up if some high-level timeout expires. For example, I believe 
`ProducerBatch` could return a "not enough replicas" exception, which we would 
retry internally until `delivery.timeout.ms` eventually expires.
{quote}Or do you think the exception raised inside {{ProducerBatch}} should 
instead be set as the root cause of the subsequent {{TimeoutException}}?
{quote}
Yes, that is the idea.
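
To make the idea concrete, here is a minimal, self-contained sketch. It is not 
the actual producer code: `RootCauseSketch`, `RetriableSketchException`, 
`SketchTimeoutException`, and `retryUntil` are all hypothetical names made up 
for illustration. The sketch only assumes that retriable errors are retried 
until some deadline, that non-retriable errors are re-thrown directly, and that 
the last retriable error is attached as the cause of the final timeout.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration only; these are NOT Kafka client classes.
class RootCauseSketch {

    /** Stand-in for a retriable error such as "not enough replicas". */
    static class RetriableSketchException extends RuntimeException {
        RetriableSketchException(String message) {
            super(message);
        }
    }

    /** Stand-in for a timeout exception that carries the underlying cause. */
    static class SketchTimeoutException extends RuntimeException {
        SketchTimeoutException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Calls the attempt repeatedly until it succeeds or the deadline passes.
     * Retriable errors are remembered and retried; any other exception
     * propagates to the caller immediately. The attempt runs at least once.
     */
    static <T> T retryUntil(long timeoutMs, Supplier<T> attempt) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        RuntimeException lastError = null;
        do {
            try {
                return attempt.get();
            } catch (RetriableSketchException e) {
                // A real client would log and back off here; the sketch just
                // keeps the most recent error so it is not lost on timeout.
                lastError = e;
            }
        } while (System.nanoTime() - deadline < 0);
        // Attach the last retriable error as the cause instead of dropping it.
        throw new SketchTimeoutException(
            "Gave up after " + timeoutMs + " ms", lastError);
    }

    public static void main(String[] args) {
        try {
            retryUntil(20, () -> {
                throw new RetriableSketchException("not enough replicas");
            });
        } catch (SketchTimeoutException e) {
            // The caller sees both the timeout and why the retries kept failing.
            System.out.println(e.getMessage() + ", caused by: " + e.getCause());
        }
    }
}
```

With this shape, an application that catches the timeout can call 
`getCause()` to see why the retries kept failing, rather than only learning 
that a deadline expired.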

I don't know all scenarios off the top of my head, and I guess we need to take 
it on a case-by-case basis. But I believe most `TimeoutException` cases should 
have some actual root cause.

> Producer TimeoutException should include root cause
> ---------------------------------------------------
>
>                 Key: KAFKA-17019
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17019
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients, producer 
>            Reporter: Matthias J. Sax
>            Priority: Major
>
> With KAFKA-16965 we added a "root cause" to some `TimeoutException` thrown by 
> the producer. However, it's only a partial solution to address a specific 
> issue.
> We should consider adding the "root cause" for _all_ `TimeoutException` cases 
> and unify/clean up the code to get a holistic solution to the problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
