[ 
https://issues.apache.org/jira/browse/KAFKA-9592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043060#comment-17043060
 ] 

Guozhang Wang commented on KAFKA-9592:
--------------------------------------

My recommendation would be to first fix this semi-related JIRA: 
https://issues.apache.org/jira/browse/KAFKA-5604. Then, in the close() call, we 
can actually send abortTxn if the producer knows it is still in the middle of a 
transaction. The producer would ignore the returned response: if the error is 
producer-fenced, the request is doomed to fail anyway, and the producer does 
not care, since this is just a best-effort request to let the txn be dropped 
sooner rather than later. This is similar to the leave-group request, whose 
response we do not need to care about either.
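
To make the close() idea concrete, here is a rough sketch of the control flow 
only. All names in it (BestEffortCloseSketch, Coordinator, beginTxn) are 
hypothetical stand-ins for illustration, not the real producer internals:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a producer; none of these names are Kafka's real API.
class BestEffortCloseSketch {
    /** Stands in for the abort request; it may fail, e.g. if the producer is fenced. */
    interface Coordinator {
        void abort() throws Exception;
    }

    private final Coordinator coordinator;
    private boolean inTransaction = false;
    private boolean closed = false;
    final List<String> log = new ArrayList<>();

    BestEffortCloseSketch(Coordinator coordinator) {
        this.coordinator = coordinator;
    }

    void beginTxn() {
        if (closed) {
            throw new IllegalStateException("producer is closed");
        }
        inTransaction = true;
    }

    /** Closes the producer, first trying to abort any open transaction. */
    void close() {
        if (closed) {
            return;
        }
        if (inTransaction) {
            try {
                coordinator.abort();
                log.add("abort sent");
            } catch (Exception e) {
                // Best effort only: the failure is recorded and otherwise ignored,
                // so close() still completes.
                log.add("abort failed, ignored: " + e.getMessage());
            }
            inTransaction = false;
        }
        closed = true;
        log.add("closed");
    }

    boolean isClosed() {
        return closed;
    }
}
```

The point of the sketch is that close() never surfaces the abort failure to the 
caller, and sends nothing at all when no transaction is open.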

And then the semantics (and what we can write in the javadoc) become:

1. If your processing logic has an error that your try-catch caught, then you 
can call abortTxn, and you can still reuse that producer;
2. If the error is actually from the producer client, then calling any API of 
that producer other than "close" would get you the same error again; you have 
no choice but to close the producer and possibly start a new one.
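
From the caller's side, the two cases above could look roughly like the sketch 
below. It assumes a hypothetical Producer interface and a hypothetical 
FatalProducerError type to mark errors raised by the producer client; 
FakeProducer exists only so the flow can be run without a broker:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical types modelling the proposed semantics; not Kafka's real classes.
class TxnSemanticsSketch {
    /** Stands in for a fatal error raised by the producer client, e.g. fencing. */
    static class FatalProducerError extends RuntimeException {
        FatalProducerError(String message) {
            super(message);
        }
    }

    interface Producer {
        void beginTxn();
        void send(String record);
        void commitTxn();
        void abortTxn();
        void close();
    }

    /** Runs one transaction and returns the producer to use for the next one. */
    static Producer runOnce(Producer producer,
                            Supplier<Producer> factory,
                            Supplier<String> logic) {
        try {
            producer.beginTxn();
            producer.send(logic.get());
            producer.commitTxn();
            return producer;
        } catch (FatalProducerError e) {
            // Case 2: the producer itself failed; only close() is useful now.
            producer.close();
            return factory.get();
        } catch (RuntimeException e) {
            // Case 1: the processing logic failed; abort and keep the producer.
            producer.abortTxn();
            return producer;
        }
    }

    /** In-memory fake that records calls; a "fenced" instance fails on send. */
    static class FakeProducer implements Producer {
        final List<String> calls = new ArrayList<>();
        private final boolean fenced;

        FakeProducer(boolean fenced) {
            this.fenced = fenced;
        }

        public void beginTxn() { calls.add("begin"); }

        public void send(String record) {
            if (fenced) {
                throw new FatalProducerError("fenced");
            }
            calls.add("send:" + record);
        }

        public void commitTxn() { calls.add("commit"); }

        public void abortTxn() { calls.add("abort"); }

        public void close() { calls.add("close"); }
    }
}
```

With these semantics the caller never needs a second try-catch around abortTxn: 
it is only called in case 1, where the producer is still healthy.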


> Safely abort Producer transactions during application shutdown
> --------------------------------------------------------------
>
>                 Key: KAFKA-9592
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9592
>             Project: Kafka
>          Issue Type: Improvement
>          Components: producer 
>    Affects Versions: 2.5.0
>            Reporter: Boyang Chen
>            Assignee: Xiang Zhang
>            Priority: Major
>              Labels: help-wanted, needs-kip, newbie
>             Fix For: 2.6.0
>
>
> Today, if a transactional producer hits a fatal exception, the caller usually 
> catches the exception and handles it by aborting the transaction and closing 
> the producer:
>  
> {code:java}
> try {
>   producer.beginTxn();
>   producer.send(xxx);
>   producer.sendOffsets(xxx);
>   producer.commit();
> } catch (ProducerFenced | UnknownPid e) {
>   ...
>   producer.abortTxn();
>   producer.close();
> }{code}
> This is what the current API suggests users do. Another scenario is during 
> an informed shutdown: people with an EOS producer would also like to end an 
> ongoing transaction before closing the producer, as it seems cleaner.
> The tricky part is that `abortTxn` is not a safe call when the producer 
> is already in an error state, which means the user has to add another 
> try-catch inside the first-layer catch block, making the error handling pretty 
> annoying. There are several ways to make this API robust and guide users 
> towards safe usage:
>  # Internally abort any ongoing transaction within `producer.close`, and 
> add a comment on the `abortTxn` call warning users not to do it manually.
>  # Similar to 1, but add a new `close(boolean abortTxn)` API call in case 
> some users want to handle transaction state by themselves.
>  # Introduce a new abort transaction API that returns a boolean flag 
> indicating whether the producer is in an error state, instead of throwing 
> exceptions.
>  # Introduce a public API `isInError` on the producer for users to check 
> before making any transactional API calls.
> I personally favor 1 & 2 most, as they are simple and do not require any API 
> change. Considering the change scope, I would still recommend a small KIP.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)