[ https://issues.apache.org/jira/browse/KAFKA-9592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043060#comment-17043060 ]
Guozhang Wang commented on KAFKA-9592:
--------------------------------------

My recommendation is that we first fix this semi-related JIRA: https://issues.apache.org/jira/browse/KAFKA-5604.

Then, in the close() call, we can send abortTxn if the producer knows it is still in the middle of a transaction. The producer would ignore the returned response: if the error is producer-fenced, the request is doomed to fail anyway, and the producer does not care, since this is just a best-effort request to let the txn be dropped sooner rather than later. This is similar to the leave-group request, whose response we do not need to care about either.

The semantics (and what we can write in the javadoc) then become:

1. If your processing logic hits an error that your try-catch catches, you can call abortTxn and still reuse that producer;
2. If the error actually comes from the producer client, then calling any API of that producer other than "close" would give you the same error again; you have no option but to close the producer and possibly start a new one.

> Safely abort Producer transactions during application shutdown
> --------------------------------------------------------------
>
>                 Key: KAFKA-9592
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9592
>             Project: Kafka
>          Issue Type: Improvement
>          Components: producer
>    Affects Versions: 2.5.0
>            Reporter: Boyang Chen
>            Assignee: Xiang Zhang
>            Priority: Major
>              Labels: help-wanted, needs-kip, newbie
>             Fix For: 2.6.0
>
>
> Today, if a transactional producer hits a fatal exception, the caller usually
> catches the exception and handles it by aborting the transaction and closing
> the producer:
>
> {code:java}
> try {
>     producer.beginTxn();
>     producer.send(xxx);
>     producer.sendOffsets(xxx);
>     producer.commit();
> } catch (ProducerFenced | UnknownPid e) {
>     ...
>     producer.abortTxn();
>     producer.close();
> }{code}
> This is what the current API suggests users do.
> Another scenario is an informed shutdown: people with an EOS producer would
> also like to end an ongoing transaction before closing the producer, as that
> is cleaner.
> The tricky part is that `abortTxn` is not a safe call when the producer is
> already in an error state, which means the user has to add another try-catch
> inside the first catch block, making the error handling pretty annoying.
> There are several ways to make this API robust and guide users toward safe usage:
> # Internally abort any ongoing transaction within `producer.close`, and
> add a comment on the `abortTxn` call warning users not to do it manually.
> # Similar to 1, but add a new `close(boolean abortTxn)` API call in case
> some users want to handle transaction state themselves.
> # Introduce a new abort transaction API with a boolean flag indicating
> whether the producer is in an error state, instead of throwing exceptions.
> # Introduce a public API `isInError` on the producer for users to check
> before making any transactional API calls.
> I personally favor 1 & 2 most, as they are simple and do not require any API
> change. Considering the change scope, I would still recommend a small KIP.
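
A minimal sketch of the two rules from the comment at the top, assuming close() performs the proposed best-effort abort. Everything here is a hypothetical stand-in so the example runs without the Kafka client: `FakeProducer`, `FatalProducerException` and `processBatch` are illustration-only names, and the method names follow the shorthand used in the issue (`beginTxn`, `abortTxn`) rather than the real client's `beginTransaction` / `abortTransaction`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: every type below is a hypothetical stand-in, not the Kafka client API. */
class SafeAbortSketch {

    /** Stand-in for a fatal producer-side error, e.g. the producer being fenced. */
    static class FatalProducerException extends RuntimeException {
        FatalProducerException(String msg) { super(msg); }
    }

    /** Hypothetical transactional producer that records the calls made on it. */
    static class FakeProducer {
        final List<String> calls = new ArrayList<>();
        private final boolean failOnSend;
        private boolean inTxn = false;
        private boolean fatal = false;

        FakeProducer(boolean failOnSend) { this.failOnSend = failOnSend; }

        void beginTxn() { ensureUsable(); inTxn = true; calls.add("begin"); }

        void send(String record) {
            ensureUsable();
            if (failOnSend) {
                fatal = true; // from now on, every call except close() fails
                throw new FatalProducerException("fenced");
            }
            calls.add("send");
        }

        void commitTxn() { ensureUsable(); inTxn = false; calls.add("commit"); }

        void abortTxn() { ensureUsable(); inTxn = false; calls.add("abort"); }

        /** Assumed behavior from the proposal: best-effort abort, never throws. */
        void close() {
            if (inTxn) {
                calls.add("best-effort-abort"); // response would be ignored
                inTxn = false;
            }
            calls.add("close");
        }

        private void ensureUsable() {
            if (fatal) throw new FatalProducerException("producer is in a fatal error state");
        }
    }

    /** Returns true if the producer can be reused afterwards, false if it was closed. */
    static boolean processBatch(FakeProducer producer, List<String> records, boolean processingFails) {
        try {
            producer.beginTxn();
            for (String record : records) {
                if (processingFails) throw new IllegalStateException("bad record: " + record);
                producer.send(record);
            }
            producer.commitTxn();
            return true;
        } catch (FatalProducerException e) {
            // Rule 2: the error came from the producer itself; only close() is safe.
            producer.close();
            return false;
        } catch (RuntimeException e) {
            // Rule 1: the error came from our own logic; abort and keep the producer.
            producer.abortTxn();
            return true;
        }
    }

    public static void main(String[] args) {
        FakeProducer fenced = new FakeProducer(true);
        boolean reusable = processBatch(fenced, Arrays.asList("a"), false);
        System.out.println(reusable + " " + fenced.calls);
    }
}
```

With this shape the caller needs no nested try-catch around `abortTxn`: the fatal path only ever calls `close()`, and the abort happens inside it.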