Thanks for the clarification Guozhang, I got your point that we want to have a consistent handling of fatal exceptions being thrown from the abortTxn. I modified the current template to move the fatal exception try-catch outside of the processing loop to make sure we could get a chance to close consumer/producer modules. Let me know what you think.
Best, Boyang On Fri, Jan 22, 2021 at 11:05 AM Boyang Chen <reluctanthero...@gmail.com> wrote: > My understanding is that abortTransaction would only throw when the > producer is in fatal state. For CommitFailed, the producer should still be > in the abortable error state, so that abortTransaction call would not throw. > > On Fri, Jan 22, 2021 at 11:02 AM Guozhang Wang <wangg...@gmail.com> wrote: > >> Or are you going to maintain some internal state such that, the >> `abortTransaction` in the catch block would never throw again? >> >> On Fri, Jan 22, 2021 at 11:01 AM Guozhang Wang <wangg...@gmail.com> >> wrote: >> >> > Hi Boyang/Jason, >> > >> > I've also thought about this (i.e. using CommitFailed for all >> non-fatal), >> > but what I'm pondering is that, in the catch (CommitFailed) block, what >> > would happen if the `producer.abortTransaction();` throws again? should >> > that be captured as a fatal and cause the client to close again. >> > >> > If yes, then naively the pattern would be: >> > >> > ... >> > catch (CommitFailedException e) { >> > // Transaction commit failed with abortable error, user could >> reset >> > // the application state and resume with a new transaction. The >> > root >> > // cause was wrapped in the thrown exception. >> > resetToLastCommittedPositions(consumer); >> > try { >> > producer.abortTransaction(); >> > } catch (KafkaException e) { >> > producer.close(); >> > consumer.close(); >> > throw e; >> > } >> > } catch (KafkaException e) { >> > producer.close(); >> > consumer.close(); >> > throw e; >> > } >> > ... >> > >> > Guozhang >> > >> > On Fri, Jan 22, 2021 at 10:47 AM Boyang Chen < >> reluctanthero...@gmail.com> >> > wrote: >> > >> >> Hey Guozhang, >> >> >> >> Jason and I were discussing the new API offline and decided to take >> >> another >> >> approach. Firstly, the reason not to invent a new API with returned >> >> boolean >> >> flag is for compatibility consideration, since old EOS code would not >> know >> >> that a given transaction commit was failed internally as they don't >> listen >> >> to the output flag. Our proposed alternative solution is to let >> >> *commitTransaction >> >> throw CommitFailedException whenever the commit failed with non-fatal >> >> exception*, so that on the caller side the handling logic becomes: >> >> >> >> try { >> >> if (shouldCommit) { >> >> producer.commitTransaction(); >> >> } else { >> >> resetToLastCommittedPositions(consumer); >> >> producer.abortTransaction(); >> >> } >> >> } catch (CommitFailedException e) { >> >> // Transaction commit failed with abortable error, user could >> >> reset >> >> // the application state and resume with a new transaction. The >> >> root >> >> // cause was wrapped in the thrown exception. >> >> resetToLastCommittedPositions(consumer); >> >> producer.abortTransaction(); >> >> } catch (KafkaException e) { >> >> producer.close(); >> >> consumer.close(); >> >> throw e; >> >> } >> >> >> >> This approach looks cleaner as all exception types other than >> CommitFailed >> >> will doom to be fatal, which is very easy to adopt for users. In the >> >> meantime, we still maintain the commitTxn behavior to throw instead of >> >> silently failing. >> >> >> >> In addition, we decided to drop the recommendation to handle >> >> TimeoutException and leave it to the users to make the call. The >> downside >> >> for blindly calling abortTxn upon timeout is that we could result in an >> >> illegal state when the commit was already successful on the broker >> >> side. Without a good guarantee on the outcome, overcomplicating the >> >> template should not be encouraged IMHO. >> >> >> >> Let me know your thoughts on the new approach here, thank you! >> >> >> >> Best, >> >> Boyang >> >> >> >> On Tue, Jan 19, 2021 at 11:11 AM Guozhang Wang <wangg...@gmail.com> >> >> wrote: >> >> >> >> > Thanks for your clarification on 2)/3), that makes sense. >> >> > >> >> > On Tue, Jan 19, 2021 at 10:16 AM Boyang Chen < >> >> reluctanthero...@gmail.com> >> >> > wrote: >> >> > >> >> > > Thanks for the input Guozhang, replied inline. >> >> > > >> >> > > On Mon, Jan 18, 2021 at 8:57 PM Guozhang Wang <wangg...@gmail.com> >> >> > wrote: >> >> > > >> >> > > > Hello Boyang, >> >> > > > >> >> > > > Thanks for the updated KIP. I read it again and have the >> following >> >> > > > thoughts: >> >> > > > >> >> > > > 0. I'm a bit concerned that if commitTxn does not throw any >> >> non-fatal >> >> > > > exception, and instead we rely on the subsequent beginTxn call to >> >> > throw, >> >> > > it >> >> > > > may violate some callers with a pattern that relying on >> commitTxn to >> >> > > > succeed to make some non-rollback operations. For example: >> >> > > > >> >> > > > beginTxn() >> >> > > > // do some read-write on my local DB >> >> > > > commitTxn() >> >> > > > // if commitTxn succeeds, then commit the DB >> >> > > > >> >> > > > ------------- >> >> > > > >> >> > > > The issue is that, committing DB is a non-rollback operation, and >> >> users >> >> > > may >> >> > > > just rely on commitTxn to return without error to make this >> >> > non-rollback >> >> > > > call. Of course we can just claim this pattern is not legitimate >> >> and is >> >> > > not >> >> > > > the right way of doing things, but many users may naturally adopt >> >> this >> >> > > > pattern. >> >> > > > >> >> > > > So maybe we should still let commitTxn also throw non-fatal >> >> exceptions, >> >> > > in >> >> > > > which case we would then call abortTxn again. >> >> > > > >> >> > > > But if we do this, the pattern becomes: >> >> > > > >> >> > > > try { >> >> > > > beginTxn() >> >> > > > // do something >> >> > > > } catch (Exception) { >> >> > > > shouldCommit = false; >> >> > > > } >> >> > > > >> >> > > > if (shouldCommit) { >> >> > > > try { >> >> > > > commitTxn() >> >> > > > } catch (...) { // enumerate all fatal exceptions >> >> > > > shutdown() >> >> > > > } catch (KafkaException) { >> >> > > > // non-fatal >> >> > > > shouldCommit = false; >> >> > > > } >> >> > > > } >> >> > > > >> >> > > > if (!shouldCommit && running()) { >> >> > > > try { >> >> > > > abortTxn() >> >> > > > } catch (KafkaException) { >> >> > > > // only throw fatal >> >> > > > shutdown() >> >> > > > } >> >> > > > } >> >> > > > >> >> > > > --------------------- >> >> > > > >> >> > > > Which is much more complicated. >> >> > > > >> >> > > > Thank makes me think, the alternative we have discussed offline >> may >> >> be >> >> > > > better: let commitTxn() to return a boolean flag. >> >> > > > >> >> > > > * If it returns true, it means the commit succeeded. Users can >> >> > > comfortably >> >> > > > continue and do any external non-rollback operations if they >> like. >> >> > > > * If it returns false, it means the commit failed with non-fatal >> >> error >> >> > > *AND >> >> > > > the txn has been aborted*. Users do not need to call abortTxn >> again. >> >> > > > * If it throws, then it means fatal errors. Users should shut >> down >> >> the >> >> > > > client. >> >> > > > >> >> > > > In this case, the pattern becomes: >> >> > > > >> >> > > > try { >> >> > > > beginTxn() >> >> > > > // do something >> >> > > > } catch (Exception) { >> >> > > > shouldCommit = false; >> >> > > > } >> >> > > > >> >> > > > try { >> >> > > > if (shouldCommit) { >> >> > > > commitSucceeded = commitTxn() >> >> > > > } else { >> >> > > > // reset offsets, rollback operations, etc >> >> > > > abortTxn() >> >> > > > } >> >> > > > } catch (KafkaException) { >> >> > > > // only throw fatal >> >> > > > shutdown() >> >> > > > } >> >> > > > >> >> > > > if (commitSucceeded) >> >> > > > // do other non-rollbackable things >> >> > > > else >> >> > > > // reset offsets, rollback operations, etc >> >> > > > >> >> > > > ------------------------- >> >> > > > >> >> > > > Of course, if we want to have better visibility into what caused >> the >> >> > > commit >> >> > > > to fail and txn to abort. We can let the return type be an enum >> >> instead >> >> > > of >> >> > > > a flag. But my main idea is to still let the commitTxn be the >> final >> >> > point >> >> > > > users can tell whether this txn succeeded or not, instead of >> >> relying on >> >> > > the >> >> > > > next beginTxn() call. >> >> > > > >> >> > > > I agree that adding a boolean flag is indeed useful in this case. >> >> Will >> >> > > update the KIP. >> >> > > >> >> > > 1. Re: "while maintaining the behavior to throw fatal exception in >> >> raw" >> >> > not >> >> > > > sure what do you mean by "throw" here. Are you proposing the >> >> callback >> >> > > would >> >> > > > *pass* (not throw) in any fatal exceptions when triggered without >> >> > > wrapping? >> >> > > > >> >> > > > Yes, I want to say *pass*, the benefit is to make the end user's >> >> > > expectation consistent >> >> > > regarding exception handling. >> >> > > >> >> > > >> >> > > > 2. I'm not sure I understand the claim regarding the callback >> that >> >> "In >> >> > > EOS >> >> > > > setup, it is not required to handle the exception". Are you >> >> proposing >> >> > > that, >> >> > > > e.g. in Streams, we do not try to handle any exceptions if EOS is >> >> > enabled >> >> > > > in the callback anymore, but just let commitTxn() itself to fail >> to >> >> be >> >> > > > notified about the problem? Personally I think in Streams we >> should >> >> > just >> >> > > > make the handling logic of the callback to be consistent >> regardless >> >> of >> >> > > the >> >> > > > EOS settings (today we have different logic depending on this >> logic, >> >> > > which >> >> > > > I think could be unified as well). >> >> > > > >> >> > > > My idea originates from the claim on send API: >> >> > > "When used as part of a transaction, it is not necessary to define >> a >> >> > > callback or check the result of the future in order to detect >> errors >> >> > from >> >> > > <code>send</code>. " >> >> > > My understanding is that for EOS, the exception will be detected by >> >> > calling >> >> > > transactional APIs either way, so it is a duplicate handling to >> track >> >> > > all the sendExceptions in RecordCollector. However, I looked up >> >> > > sendException is being used today as follow: >> >> > > >> >> > > 1. Pass to "ProductionExceptionHandler" for any default or >> customized >> >> > > exception handler to handle >> >> > > 2. Stop collecting offset info or new exceptions >> >> > > 3. Check and rethrow exceptions in close(), flush() or new send() >> >> calls >> >> > > >> >> > > To my understanding, we should still honor the commitment to #1 for >> >> any >> >> > > user customized implementation. The #2 does not really affect our >> >> > decision >> >> > > upon EOS. The #3 here is still valuable as it could help us fail >> fast >> >> in >> >> > > new send() instead of waiting to later stage of processing. In that >> >> > sense, >> >> > > I agree we should continue to record send exceptions even under EOS >> >> case >> >> > to >> >> > > ensure the strength of stream side Producer logic. On the safer >> side, >> >> we >> >> > no >> >> > > longer need to wrap certain fatal exceptions like ProducerFenced as >> >> > > TaskMigrated, since they should not crash the stream thread if >> thrown >> >> in >> >> > > raw format, once we adopt the new processing model in the send >> phase. >> >> > > >> >> > > >> >> > > > >> >> > > > Guozhang >> >> > > > >> >> > > > >> >> > > > >> >> > > > On Thu, Dec 17, 2020 at 8:42 PM Boyang Chen < >> >> > reluctanthero...@gmail.com> >> >> > > > wrote: >> >> > > > >> >> > > > > Thanks for everyone's feedback so far. I have polished the KIP >> >> after >> >> > > > > offline discussion with some folks working on EOS to make the >> >> > exception >> >> > > > > handling more lightweight. The essential change is that we are >> not >> >> > > > > inventing a new intermediate exception type, but instead >> >> separating a >> >> > > > full >> >> > > > > transaction into two phases: >> >> > > > > >> >> > > > > 1. The data transmission phase >> >> > > > > 2. The commit phase >> >> > > > > >> >> > > > > This way for any exception thrown from phase 1, will be an >> >> indicator >> >> > in >> >> > > > > phase 2 whether we should do commit or abort, and from now on >> >> > > > > `commitTransaction` should only throw fatal exceptions, >> similar to >> >> > > > > `abortTransaction`, so that any KafkaException caught in the >> >> commit >> >> > > phase >> >> > > > > will be definitely fatal to crash the app. For more advanced >> users >> >> > such >> >> > > > as >> >> > > > > Streams, we have the ability to further wrap selected types of >> >> fatal >> >> > > > > exceptions to trigger task migration if necessary. >> >> > > > > >> >> > > > > More details in the KIP, feel free to take another look, >> thanks! >> >> > > > > >> >> > > > > On Thu, Dec 17, 2020 at 8:36 PM Boyang Chen < >> >> > > reluctanthero...@gmail.com> >> >> > > > > wrote: >> >> > > > > >> >> > > > > > Thanks Bruno for the feedback. >> >> > > > > > >> >> > > > > > On Mon, Dec 7, 2020 at 5:26 AM Bruno Cadonna < >> >> br...@confluent.io> >> >> > > > wrote: >> >> > > > > > >> >> > > > > >> Thanks Boyang for the KIP! >> >> > > > > >> >> >> > > > > >> Like Matthias, I do also not know the producer internal well >> >> > enough >> >> > > to >> >> > > > > >> comment on the categorization. However, I think having a >> super >> >> > > > exception >> >> > > > > >> (e.g. RetriableException) that encodes if an exception is >> >> fatal >> >> > or >> >> > > > not >> >> > > > > >> is cleaner, better understandable and less error-prone, >> because >> >> > > > ideally >> >> > > > > >> when you add a new non-fatal exception in future you just >> need >> >> to >> >> > > > think >> >> > > > > >> about letting it inherit from the super exception and all >> the >> >> rest >> >> > > of >> >> > > > > >> the code will just behave correctly without the need to wrap >> >> the >> >> > new >> >> > > > > >> exception into another exception each time it is thrown >> (maybe >> >> it >> >> > is >> >> > > > > >> thrown at different location in the code). >> >> > > > > >> >> >> > > > > >> As far as I understand the following statement from your >> >> previous >> >> > > > e-mail >> >> > > > > >> is the reason that currently such a super exception is not >> >> > possible: >> >> > > > > >> >> >> > > > > >> "Right now we have RetriableException type, if we are going >> to >> >> > add a >> >> > > > > >> `ProducerRetriableException` type, we have to put this new >> >> > interface >> >> > > > as >> >> > > > > >> the parent of the RetriableException, because not all thrown >> >> > > non-fatal >> >> > > > > >> exceptions are `retriable` in general for producer" >> >> > > > > >> >> >> > > > > >> >> >> > > > > >> In the list of exceptions in your KIP, I found non-fatal >> >> > exceptions >> >> > > > that >> >> > > > > >> do not inherit from RetriableException. I guess those are >> the >> >> ones >> >> > > you >> >> > > > > >> are referring to in your statement: >> >> > > > > >> >> >> > > > > >> InvalidProducerEpochException >> >> > > > > >> InvalidPidMappingException >> >> > > > > >> TransactionAbortedException >> >> > > > > >> >> >> > > > > >> All of those exceptions are non-fatal and do not inherit >> from >> >> > > > > >> RetriableException. Is there a reason for that? If they >> >> depended >> >> > > from >> >> > > > > >> RetriableException we would be a bit closer to a super >> >> exception I >> >> > > > > >> mention above. >> >> > > > > >> >> >> > > > > >> The reason is that sender may catch those exceptions in the >> >> > > > > > ProduceResponse, and it currently does infinite >> >> > > > > > retries on RetriableException. To make sure we could trigger >> the >> >> > > > > > abortTransaction() by catching non-fatal thrown >> >> > > > > > exceptions and reinitialize the txn state, we chose not to >> let >> >> > those >> >> > > > > > exceptions inherit RetriableException so that >> >> > > > > > they won't cause infinite retry on sender. >> >> > > > > > >> >> > > > > > >> >> > > > > >> With OutOfOrderSequenceException and >> >> UnknownProducerIdException, I >> >> > > > think >> >> > > > > >> to understand that their fatality depends on the type (i.e. >> >> > > > > >> configuration) of the producer. That makes it difficult to >> >> have a >> >> > > > super >> >> > > > > >> exception that encodes the retriability as mentioned above. >> >> Would >> >> > it >> >> > > > be >> >> > > > > >> possible to introduce exceptions that inherit from >> >> > > RetriableException >> >> > > > > >> and that are thrown when those exceptions are caught from >> the >> >> > > brokers >> >> > > > > >> and the type of the producer is such that the exceptions are >> >> > > > retriable? >> >> > > > > >> >> >> > > > > >> Yea, I think in general the exception type mixing is >> difficult >> >> to >> >> > > get >> >> > > > > > them all right. I have already proposed another solution >> based >> >> on >> >> > my >> >> > > > > > offline discussion with some folks working on EOS >> >> > > > > > to make the handling more straightforward for end users >> without >> >> the >> >> > > > need >> >> > > > > > to distinguish exception fatality. >> >> > > > > > >> >> > > > > >> Best, >> >> > > > > >> Bruno >> >> > > > > >> >> >> > > > > >> >> >> > > > > >> On 04.12.20 19:34, Guozhang Wang wrote: >> >> > > > > >> > Thanks Boyang for the proposal! I made a pass over the >> list >> >> and >> >> > > here >> >> > > > > are >> >> > > > > >> > some thoughts: >> >> > > > > >> > >> >> > > > > >> > 0) Although this is not part of the public API, I think we >> >> > should >> >> > > > make >> >> > > > > >> sure >> >> > > > > >> > that the suggested pattern (i.e. user can always call >> >> abortTxn() >> >> > > > when >> >> > > > > >> > handling non-fatal errors) are indeed supported. E.g. if >> the >> >> txn >> >> > > is >> >> > > > > >> already >> >> > > > > >> > aborted by the broker side, then users can still call >> >> abortTxn >> >> > > which >> >> > > > > >> would >> >> > > > > >> > not throw another exception but just be treated as a >> no-op. >> >> > > > > >> > >> >> > > > > >> > 1) *ConcurrentTransactionsException*: I think this error >> can >> >> > also >> >> > > be >> >> > > > > >> > returned but not documented yet. This should be a >> non-fatal >> >> > error. >> >> > > > > >> > >> >> > > > > >> > 2) *InvalidTxnStateException*: this error is returned from >> >> > broker >> >> > > > when >> >> > > > > >> txn >> >> > > > > >> > state transition failed (e.g. it is trying to transit to >> >> > > > > complete-commit >> >> > > > > >> > while the current state is not prepare-commit). This error >> >> could >> >> > > > > >> indicates >> >> > > > > >> > a bug on the client internal code or the broker code, OR a >> >> user >> >> > > > error >> >> > > > > >> --- a >> >> > > > > >> > similar error is ConcurrentTransactionsException, i.e. if >> >> Kafka >> >> > is >> >> > > > > >> bug-free >> >> > > > > >> > these exceptions would only be returned if users try to do >> >> > > something >> >> > > > > >> wrong, >> >> > > > > >> > e.g. calling abortTxn right after a commitTxn, etc. So I'm >> >> > > thinking >> >> > > > it >> >> > > > > >> > should be a non-fatal error instead of a fatal error, >> wdyt? >> >> > > > > >> > >> >> > > > > >> > 3) *KafkaException*: case i "indicates fatal transactional >> >> > > sequence >> >> > > > > >> > (Fatal)", this is a bit conflicting with the >> >> > > > *OutOfSequenceException* >> >> > > > > >> that >> >> > > > > >> > is treated as non-fatal. I guess your proposal is that >> >> > > > > >> > OutOfOrderSequenceException would be treated either as >> fatal >> >> > with >> >> > > > > >> > transactional producer, or non-fatal with idempotent >> >> producer, >> >> > is >> >> > > > that >> >> > > > > >> > right? If the producer is only configured with idempotency >> >> but >> >> > not >> >> > > > > >> > transaction, then throwing a >> >> TransactionStateCorruptedException >> >> > > for >> >> > > > > >> > non-fatal errors would be confusing since users are not >> using >> >> > > > > >> transactions >> >> > > > > >> > at all.. So I suggest we always throw >> OutOfSequenceException >> >> > as-is >> >> > > > > (i.e. >> >> > > > > >> > not wrapped) no matter how the producer is configured, and >> >> let >> >> > the >> >> > > > > >> caller >> >> > > > > >> > decide how to handle it based on whether it is only >> >> idempotent >> >> > or >> >> > > > > >> > transactional itself. >> >> > > > > >> > >> >> > > > > >> > 4) Besides all the txn APIs, the `send()` callback / >> future >> >> can >> >> > > also >> >> > > > > >> throw >> >> > > > > >> > txn-related exceptions, I think this KIP should also cover >> >> this >> >> > > API >> >> > > > as >> >> > > > > >> well? >> >> > > > > >> > >> >> > > > > >> > 5) This is related to 1/2) above: sometimes those >> non-fatal >> >> > errors >> >> > > > > like >> >> > > > > >> > ConcurrentTxn or InvalidTxnState are not due to the state >> >> being >> >> > > > > >> corrupted >> >> > > > > >> > at the broker side, but maybe users are doing something >> >> wrong. >> >> > So >> >> > > > I'm >> >> > > > > >> > wondering if we should further distinguish those non-fatal >> >> > errors >> >> > > > > >> between >> >> > > > > >> > a) those that are caused by Kafka itself, e.g. a broker >> timed >> >> > out >> >> > > > and >> >> > > > > >> > aborted a txn and later an endTxn request is received, >> and b) >> >> > the >> >> > > > > user's >> >> > > > > >> > API call pattern is incorrect, causing the request to be >> >> > rejected >> >> > > > with >> >> > > > > >> an >> >> > > > > >> > error code from the broker. >> >> *TransactionStateCorruptedException* >> >> > > > feels >> >> > > > > >> to >> >> > > > > >> > me more like for case a), but not case b). >> >> > > > > >> > >> >> > > > > >> > >> >> > > > > >> > Guozhang >> >> > > > > >> > >> >> > > > > >> > >> >> > > > > >> > On Wed, Dec 2, 2020 at 4:50 PM Boyang Chen < >> >> > > > > reluctanthero...@gmail.com> >> >> > > > > >> > wrote: >> >> > > > > >> > >> >> > > > > >> >> Thanks Matthias, I think your proposal makes sense as >> well, >> >> on >> >> > > the >> >> > > > > pro >> >> > > > > >> side >> >> > > > > >> >> we could have a universally agreed exception type to be >> >> caught >> >> > by >> >> > > > the >> >> > > > > >> user, >> >> > > > > >> >> without having an extra layer on top of the actual >> >> exceptions. >> >> > I >> >> > > > > could >> >> > > > > >> see >> >> > > > > >> >> some issue on downsides: >> >> > > > > >> >> >> >> > > > > >> >> 1. The exception hierarchy will be more complex. Right >> now >> >> we >> >> > > have >> >> > > > > >> >> RetriableException type, if we are going to add a >> >> > > > > >> >> `ProducerRetriableException` type, we have to put this >> new >> >> > > > interface >> >> > > > > >> as the >> >> > > > > >> >> parent of the RetriableException, because not all thrown >> >> > > non-fatal >> >> > > > > >> >> exceptions are `retriable` in general for producer, for >> >> example >> >> > > > > >> >> < >> >> > > > > >> >> >> >> > > > > >> >> >> > > > > >> >> > > > >> >> > > >> >> > >> >> >> https://github.com/apache/kafka/blob/e275742f850af4a1b79b0d1bd1ac9a1d2e89c64e/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L745 >> >> > > > > >> >>> . >> >> > > > > >> >> We could have an empty interface >> >> `ProducerRetriableException` >> >> > to >> >> > > > let >> >> > > > > >> all >> >> > > > > >> >> the thrown exceptions implement for sure, but it's a bit >> >> > > unorthodox >> >> > > > > in >> >> > > > > >> the >> >> > > > > >> >> way we deal with exceptions in general. >> >> > > > > >> >> >> >> > > > > >> >> 2. There are cases where we throw a KafkaException >> wrapping >> >> > > another >> >> > > > > >> >> KafkaException as either fatal or non-fatal. If we use an >> >> > > interface >> >> > > > > to >> >> > > > > >> >> solve #1, it is also required to implement another >> bloated >> >> > > > exception >> >> > > > > >> class >> >> > > > > >> >> which could replace KafkaException type, as we couldn't >> mark >> >> > > > > >> KafkaException >> >> > > > > >> >> as retriable for sure. >> >> > > > > >> >> >> >> > > > > >> >> 3. In terms of the encapsulation, wrapping means we could >> >> limit >> >> > > the >> >> > > > > >> scope >> >> > > > > >> >> of affection to the producer only, which is important >> since >> >> we >> >> > > > don't >> >> > > > > >> want >> >> > > > > >> >> shared exception types to implement a producer-related >> >> > interface, >> >> > > > > such >> >> > > > > >> >> as UnknownTopicOrPartitionException. >> >> > > > > >> >> >> >> > > > > >> >> Best, >> >> > > > > >> >> Boyang >> >> > > > > >> >> >> >> > > > > >> >> On Wed, Dec 2, 2020 at 3:38 PM Matthias J. Sax < >> >> > mj...@apache.org >> >> > > > >> >> > > > > >> wrote: >> >> > > > > >> >> >> >> > > > > >> >>> Thanks for the KIP Boyang! >> >> > > > > >> >>> >> >> > > > > >> >>> Overall, categorizing exceptions makes a lot of sense. >> As I >> >> > > don't >> >> > > > > know >> >> > > > > >> >>> the producer internals well enough, I cannot comment on >> the >> >> > > > > >> >>> categorization in detail though. >> >> > > > > >> >>> >> >> > > > > >> >>> What I am wondering is, if we should introduce an >> exception >> >> > > > > interface >> >> > > > > >> >>> that non-fatal exception would implement instead of >> >> creating a >> >> > > new >> >> > > > > >> class >> >> > > > > >> >>> that will wrap non-fatal exceptions? What would be the >> >> > pros/cons >> >> > > > for >> >> > > > > >> >>> both designs? >> >> > > > > >> >>> >> >> > > > > >> >>> >> >> > > > > >> >>> -Matthias >> >> > > > > >> >>> >> >> > > > > >> >>> >> >> > > > > >> >>> On 12/2/20 11:35 AM, Boyang Chen wrote: >> >> > > > > >> >>>> Hey there, >> >> > > > > >> >>>> >> >> > > > > >> >>>> I would like to start a discussion thread for KIP-691: >> >> > > > > >> >>>> >> >> > > > > >> >>> >> >> > > > > >> >> >> >> > > > > >> >> >> > > > > >> >> > > > >> >> > > >> >> > >> >> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-691%3A+Enhance+Transactional+Producer+Exception+Handling >> >> > > > > >> >>>> >> >> > > > > >> >>>> The KIP is aiming to simplify the exception handling >> logic >> >> > for >> >> > > > > >> >>>> transactional Producer users by classifying fatal and >> >> > non-fatal >> >> > > > > >> >>> exceptions >> >> > > > > >> >>>> and throw them correspondingly for easier catch and >> retry. >> >> > Let >> >> > > me >> >> > > > > >> know >> >> > > > > >> >>> what >> >> > > > > >> >>>> you think. >> >> > > > > >> >>>> >> >> > > > > >> >>>> Best, >> >> > > > > >> >>>> Boyang >> >> > > > > >> >>>> >> >> > > > > >> >>> >> >> > > > > >> >> >> >> > > > > >> > >> >> > > > > >> > >> >> > > > > >> >> >> > > > > > >> >> > > > > >> >> > > > >> >> > > > >> >> > > > -- >> >> > > > -- Guozhang >> >> > > > >> >> > > >> >> > >> >> > >> >> > -- >> >> > -- Guozhang >> >> > >> >> >> > >> > >> > -- >> > -- Guozhang >> > >> >> >> -- >> -- Guozhang >> >