Why are we receiving all these emails? Kindly remove us from dev@kafka.apache.org; we don't want to receive emails anymore.
Thanks

> On 01/23/2021 4:14 AM Guozhang Wang <wangg...@gmail.com> wrote:
>
> Thanks Boyang, yes I think I was confused about the different handling of
> two abortTxn calls, and now I get it was not intentional. I think I do not
> have more concerns.
>
> On Fri, Jan 22, 2021 at 1:12 PM Boyang Chen <reluctanthero...@gmail.com>
> wrote:
>
> > Thanks for the clarification Guozhang, I got your point that we want to
> > have a consistent handling of fatal exceptions being thrown from the
> > abortTxn. I modified the current template to move the fatal exception
> > try-catch outside of the processing loop to make sure we could get a chance
> > to close consumer/producer modules. Let me know what you think.
> >
> > Best,
> > Boyang
> >
> > On Fri, Jan 22, 2021 at 11:05 AM Boyang Chen <reluctanthero...@gmail.com>
> > wrote:
> >
> > > My understanding is that abortTransaction would only throw when the
> > > producer is in fatal state. For CommitFailed, the producer should still be
> > > in the abortable error state, so that abortTransaction call would not throw.
> > >
> > > On Fri, Jan 22, 2021 at 11:02 AM Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > >> Or are you going to maintain some internal state such that, the
> > >> `abortTransaction` in the catch block would never throw again?
> > >>
> > >> On Fri, Jan 22, 2021 at 11:01 AM Guozhang Wang <wangg...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Boyang/Jason,
> > >> >
> > >> > I've also thought about this (i.e. using CommitFailed for all non-fatal),
> > >> > but what I'm pondering is that, in the catch (CommitFailed) block, what
> > >> > would happen if the `producer.abortTransaction();` throws again? should
> > >> > that be captured as a fatal and cause the client to close again.
> > >> >
> > >> > If yes, then naively the pattern would be:
> > >> >
> > >> > ...
> > >> > catch (CommitFailedException e) {
> > >> >     // Transaction commit failed with abortable error, user could reset
> > >> >     // the application state and resume with a new transaction. The root
> > >> >     // cause was wrapped in the thrown exception.
> > >> >     resetToLastCommittedPositions(consumer);
> > >> >     try {
> > >> >         producer.abortTransaction();
> > >> >     } catch (KafkaException e) {
> > >> >         producer.close();
> > >> >         consumer.close();
> > >> >         throw e;
> > >> >     }
> > >> > } catch (KafkaException e) {
> > >> >     producer.close();
> > >> >     consumer.close();
> > >> >     throw e;
> > >> > }
> > >> > ...
> > >> >
> > >> > Guozhang
> > >> >
> > >> > On Fri, Jan 22, 2021 at 10:47 AM Boyang Chen <reluctanthero...@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> Hey Guozhang,
> > >> >>
> > >> >> Jason and I were discussing the new API offline and decided to take another
> > >> >> approach. Firstly, the reason not to invent a new API with returned boolean
> > >> >> flag is for compatibility consideration, since old EOS code would not know
> > >> >> that a given transaction commit was failed internally as they don't listen
> > >> >> to the output flag. Our proposed alternative solution is to let
> > >> >> *commitTransaction throw CommitFailedException whenever the commit failed
> > >> >> with non-fatal exception*, so that on the caller side the handling logic
> > >> >> becomes:
> > >> >>
> > >> >> try {
> > >> >>     if (shouldCommit) {
> > >> >>         producer.commitTransaction();
> > >> >>     } else {
> > >> >>         resetToLastCommittedPositions(consumer);
> > >> >>         producer.abortTransaction();
> > >> >>     }
> > >> >> } catch (CommitFailedException e) {
> > >> >>     // Transaction commit failed with abortable error, user could reset
> > >> >>     // the application state and resume with a new transaction. The root
> > >> >>     // cause was wrapped in the thrown exception.
> > >> >>     resetToLastCommittedPositions(consumer);
> > >> >>     producer.abortTransaction();
> > >> >> } catch (KafkaException e) {
> > >> >>     producer.close();
> > >> >>     consumer.close();
> > >> >>     throw e;
> > >> >> }
> > >> >>
> > >> >> This approach looks cleaner as all exception types other than CommitFailed
> > >> >> will doom to be fatal, which is very easy to adopt for users. In the
> > >> >> meantime, we still maintain the commitTxn behavior to throw instead of
> > >> >> silently failing.
> > >> >>
> > >> >> In addition, we decided to drop the recommendation to handle
> > >> >> TimeoutException and leave it to the users to make the call. The downside
> > >> >> for blindly calling abortTxn upon timeout is that we could result in an
> > >> >> illegal state when the commit was already successful on the broker
> > >> >> side. Without a good guarantee on the outcome, overcomplicating the
> > >> >> template should not be encouraged IMHO.
> > >> >>
> > >> >> Let me know your thoughts on the new approach here, thank you!
> > >> >>
> > >> >> Best,
> > >> >> Boyang
> > >> >>
> > >> >> On Tue, Jan 19, 2021 at 11:11 AM Guozhang Wang <wangg...@gmail.com>
> > >> >> wrote:
> > >> >>
> > >> >> > Thanks for your clarification on 2)/3), that makes sense.
> > >> >> >
> > >> >> > On Tue, Jan 19, 2021 at 10:16 AM Boyang Chen <reluctanthero...@gmail.com>
> > >> >> > wrote:
> > >> >> >
> > >> >> > > Thanks for the input Guozhang, replied inline.
> > >> >> > >
> > >> >> > > On Mon, Jan 18, 2021 at 8:57 PM Guozhang Wang <wangg...@gmail.com>
> > >> >> > > wrote:
> > >> >> > >
> > >> >> > > > Hello Boyang,
> > >> >> > > >
> > >> >> > > > Thanks for the updated KIP. I read it again and have the following
> > >> >> > > > thoughts:
> > >> >> > > >
> > >> >> > > > 0. I'm a bit concerned that if commitTxn does not throw any non-fatal
> > >> >> > > > exception, and instead we rely on the subsequent beginTxn call to throw, it
> > >> >> > > > may violate some callers with a pattern that relying on commitTxn to
> > >> >> > > > succeed to make some non-rollback operations. For example:
> > >> >> > > >
> > >> >> > > > beginTxn()
> > >> >> > > > // do some read-write on my local DB
> > >> >> > > > commitTxn()
> > >> >> > > > // if commitTxn succeeds, then commit the DB
> > >> >> > > >
> > >> >> > > > -------------
> > >> >> > > >
> > >> >> > > > The issue is that, committing DB is a non-rollback operation, and users may
> > >> >> > > > just rely on commitTxn to return without error to make this non-rollback
> > >> >> > > > call. Of course we can just claim this pattern is not legitimate and is not
> > >> >> > > > the right way of doing things, but many users may naturally adopt this
> > >> >> > > > pattern.
> > >> >> > > >
> > >> >> > > > So maybe we should still let commitTxn also throw non-fatal exceptions, in
> > >> >> > > > which case we would then call abortTxn again.
> > >> >> > > >
> > >> >> > > > But if we do this, the pattern becomes:
> > >> >> > > >
> > >> >> > > > try {
> > >> >> > > >     beginTxn()
> > >> >> > > >     // do something
> > >> >> > > > } catch (Exception) {
> > >> >> > > >     shouldCommit = false;
> > >> >> > > > }
> > >> >> > > >
> > >> >> > > > if (shouldCommit) {
> > >> >> > > >     try {
> > >> >> > > >         commitTxn()
> > >> >> > > >     } catch (...) { // enumerate all fatal exceptions
> > >> >> > > >         shutdown()
> > >> >> > > >     } catch (KafkaException) {
> > >> >> > > >         // non-fatal
> > >> >> > > >         shouldCommit = false;
> > >> >> > > >     }
> > >> >> > > > }
> > >> >> > > >
> > >> >> > > > if (!shouldCommit && running()) {
> > >> >> > > >     try {
> > >> >> > > >         abortTxn()
> > >> >> > > >     } catch (KafkaException) {
> > >> >> > > >         // only throw fatal
> > >> >> > > >         shutdown()
> > >> >> > > >     }
> > >> >> > > > }
> > >> >> > > >
> > >> >> > > > ---------------------
> > >> >> > > >
> > >> >> > > > Which is much more complicated.
> > >> >> > > >
> > >> >> > > > Thank makes me think, the alternative we have discussed offline may be
> > >> >> > > > better: let commitTxn() to return a boolean flag.
> > >> >> > > >
> > >> >> > > > * If it returns true, it means the commit succeeded. Users can comfortably
> > >> >> > > > continue and do any external non-rollback operations if they like.
> > >> >> > > > * If it returns false, it means the commit failed with non-fatal error *AND
> > >> >> > > > the txn has been aborted*. Users do not need to call abortTxn again.
> > >> >> > > > * If it throws, then it means fatal errors. Users should shut down the
> > >> >> > > > client.
> > >> >> > > >
> > >> >> > > > In this case, the pattern becomes:
> > >> >> > > >
> > >> >> > > > try {
> > >> >> > > >     beginTxn()
> > >> >> > > >     // do something
> > >> >> > > > } catch (Exception) {
> > >> >> > > >     shouldCommit = false;
> > >> >> > > > }
> > >> >> > > >
> > >> >> > > > try {
> > >> >> > > >     if (shouldCommit) {
> > >> >> > > >         commitSucceeded = commitTxn()
> > >> >> > > >     } else {
> > >> >> > > >         // reset offsets, rollback operations, etc
> > >> >> > > >         abortTxn()
> > >> >> > > >     }
> > >> >> > > > } catch (KafkaException) {
> > >> >> > > >     // only throw fatal
> > >> >> > > >     shutdown()
> > >> >> > > > }
> > >> >> > > >
> > >> >> > > > if (commitSucceeded)
> > >> >> > > >     // do other non-rollbackable things
> > >> >> > > > else
> > >> >> > > >     // reset offsets, rollback operations, etc
> > >> >> > > >
> > >> >> > > > -------------------------
> > >> >> > > >
> > >> >> > > > Of course, if we want to have better visibility into what caused the commit
> > >> >> > > > to fail and txn to abort. We can let the return type be an enum instead of
> > >> >> > > > a flag. But my main idea is to still let the commitTxn be the final point
> > >> >> > > > users can tell whether this txn succeeded or not, instead of relying on the
> > >> >> > > > next beginTxn() call.
> > >> >> > > >
> > >> >> > > I agree that adding a boolean flag is indeed useful in this case. Will
> > >> >> > > update the KIP.
> > >> >> > >
> > >> >> > > > 1. Re: "while maintaining the behavior to throw fatal exception in raw" not
> > >> >> > > > sure what do you mean by "throw" here. Are you proposing the callback would
> > >> >> > > > *pass* (not throw) in any fatal exceptions when triggered without wrapping?
> > >> >> > > >
> > >> >> > > Yes, I want to say *pass*, the benefit is to make the end user's
> > >> >> > > expectation consistent
> > >> >> > > regarding exception handling.
> > >> >> > >
> > >> >> > >
> > >> >> > > > 2. I'm not sure I understand the claim regarding the callback that "In EOS
> > >> >> > > > setup, it is not required to handle the exception". Are you proposing that,
> > >> >> > > > e.g. in Streams, we do not try to handle any exceptions if EOS is enabled
> > >> >> > > > in the callback anymore, but just let commitTxn() itself to fail to be
> > >> >> > > > notified about the problem? Personally I think in Streams we should just
> > >> >> > > > make the handling logic of the callback to be consistent regardless of the
> > >> >> > > > EOS settings (today we have different logic depending on this logic, which
> > >> >> > > > I think could be unified as well).
> > >> >> > > >
> > >> >> > > My idea originates from the claim on send API:
> > >> >> > > "When used as part of a transaction, it is not necessary to define a
> > >> >> > > callback or check the result of the future in order to detect errors from
> > >> >> > > <code>send</code>. "
> > >> >> > > My understanding is that for EOS, the exception will be detected by calling
> > >> >> > > transactional APIs either way, so it is a duplicate handling to track
> > >> >> > > all the sendExceptions in RecordCollector. However, I looked up
> > >> >> > > sendException is being used today as follow:
> > >> >> > >
> > >> >> > > 1. Pass to "ProductionExceptionHandler" for any default or customized
> > >> >> > > exception handler to handle
> > >> >> > > 2. Stop collecting offset info or new exceptions
> > >> >> > > 3. Check and rethrow exceptions in close(), flush() or new send() calls
> > >> >> > >
> > >> >> > > To my understanding, we should still honor the commitment to #1 for any
> > >> >> > > user customized implementation. The #2 does not really affect our decision
> > >> >> > > upon EOS. The #3 here is still valuable as it could help us fail fast in
> > >> >> > > new send() instead of waiting to later stage of processing. In that sense,
> > >> >> > > I agree we should continue to record send exceptions even under EOS case to
> > >> >> > > ensure the strength of stream side Producer logic. On the safer side, we no
> > >> >> > > longer need to wrap certain fatal exceptions like ProducerFenced as
> > >> >> > > TaskMigrated, since they should not crash the stream thread if thrown in
> > >> >> > > raw format, once we adopt the new processing model in the send phase.
> > >> >> > >
> > >> >> > > >
> > >> >> > > > Guozhang
> > >> >> > > >
> > >> >> > > > On Thu, Dec 17, 2020 at 8:42 PM Boyang Chen <reluctanthero...@gmail.com>
> > >> >> > > > wrote:
> > >> >> > > >
> > >> >> > > > > Thanks for everyone's feedback so far. I have polished the KIP after
> > >> >> > > > > offline discussion with some folks working on EOS to make the exception
> > >> >> > > > > handling more lightweight. The essential change is that we are not
> > >> >> > > > > inventing a new intermediate exception type, but instead separating a full
> > >> >> > > > > transaction into two phases:
> > >> >> > > > >
> > >> >> > > > > 1. The data transmission phase
> > >> >> > > > > 2. The commit phase
> > >> >> > > > >
> > >> >> > > > > This way for any exception thrown from phase 1, will be an indicator in
> > >> >> > > > > phase 2 whether we should do commit or abort, and from now on
> > >> >> > > > > `commitTransaction` should only throw fatal exceptions, similar to
> > >> >> > > > > `abortTransaction`, so that any KafkaException caught in the commit phase
> > >> >> > > > > will be definitely fatal to crash the app. For more advanced users such as
> > >> >> > > > > Streams, we have the ability to further wrap selected types of fatal
> > >> >> > > > > exceptions to trigger task migration if necessary.
> > >> >> > > > >
> > >> >> > > > > More details in the KIP, feel free to take another look, thanks!
> > >> >> > > > >
> > >> >> > > > > On Thu, Dec 17, 2020 at 8:36 PM Boyang Chen <reluctanthero...@gmail.com>
> > >> >> > > > > wrote:
> > >> >> > > > >
> > >> >> > > > > > Thanks Bruno for the feedback.
> > >> >> > > > > >
> > >> >> > > > > > On Mon, Dec 7, 2020 at 5:26 AM Bruno Cadonna <br...@confluent.io> wrote:
> > >> >> > > > > >
> > >> >> > > > > >> Thanks Boyang for the KIP!
> > >> >> > > > > >>
> > >> >> > > > > >> Like Matthias, I do also not know the producer internal well enough to
> > >> >> > > > > >> comment on the categorization. However, I think having a super exception
> > >> >> > > > > >> (e.g. RetriableException) that encodes if an exception is fatal or not
> > >> >> > > > > >> is cleaner, better understandable and less error-prone, because ideally
> > >> >> > > > > >> when you add a new non-fatal exception in future you just need to think
> > >> >> > > > > >> about letting it inherit from the super exception and all the rest of
> > >> >> > > > > >> the code will just behave correctly without the need to wrap the new
> > >> >> > > > > >> exception into another exception each time it is thrown (maybe it is
> > >> >> > > > > >> thrown at different location in the code).
> > >> >> > > > > >>
> > >> >> > > > > >> As far as I understand the following statement from your previous e-mail
> > >> >> > > > > >> is the reason that currently such a super exception is not possible:
> > >> >> > > > > >>
> > >> >> > > > > >> "Right now we have RetriableException type, if we are going to add a
> > >> >> > > > > >> `ProducerRetriableException` type, we have to put this new interface as
> > >> >> > > > > >> the parent of the RetriableException, because not all thrown non-fatal
> > >> >> > > > > >> exceptions are `retriable` in general for producer"
> > >> >> > > > > >>
> > >> >> > > > > >>
> > >> >> > > > > >> In the list of exceptions in your KIP, I found non-fatal exceptions that
> > >> >> > > > > >> do not inherit from RetriableException. I guess those are the ones you
> > >> >> > > > > >> are referring to in your statement:
> > >> >> > > > > >>
> > >> >> > > > > >> InvalidProducerEpochException
> > >> >> > > > > >> InvalidPidMappingException
> > >> >> > > > > >> TransactionAbortedException
> > >> >> > > > > >>
> > >> >> > > > > >> All of those exceptions are non-fatal and do not inherit from
> > >> >> > > > > >> RetriableException. Is there a reason for that? If they depended from
> > >> >> > > > > >> RetriableException we would be a bit closer to a super exception I
> > >> >> > > > > >> mention above.
> > >> >> > > > > >>
> > >> >> > > > > > The reason is that sender may catch those exceptions in the
> > >> >> > > > > > ProduceResponse, and it currently does infinite
> > >> >> > > > > > retries on RetriableException. To make sure we could trigger the
> > >> >> > > > > > abortTransaction() by catching non-fatal thrown
> > >> >> > > > > > exceptions and reinitialize the txn state, we chose not to let those
> > >> >> > > > > > exceptions inherit RetriableException so that
> > >> >> > > > > > they won't cause infinite retry on sender.
> > >> >> > > > > >
> > >> >> > > > > >
> > >> >> > > > > >> With OutOfOrderSequenceException and UnknownProducerIdException, I think
> > >> >> > > > > >> to understand that their fatality depends on the type (i.e.
> > >> >> > > > > >> configuration) of the producer. That makes it difficult to have a super
> > >> >> > > > > >> exception that encodes the retriability as mentioned above. Would it be
> > >> >> > > > > >> possible to introduce exceptions that inherit from RetriableException
> > >> >> > > > > >> and that are thrown when those exceptions are caught from the brokers
> > >> >> > > > > >> and the type of the producer is such that the exceptions are retriable?
> > >> >> > > > > >>
> > >> >> > > > > > Yea, I think in general the exception type mixing is difficult to get
> > >> >> > > > > > them all right. I have already proposed another solution based on my
> > >> >> > > > > > offline discussion with some folks working on EOS
> > >> >> > > > > > to make the handling more straightforward for end users without the need
> > >> >> > > > > > to distinguish exception fatality.
> > >> >> > > > > >
> > >> >> > > > > >> Best,
> > >> >> > > > > >> Bruno
> > >> >> > > > > >>
> > >> >> > > > > >>
> > >> >> > > > > >> On 04.12.20 19:34, Guozhang Wang wrote:
> > >> >> > > > > >> > Thanks Boyang for the proposal! I made a pass over the list and here are
> > >> >> > > > > >> > some thoughts:
> > >> >> > > > > >> >
> > >> >> > > > > >> > 0) Although this is not part of the public API, I think we should make sure
> > >> >> > > > > >> > that the suggested pattern (i.e. user can always call abortTxn() when
> > >> >> > > > > >> > handling non-fatal errors) are indeed supported. E.g. if the txn is already
> > >> >> > > > > >> > aborted by the broker side, then users can still call abortTxn which would
> > >> >> > > > > >> > not throw another exception but just be treated as a no-op.
> > >> >> > > > > >> >
> > >> >> > > > > >> > 1) *ConcurrentTransactionsException*: I think this error can also be
> > >> >> > > > > >> > returned but not documented yet. This should be a non-fatal error.
> > >> >> > > > > >> >
> > >> >> > > > > >> > 2) *InvalidTxnStateException*: this error is returned from broker when txn
> > >> >> > > > > >> > state transition failed (e.g. it is trying to transit to complete-commit
> > >> >> > > > > >> > while the current state is not prepare-commit). This error could indicates
> > >> >> > > > > >> > a bug on the client internal code or the broker code, OR a user error --- a
> > >> >> > > > > >> > similar error is ConcurrentTransactionsException, i.e. if Kafka is bug-free
> > >> >> > > > > >> > these exceptions would only be returned if users try to do something wrong,
> > >> >> > > > > >> > e.g. calling abortTxn right after a commitTxn, etc. So I'm thinking it
> > >> >> > > > > >> > should be a non-fatal error instead of a fatal error, wdyt?
> > >> >> > > > > >> >
> > >> >> > > > > >> > 3) *KafkaException*: case i "indicates fatal transactional sequence
> > >> >> > > > > >> > (Fatal)", this is a bit conflicting with the *OutOfSequenceException* that
> > >> >> > > > > >> > is treated as non-fatal. I guess your proposal is that
> > >> >> > > > > >> > OutOfOrderSequenceException would be treated either as fatal with
> > >> >> > > > > >> > transactional producer, or non-fatal with idempotent producer, is that
> > >> >> > > > > >> > right? If the producer is only configured with idempotency but not
> > >> >> > > > > >> > transaction, then throwing a TransactionStateCorruptedException for
> > >> >> > > > > >> > non-fatal errors would be confusing since users are not using transactions
> > >> >> > > > > >> > at all.. So I suggest we always throw OutOfSequenceException as-is (i.e.
> > >> >> > > > > >> > not wrapped) no matter how the producer is configured, and let the caller
> > >> >> > > > > >> > decide how to handle it based on whether it is only idempotent or
> > >> >> > > > > >> > transactional itself.
> > >> >> > > > > >> >
> > >> >> > > > > >> > 4) Besides all the txn APIs, the `send()` callback / future can also throw
> > >> >> > > > > >> > txn-related exceptions, I think this KIP should also cover this API as
> > >> >> > > > > >> > well?
> > >> >> > > > > >> >
> > >> >> > > > > >> > 5) This is related to 1/2) above: sometimes those non-fatal errors like
> > >> >> > > > > >> > ConcurrentTxn or InvalidTxnState are not due to the state being corrupted
> > >> >> > > > > >> > at the broker side, but maybe users are doing something wrong. So I'm
> > >> >> > > > > >> > wondering if we should further distinguish those non-fatal errors between
> > >> >> > > > > >> > a) those that are caused by Kafka itself, e.g. a broker timed out and
> > >> >> > > > > >> > aborted a txn and later an endTxn request is received, and b) the user's
> > >> >> > > > > >> > API call pattern is incorrect, causing the request to be rejected with an
> > >> >> > > > > >> > error code from the broker. *TransactionStateCorruptedException* feels to
> > >> >> > > > > >> > me more like for case a), but not case b).
> > >> >> > > > > >> >
> > >> >> > > > > >> >
> > >> >> > > > > >> > Guozhang
> > >> >> > > > > >> >
> > >> >> > > > > >> >
> > >> >> > > > > >> > On Wed, Dec 2, 2020 at 4:50 PM Boyang Chen <reluctanthero...@gmail.com>
> > >> >> > > > > >> > wrote:
> > >> >> > > > > >> >
> > >> >> > > > > >> >> Thanks Matthias, I think your proposal makes sense as well, on the pro side
> > >> >> > > > > >> >> we could have a universally agreed exception type to be caught by the user,
> > >> >> > > > > >> >> without having an extra layer on top of the actual exceptions. I could see
> > >> >> > > > > >> >> some issue on downsides:
> > >> >> > > > > >> >>
> > >> >> > > > > >> >> 1. The exception hierarchy will be more complex. Right now we have
> > >> >> > > > > >> >> RetriableException type, if we are going to add a
> > >> >> > > > > >> >> `ProducerRetriableException` type, we have to put this new interface as the
> > >> >> > > > > >> >> parent of the RetriableException, because not all thrown non-fatal
> > >> >> > > > > >> >> exceptions are `retriable` in general for producer, for example
> > >> >> > > > > >> >> <https://github.com/apache/kafka/blob/e275742f850af4a1b79b0d1bd1ac9a1d2e89c64e/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L745>.
> > >> >> > > > > >> >> We could have an empty interface `ProducerRetriableException` to let all
> > >> >> > > > > >> >> the thrown exceptions implement for sure, but it's a bit unorthodox in the
> > >> >> > > > > >> >> way we deal with exceptions in general.
> > >> >> > > > > >> >>
> > >> >> > > > > >> >> 2. There are cases where we throw a KafkaException wrapping another
> > >> >> > > > > >> >> KafkaException as either fatal or non-fatal. If we use an interface to
> > >> >> > > > > >> >> solve #1, it is also required to implement another bloated exception class
> > >> >> > > > > >> >> which could replace KafkaException type, as we couldn't mark KafkaException
> > >> >> > > > > >> >> as retriable for sure.
> > >> >> > > > > >> >>
> > >> >> > > > > >> >> 3. In terms of the encapsulation, wrapping means we could limit the scope
> > >> >> > > > > >> >> of affection to the producer only, which is important since we don't want
> > >> >> > > > > >> >> shared exception types to implement a producer-related interface, such
> > >> >> > > > > >> >> as UnknownTopicOrPartitionException.
> > >> >> > > > > >> >>
> > >> >> > > > > >> >> Best,
> > >> >> > > > > >> >> Boyang
> > >> >> > > > > >> >>
> > >> >> > > > > >> >> On Wed, Dec 2, 2020 at 3:38 PM Matthias J. Sax <mj...@apache.org> wrote:
> > >> >> > > > > >> >>
> > >> >> > > > > >> >>> Thanks for the KIP Boyang!
> > >> >> > > > > >> >>>
> > >> >> > > > > >> >>> Overall, categorizing exceptions makes a lot of sense. As I don't know
> > >> >> > > > > >> >>> the producer internals well enough, I cannot comment on the
> > >> >> > > > > >> >>> categorization in detail though.
> > >> >> > > > > >> >>>
> > >> >> > > > > >> >>> What I am wondering is, if we should introduce an exception interface
> > >> >> > > > > >> >>> that non-fatal exception would implement instead of creating a new class
> > >> >> > > > > >> >>> that will wrap non-fatal exceptions? What would be the pros/cons for
> > >> >> > > > > >> >>> both designs?
> > >> >> > > > > >> >>>
> > >> >> > > > > >> >>>
> > >> >> > > > > >> >>> -Matthias
> > >> >> > > > > >> >>>
> > >> >> > > > > >> >>>
> > >> >> > > > > >> >>> On 12/2/20 11:35 AM, Boyang Chen wrote:
> > >> >> > > > > >> >>>> Hey there,
> > >> >> > > > > >> >>>>
> > >> >> > > > > >> >>>> I would like to start a discussion thread for KIP-691:
> > >> >> > > > > >> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-691%3A+Enhance+Transactional+Producer+Exception+Handling
> > >> >> > > > > >> >>>>
> > >> >> > > > > >> >>>> The KIP is aiming to simplify the exception handling logic for
> > >> >> > > > > >> >>>> transactional Producer users by classifying fatal and non-fatal exceptions
> > >> >> > > > > >> >>>> and throw them correspondingly for easier catch and retry. Let me know what
> > >> >> > > > > >> >>>> you think.
> > >> >> > > > > >> >>>>
> > >> >> > > > > >> >>>> Best,
> > >> >> > > > > >> >>>> Boyang
> > >> >> > > > > >> >>>>
> > >> >> > > >
> > >> >> > > > --
> > >> >> > > > -- Guozhang
> > >> >> >
> > >> >> > --
> > >> >> > -- Guozhang
> > >> >
> > >> > --
> > >> > -- Guozhang
> > >>
> > >> --
> > >> -- Guozhang
>
> --
> -- Guozhang
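
For readers skimming the thread: the template Boyang describes in his Jan 22, 1:12 PM reply (abortable commit failures handled inside the processing loop, fatal errors caught once outside the loop so that the clients can be closed) might look roughly like the sketch below. It is an illustration only, not the template from the KIP. Every type in it (ClientException, AbortableCommitException, TxnProducer, OffsetSource, ScriptedClient) is a local stand-in invented here so that the snippet compiles without the Kafka client library; in real code their roles would be played by KafkaException, CommitFailedException, the producer and the consumer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Sketch only: all types below are hypothetical stand-ins, NOT the Kafka client API.
final class TxnLoopSketch {
    /** Stand-in for a client error; plays the role of KafkaException. */
    static class ClientException extends RuntimeException {
        ClientException(String msg) { super(msg); }
    }

    /** Stand-in for the abortable commit failure; plays the role of CommitFailedException. */
    static class AbortableCommitException extends ClientException {
        AbortableCommitException(String msg) { super(msg); }
    }

    interface TxnProducer {
        void beginTransaction();
        void commitTransaction(); // abortable failure -> AbortableCommitException
        void abortTransaction();  // assumed to throw only when the producer is in a fatal state
        void close();
    }

    interface OffsetSource {
        void resetToLastCommittedPositions();
        void close();
    }

    /** Runs a fixed number of batches and returns how many were committed. */
    static int runBatches(TxnProducer producer, OffsetSource consumer, int batches) {
        int committed = 0;
        try {
            for (int i = 0; i < batches; i++) {
                producer.beginTransaction();
                try {
                    // Poll, process and send would happen here.
                    producer.commitTransaction();
                    committed++;
                } catch (AbortableCommitException e) {
                    // Abortable: rewind and abort, then continue with a new transaction.
                    consumer.resetToLastCommittedPositions();
                    producer.abortTransaction();
                }
            }
        } catch (ClientException e) {
            // Anything that escapes the loop is treated as fatal: close both clients and rethrow.
            producer.close();
            consumer.close();
            throw e;
        }
        return committed;
    }

    /** Test double whose commits follow a script of "ok", "abortable" or "fatal". */
    static final class ScriptedClient implements TxnProducer, OffsetSource {
        private final Deque<String> script;
        int aborts = 0, resets = 0, closes = 0;

        ScriptedClient(List<String> outcomes) { script = new ArrayDeque<>(outcomes); }

        public void beginTransaction() { }

        public void commitTransaction() {
            String next = script.isEmpty() ? "ok" : script.poll();
            if (next.equals("abortable")) throw new AbortableCommitException("commit failed");
            if (next.equals("fatal")) throw new ClientException("fatal error");
        }

        public void abortTransaction() { aborts++; }
        public void resetToLastCommittedPositions() { resets++; }
        public void close() { closes++; }
    }
}
```

With a script of ok, abortable, ok, runBatches commits two of the three batches and aborts once. A fatal commit skips the abort, closes both clients and rethrows, which matches the point that abortTransaction should only be called when the producer is still in an abortable state.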