Re: [DISCUSS] KIP-691: Transactional Producer Exception Handling

Boyang Chen Fri, 22 Jan 2021 13:13:01 -0800

Thanks for the clarification Guozhang, I got your point that we want to
have a consistent handling of fatal exceptions being thrown from the
abortTxn. I modified the current template to move the fatal exception
try-catch outside of the processing loop to make sure we could get a chance
to close consumer/producer modules. Let me know what you think.


Best,
Boyang

On Fri, Jan 22, 2021 at 11:05 AM Boyang Chen <reluctanthero...@gmail.com>
wrote:

> My understanding is that abortTransaction would only throw when the
> producer is in fatal state. For CommitFailed, the producer should still be
> in the abortable error state, so that abortTransaction call would not throw.
>
> On Fri, Jan 22, 2021 at 11:02 AM Guozhang Wang <wangg...@gmail.com> wrote:
>
>> Or are you going to maintain some internal state such that, the
>> `abortTransaction` in the catch block would never throw again?
>>
>> On Fri, Jan 22, 2021 at 11:01 AM Guozhang Wang <wangg...@gmail.com>
>> wrote:
>>
>> > Hi Boyang/Jason,
>> >
>> > I've also thought about this (i.e. using CommitFailed for all
>> non-fatal),
>> > but what I'm pondering is that, in the catch (CommitFailed) block, what
>> > would happen if the `producer.abortTransaction();` throws again? should
>> > that be captured as a fatal and cause the client to close again.
>> >
>> > If yes, then naively the pattern would be:
>> >
>> > ...
>> > catch (CommitFailedException e) {
>> >         // Transaction commit failed with abortable error, user could
>> reset
>> >         // the application state and resume with a new transaction. The
>> > root
>> >         // cause was wrapped in the thrown exception.
>> >         resetToLastCommittedPositions(consumer);
>> >         try {
>> >             producer.abortTransaction();
>> >         } catch (KafkaException e) {
>> >             producer.close();
>> >             consumer.close();
>> >             throw e;
>> >         }
>> >     } catch (KafkaException e) {
>> >         producer.close();
>> >         consumer.close();
>> >         throw e;
>> >     }
>> > ...
>> >
>> > Guozhang
>> >
>> > On Fri, Jan 22, 2021 at 10:47 AM Boyang Chen <
>> reluctanthero...@gmail.com>
>> > wrote:
>> >
>> >> Hey Guozhang,
>> >>
>> >> Jason and I were discussing the new API offline and decided to take
>> >> another
>> >> approach. Firstly, the reason not to invent a new API with returned
>> >> boolean
>> >> flag is for compatibility consideration, since old EOS code would not
>> know
>> >> that a given transaction commit was failed internally as they don't
>> listen
>> >> to the output flag. Our proposed alternative solution is to let
>> >> *commitTransaction
>> >> throw CommitFailedException whenever the commit failed with non-fatal
>> >> exception*, so that on the caller side the handling logic becomes:
>> >>
>> >> try {
>> >>         if (shouldCommit) {
>> >>             producer.commitTransaction();
>> >>         } else {
>> >>             resetToLastCommittedPositions(consumer);
>> >>             producer.abortTransaction();
>> >>         }
>> >>     } catch (CommitFailedException e) {
>> >>         // Transaction commit failed with abortable error, user could
>> >> reset
>> >>         // the application state and resume with a new transaction. The
>> >> root
>> >>         // cause was wrapped in the thrown exception.
>> >>         resetToLastCommittedPositions(consumer);
>> >>         producer.abortTransaction();
>> >>     } catch (KafkaException e) {
>> >>         producer.close();
>> >>         consumer.close();
>> >>         throw e;
>> >>     }
>> >>
>> >> This approach looks cleaner as all exception types other than
>> CommitFailed
>> >> will doom to be fatal, which is very easy to adopt for users. In the
>> >> meantime, we still maintain the commitTxn behavior to throw instead of
>> >> silently failing.
>> >>
>> >> In addition, we decided to drop the recommendation to handle
>> >> TimeoutException and leave it to the users to make the call. The
>> downside
>> >> for blindly calling abortTxn upon timeout is that we could result in an
>> >> illegal state when the commit was already successful on the broker
>> >> side. Without a good guarantee on the outcome, overcomplicating the
>> >> template should not be encouraged IMHO.
>> >>
>> >> Let me know your thoughts on the new approach here, thank you!
>> >>
>> >> Best,
>> >> Boyang
>> >>
>> >> On Tue, Jan 19, 2021 at 11:11 AM Guozhang Wang <wangg...@gmail.com>
>> >> wrote:
>> >>
>> >> > Thanks for your clarification on 2)/3), that makes sense.
>> >> >
>> >> > On Tue, Jan 19, 2021 at 10:16 AM Boyang Chen <
>> >> reluctanthero...@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > Thanks for the input Guozhang, replied inline.
>> >> > >
>> >> > > On Mon, Jan 18, 2021 at 8:57 PM Guozhang Wang <wangg...@gmail.com>
>> >> > wrote:
>> >> > >
>> >> > > > Hello Boyang,
>> >> > > >
>> >> > > > Thanks for the updated KIP. I read it again and have the
>> following
>> >> > > > thoughts:
>> >> > > >
>> >> > > > 0. I'm a bit concerned that if commitTxn does not throw any
>> >> non-fatal
>> >> > > > exception, and instead we rely on the subsequent beginTxn call to
>> >> > throw,
>> >> > > it
>> >> > > > may violate some callers with a pattern that relying on
>> commitTxn to
>> >> > > > succeed to make some non-rollback operations. For example:
>> >> > > >
>> >> > > > beginTxn()
>> >> > > > // do some read-write on my local DB
>> >> > > > commitTxn()
>> >> > > > // if commitTxn succeeds, then commit the DB
>> >> > > >
>> >> > > > -------------
>> >> > > >
>> >> > > > The issue is that, committing DB is a non-rollback operation, and
>> >> users
>> >> > > may
>> >> > > > just rely on commitTxn to return without error to make this
>> >> > non-rollback
>> >> > > > call. Of course we can just claim this pattern is not legitimate
>> >> and is
>> >> > > not
>> >> > > > the right way of doing things, but many users may naturally adopt
>> >> this
>> >> > > > pattern.
>> >> > > >
>> >> > > > So maybe we should still let commitTxn also throw non-fatal
>> >> exceptions,
>> >> > > in
>> >> > > > which case we would then call abortTxn again.
>> >> > > >
>> >> > > > But if we do this, the pattern becomes:
>> >> > > >
>> >> > > > try {
>> >> > > >    beginTxn()
>> >> > > >    // do something
>> >> > > > } catch (Exception) {
>> >> > > >    shouldCommit = false;
>> >> > > > }
>> >> > > >
>> >> > > > if (shouldCommit) {
>> >> > > >     try {
>> >> > > >         commitTxn()
>> >> > > >     } catch (...) {        // enumerate all fatal exceptions
>> >> > > >         shutdown()
>> >> > > >     } catch (KafkaException) {
>> >> > > >         // non-fatal
>> >> > > >         shouldCommit = false;
>> >> > > >     }
>> >> > > > }
>> >> > > >
>> >> > > > if (!shouldCommit && running()) {
>> >> > > > try {
>> >> > > >         abortTxn()
>> >> > > >     } catch (KafkaException) {
>> >> > > >         // only throw fatal
>> >> > > >         shutdown()
>> >> > > >     }
>> >> > > > }
>> >> > > >
>> >> > > > ---------------------
>> >> > > >
>> >> > > > Which is much more complicated.
>> >> > > >
>> >> > > > Thank makes me think, the alternative we have discussed offline
>> may
>> >> be
>> >> > > > better: let commitTxn() to return a boolean flag.
>> >> > > >
>> >> > > > * If it returns true, it means the commit succeeded. Users can
>> >> > > comfortably
>> >> > > > continue and do any external non-rollback operations if they
>> like.
>> >> > > > * If it returns false, it means the commit failed with non-fatal
>> >> error
>> >> > > *AND
>> >> > > > the txn has been aborted*. Users do not need to call abortTxn
>> again.
>> >> > > > * If it throws, then it means fatal errors. Users should shut
>> down
>> >> the
>> >> > > > client.
>> >> > > >
>> >> > > > In this case, the pattern becomes:
>> >> > > >
>> >> > > > try {
>> >> > > >    beginTxn()
>> >> > > >    // do something
>> >> > > > } catch (Exception) {
>> >> > > >    shouldCommit = false;
>> >> > > > }
>> >> > > >
>> >> > > > try {
>> >> > > >     if (shouldCommit) {
>> >> > > >         commitSucceeded = commitTxn()
>> >> > > >     } else {
>> >> > > >         // reset offsets, rollback operations, etc
>> >> > > >         abortTxn()
>> >> > > >     }
>> >> > > > } catch (KafkaException) {
>> >> > > >     // only throw fatal
>> >> > > >     shutdown()
>> >> > > > }
>> >> > > >
>> >> > > > if (commitSucceeded)
>> >> > > >    // do other non-rollbackable things
>> >> > > > else
>> >> > > >    // reset offsets, rollback operations, etc
>> >> > > >
>> >> > > > -------------------------
>> >> > > >
>> >> > > > Of course, if we want to have better visibility into what caused
>> the
>> >> > > commit
>> >> > > > to fail and txn to abort. We can let the return type be an enum
>> >> instead
>> >> > > of
>> >> > > > a flag. But my main idea is to still let the commitTxn be the
>> final
>> >> > point
>> >> > > > users can tell whether this txn succeeded or not, instead of
>> >> relying on
>> >> > > the
>> >> > > > next beginTxn() call.
>> >> > > >
>> >> > > > I agree that adding a boolean flag is indeed useful in this case.
>> >> Will
>> >> > > update the KIP.
>> >> > >
>> >> > > 1. Re: "while maintaining the behavior to throw fatal exception in
>> >> raw"
>> >> > not
>> >> > > > sure what do you mean by "throw" here. Are you proposing the
>> >> callback
>> >> > > would
>> >> > > > *pass* (not throw) in any fatal exceptions when triggered without
>> >> > > wrapping?
>> >> > > >
>> >> > > > Yes, I want to say *pass*, the benefit is to make the end user's
>> >> > > expectation consistent
>> >> > > regarding exception handling.
>> >> > >
>> >> > >
>> >> > > > 2. I'm not sure I understand the claim regarding the callback
>> that
>> >> "In
>> >> > > EOS
>> >> > > > setup, it is not required to handle the exception". Are you
>> >> proposing
>> >> > > that,
>> >> > > > e.g. in Streams, we do not try to handle any exceptions if EOS is
>> >> > enabled
>> >> > > > in the callback anymore, but just let commitTxn() itself to fail
>> to
>> >> be
>> >> > > > notified about the problem? Personally I think in Streams we
>> should
>> >> > just
>> >> > > > make the handling logic of the callback to be consistent
>> regardless
>> >> of
>> >> > > the
>> >> > > > EOS settings (today we have different logic depending on this
>> logic,
>> >> > > which
>> >> > > > I think could be unified as well).
>> >> > > >
>> >> > > > My idea originates from the claim on send API:
>> >> > > "When used as part of a transaction, it is not necessary to define
>> a
>> >> > > callback or check the result of the future  in order to detect
>> errors
>> >> > from
>> >> > > <code>send</code>. "
>> >> > > My understanding is that for EOS, the exception will be detected by
>> >> > calling
>> >> > > transactional APIs either way, so it is a duplicate handling to
>> track
>> >> > > all the sendExceptions in RecordCollector. However, I looked up
>> >> > > sendException is being used today as follow:
>> >> > >
>> >> > > 1. Pass to "ProductionExceptionHandler" for any default or
>> customized
>> >> > > exception handler to handle
>> >> > > 2. Stop collecting offset info or new exceptions
>> >> > > 3. Check and rethrow exceptions in close(), flush() or new send()
>> >> calls
>> >> > >
>> >> > > To my understanding, we should still honor the commitment to #1 for
>> >> any
>> >> > > user customized implementation. The #2 does not really affect our
>> >> > decision
>> >> > > upon EOS. The #3 here is still valuable as it could help us fail
>> fast
>> >> in
>> >> > > new send() instead of waiting to later stage of processing. In that
>> >> > sense,
>> >> > > I agree we should continue to record send exceptions even under EOS
>> >> case
>> >> > to
>> >> > > ensure the strength of stream side Producer logic. On the safer
>> side,
>> >> we
>> >> > no
>> >> > > longer need to wrap certain fatal exceptions like ProducerFenced as
>> >> > > TaskMigrated, since they should not crash the stream thread if
>> thrown
>> >> in
>> >> > > raw format, once we adopt the new processing model in the send
>> phase.
>> >> > >
>> >> > >
>> >> > > >
>> >> > > > Guozhang
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > > On Thu, Dec 17, 2020 at 8:42 PM Boyang Chen <
>> >> > reluctanthero...@gmail.com>
>> >> > > > wrote:
>> >> > > >
>> >> > > > > Thanks for everyone's feedback so far. I have polished the KIP
>> >> after
>> >> > > > > offline discussion with some folks working on EOS to make the
>> >> > exception
>> >> > > > > handling more lightweight. The essential change is that we are
>> not
>> >> > > > > inventing a new intermediate exception type, but instead
>> >> separating a
>> >> > > > full
>> >> > > > > transaction into two phases:
>> >> > > > >
>> >> > > > > 1. The data transmission phase
>> >> > > > > 2. The commit phase
>> >> > > > >
>> >> > > > > This way for any exception thrown from phase 1, will be an
>> >> indicator
>> >> > in
>> >> > > > > phase 2 whether we should do commit or abort, and from now on
>> >> > > > > `commitTransaction` should only throw fatal exceptions,
>> similar to
>> >> > > > > `abortTransaction`, so that any KafkaException caught in the
>> >> commit
>> >> > > phase
>> >> > > > > will be definitely fatal to crash the app. For more advanced
>> users
>> >> > such
>> >> > > > as
>> >> > > > > Streams, we have the ability to further wrap selected types of
>> >> fatal
>> >> > > > > exceptions to trigger task migration if necessary.
>> >> > > > >
>> >> > > > > More details in the KIP, feel free to take another look,
>> thanks!
>> >> > > > >
>> >> > > > > On Thu, Dec 17, 2020 at 8:36 PM Boyang Chen <
>> >> > > reluctanthero...@gmail.com>
>> >> > > > > wrote:
>> >> > > > >
>> >> > > > > > Thanks Bruno for the feedback.
>> >> > > > > >
>> >> > > > > > On Mon, Dec 7, 2020 at 5:26 AM Bruno Cadonna <
>> >> br...@confluent.io>
>> >> > > > wrote:
>> >> > > > > >
>> >> > > > > >> Thanks Boyang for the KIP!
>> >> > > > > >>
>> >> > > > > >> Like Matthias, I do also not know the producer internal well
>> >> > enough
>> >> > > to
>> >> > > > > >> comment on the categorization. However, I think having a
>> super
>> >> > > > exception
>> >> > > > > >> (e.g. RetriableException) that  encodes if an exception is
>> >> fatal
>> >> > or
>> >> > > > not
>> >> > > > > >> is cleaner, better understandable and less error-prone,
>> because
>> >> > > > ideally
>> >> > > > > >> when you add a new non-fatal exception in future you just
>> need
>> >> to
>> >> > > > think
>> >> > > > > >> about letting it inherit from the super exception and all
>> the
>> >> rest
>> >> > > of
>> >> > > > > >> the code will just behave correctly without the need to wrap
>> >> the
>> >> > new
>> >> > > > > >> exception into another exception each time it is thrown
>> (maybe
>> >> it
>> >> > is
>> >> > > > > >> thrown at different location in the code).
>> >> > > > > >>
>> >> > > > > >> As far as I understand the following statement from your
>> >> previous
>> >> > > > e-mail
>> >> > > > > >> is the reason that currently such a super exception is not
>> >> > possible:
>> >> > > > > >>
>> >> > > > > >> "Right now we have RetriableException type, if we are going
>> to
>> >> > add a
>> >> > > > > >> `ProducerRetriableException` type, we have to put this new
>> >> > interface
>> >> > > > as
>> >> > > > > >> the parent of the RetriableException, because not all thrown
>> >> > > non-fatal
>> >> > > > > >> exceptions are `retriable` in general for producer"
>> >> > > > > >>
>> >> > > > > >>
>> >> > > > > >> In the list of exceptions in your KIP, I found non-fatal
>> >> > exceptions
>> >> > > > that
>> >> > > > > >> do not inherit from RetriableException. I guess those are
>> the
>> >> ones
>> >> > > you
>> >> > > > > >> are referring to in your statement:
>> >> > > > > >>
>> >> > > > > >> InvalidProducerEpochException
>> >> > > > > >> InvalidPidMappingException
>> >> > > > > >> TransactionAbortedException
>> >> > > > > >>
>> >> > > > > >> All of those exceptions are non-fatal and do not inherit
>> from
>> >> > > > > >> RetriableException. Is there a reason for that? If they
>> >> depended
>> >> > > from
>> >> > > > > >> RetriableException we would be a bit closer to a super
>> >> exception I
>> >> > > > > >> mention above.
>> >> > > > > >>
>> >> > > > > >> The reason is that sender may catch those exceptions in the
>> >> > > > > > ProduceResponse, and it currently does infinite
>> >> > > > > > retries on RetriableException. To make sure we could trigger
>> the
>> >> > > > > > abortTransaction() by catching non-fatal thrown
>> >> > > > > > exceptions and reinitialize the txn state, we chose not to
>> let
>> >> > those
>> >> > > > > > exceptions inherit RetriableException so that
>> >> > > > > > they won't cause infinite retry on sender.
>> >> > > > > >
>> >> > > > > >
>> >> > > > > >> With OutOfOrderSequenceException and
>> >> UnknownProducerIdException, I
>> >> > > > think
>> >> > > > > >> to understand that their fatality depends on the type (i.e.
>> >> > > > > >> configuration) of the producer. That makes it difficult to
>> >> have a
>> >> > > > super
>> >> > > > > >> exception that encodes the retriability as mentioned above.
>> >> Would
>> >> > it
>> >> > > > be
>> >> > > > > >> possible to introduce exceptions that inherit from
>> >> > > RetriableException
>> >> > > > > >> and that are thrown when those exceptions are caught from
>> the
>> >> > > brokers
>> >> > > > > >> and the type of the producer is such that the exceptions are
>> >> > > > retriable?
>> >> > > > > >>
>> >> > > > > >> Yea, I think in general the exception type mixing is
>> difficult
>> >> to
>> >> > > get
>> >> > > > > > them all right. I have already proposed another solution
>> based
>> >> on
>> >> > my
>> >> > > > > > offline discussion with some folks working on EOS
>> >> > > > > > to make the handling more straightforward for end users
>> without
>> >> the
>> >> > > > need
>> >> > > > > > to distinguish exception fatality.
>> >> > > > > >
>> >> > > > > >> Best,
>> >> > > > > >> Bruno
>> >> > > > > >>
>> >> > > > > >>
>> >> > > > > >> On 04.12.20 19:34, Guozhang Wang wrote:
>> >> > > > > >> > Thanks Boyang for the proposal! I made a pass over the
>> list
>> >> and
>> >> > > here
>> >> > > > > are
>> >> > > > > >> > some thoughts:
>> >> > > > > >> >
>> >> > > > > >> > 0) Although this is not part of the public API, I think we
>> >> > should
>> >> > > > make
>> >> > > > > >> sure
>> >> > > > > >> > that the suggested pattern (i.e. user can always call
>> >> abortTxn()
>> >> > > > when
>> >> > > > > >> > handling non-fatal errors) are indeed supported. E.g. if
>> the
>> >> txn
>> >> > > is
>> >> > > > > >> already
>> >> > > > > >> > aborted by the broker side, then users can still call
>> >> abortTxn
>> >> > > which
>> >> > > > > >> would
>> >> > > > > >> > not throw another exception but just be treated as a
>> no-op.
>> >> > > > > >> >
>> >> > > > > >> > 1) *ConcurrentTransactionsException*: I think this error
>> can
>> >> > also
>> >> > > be
>> >> > > > > >> > returned but not documented yet. This should be a
>> non-fatal
>> >> > error.
>> >> > > > > >> >
>> >> > > > > >> > 2) *InvalidTxnStateException*: this error is returned from
>> >> > broker
>> >> > > > when
>> >> > > > > >> txn
>> >> > > > > >> > state transition failed (e.g. it is trying to transit to
>> >> > > > > complete-commit
>> >> > > > > >> > while the current state is not prepare-commit). This error
>> >> could
>> >> > > > > >> indicates
>> >> > > > > >> > a bug on the client internal code or the broker code, OR a
>> >> user
>> >> > > > error
>> >> > > > > >> --- a
>> >> > > > > >> > similar error is ConcurrentTransactionsException, i.e. if
>> >> Kafka
>> >> > is
>> >> > > > > >> bug-free
>> >> > > > > >> > these exceptions would only be returned if users try to do
>> >> > > something
>> >> > > > > >> wrong,
>> >> > > > > >> > e.g. calling abortTxn right after a commitTxn, etc. So I'm
>> >> > > thinking
>> >> > > > it
>> >> > > > > >> > should be a non-fatal error instead of a fatal error,
>> wdyt?
>> >> > > > > >> >
>> >> > > > > >> > 3) *KafkaException*: case i "indicates fatal transactional
>> >> > > sequence
>> >> > > > > >> > (Fatal)", this is a bit conflicting with the
>> >> > > > *OutOfSequenceException*
>> >> > > > > >> that
>> >> > > > > >> > is treated as non-fatal. I guess your proposal is that
>> >> > > > > >> > OutOfOrderSequenceException would be treated either as
>> fatal
>> >> > with
>> >> > > > > >> > transactional producer, or non-fatal with idempotent
>> >> producer,
>> >> > is
>> >> > > > that
>> >> > > > > >> > right? If the producer is only configured with idempotency
>> >> but
>> >> > not
>> >> > > > > >> > transaction, then throwing a
>> >> TransactionStateCorruptedException
>> >> > > for
>> >> > > > > >> > non-fatal errors would be confusing since users are not
>> using
>> >> > > > > >> transactions
>> >> > > > > >> > at all.. So I suggest we always throw
>> OutOfSequenceException
>> >> > as-is
>> >> > > > > (i.e.
>> >> > > > > >> > not wrapped) no matter how the producer is configured, and
>> >> let
>> >> > the
>> >> > > > > >> caller
>> >> > > > > >> > decide how to handle it based on whether it is only
>> >> idempotent
>> >> > or
>> >> > > > > >> > transactional itself.
>> >> > > > > >> >
>> >> > > > > >> > 4) Besides all the txn APIs, the `send()` callback /
>> future
>> >> can
>> >> > > also
>> >> > > > > >> throw
>> >> > > > > >> > txn-related exceptions, I think this KIP should also cover
>> >> this
>> >> > > API
>> >> > > > as
>> >> > > > > >> well?
>> >> > > > > >> >
>> >> > > > > >> > 5) This is related to 1/2) above: sometimes those
>> non-fatal
>> >> > errors
>> >> > > > > like
>> >> > > > > >> > ConcurrentTxn or InvalidTxnState are not due to the state
>> >> being
>> >> > > > > >> corrupted
>> >> > > > > >> > at the broker side, but maybe users are doing something
>> >> wrong.
>> >> > So
>> >> > > > I'm
>> >> > > > > >> > wondering if we should further distinguish those non-fatal
>> >> > errors
>> >> > > > > >> between
>> >> > > > > >> > a) those that are caused by Kafka itself, e.g. a broker
>> timed
>> >> > out
>> >> > > > and
>> >> > > > > >> > aborted a txn and later an endTxn request is received,
>> and b)
>> >> > the
>> >> > > > > user's
>> >> > > > > >> > API call pattern is incorrect, causing the request to be
>> >> > rejected
>> >> > > > with
>> >> > > > > >> an
>> >> > > > > >> > error code from the broker.
>> >> *TransactionStateCorruptedException*
>> >> > > > feels
>> >> > > > > >> to
>> >> > > > > >> > me more like for case a), but not case b).
>> >> > > > > >> >
>> >> > > > > >> >
>> >> > > > > >> > Guozhang
>> >> > > > > >> >
>> >> > > > > >> >
>> >> > > > > >> > On Wed, Dec 2, 2020 at 4:50 PM Boyang Chen <
>> >> > > > > reluctanthero...@gmail.com>
>> >> > > > > >> > wrote:
>> >> > > > > >> >
>> >> > > > > >> >> Thanks Matthias, I think your proposal makes sense as
>> well,
>> >> on
>> >> > > the
>> >> > > > > pro
>> >> > > > > >> side
>> >> > > > > >> >> we could have a universally agreed exception type to be
>> >> caught
>> >> > by
>> >> > > > the
>> >> > > > > >> user,
>> >> > > > > >> >> without having an extra layer on top of the actual
>> >> exceptions.
>> >> > I
>> >> > > > > could
>> >> > > > > >> see
>> >> > > > > >> >> some issue on downsides:
>> >> > > > > >> >>
>> >> > > > > >> >> 1. The exception hierarchy will be more complex. Right
>> now
>> >> we
>> >> > > have
>> >> > > > > >> >> RetriableException type, if we are going to add a
>> >> > > > > >> >> `ProducerRetriableException` type, we have to put this
>> new
>> >> > > > interface
>> >> > > > > >> as the
>> >> > > > > >> >> parent of the RetriableException, because not all thrown
>> >> > > non-fatal
>> >> > > > > >> >> exceptions are `retriable` in general for producer, for
>> >> example
>> >> > > > > >> >> <
>> >> > > > > >> >>
>> >> > > > > >>
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> https://github.com/apache/kafka/blob/e275742f850af4a1b79b0d1bd1ac9a1d2e89c64e/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L745
>> >> > > > > >> >>> .
>> >> > > > > >> >> We could have an empty interface
>> >> `ProducerRetriableException`
>> >> > to
>> >> > > > let
>> >> > > > > >> all
>> >> > > > > >> >> the thrown exceptions implement for sure, but it's a bit
>> >> > > unorthodox
>> >> > > > > in
>> >> > > > > >> the
>> >> > > > > >> >> way we deal with exceptions in general.
>> >> > > > > >> >>
>> >> > > > > >> >> 2. There are cases where we throw a KafkaException
>> wrapping
>> >> > > another
>> >> > > > > >> >> KafkaException as either fatal or non-fatal. If we use an
>> >> > > interface
>> >> > > > > to
>> >> > > > > >> >> solve #1, it is also required to implement another
>> bloated
>> >> > > > exception
>> >> > > > > >> class
>> >> > > > > >> >> which could replace KafkaException type, as we couldn't
>> mark
>> >> > > > > >> KafkaException
>> >> > > > > >> >> as retriable for sure.
>> >> > > > > >> >>
>> >> > > > > >> >> 3. In terms of the encapsulation, wrapping means we could
>> >> limit
>> >> > > the
>> >> > > > > >> scope
>> >> > > > > >> >> of affection to the producer only, which is important
>> since
>> >> we
>> >> > > > don't
>> >> > > > > >> want
>> >> > > > > >> >> shared exception types to implement a producer-related
>> >> > interface,
>> >> > > > > such
>> >> > > > > >> >> as UnknownTopicOrPartitionException.
>> >> > > > > >> >>
>> >> > > > > >> >> Best,
>> >> > > > > >> >> Boyang
>> >> > > > > >> >>
>> >> > > > > >> >> On Wed, Dec 2, 2020 at 3:38 PM Matthias J. Sax <
>> >> > mj...@apache.org
>> >> > > >
>> >> > > > > >> wrote:
>> >> > > > > >> >>
>> >> > > > > >> >>> Thanks for the KIP Boyang!
>> >> > > > > >> >>>
>> >> > > > > >> >>> Overall, categorizing exceptions makes a lot of sense.
>> As I
>> >> > > don't
>> >> > > > > know
>> >> > > > > >> >>> the producer internals well enough, I cannot comment on
>> the
>> >> > > > > >> >>> categorization in detail though.
>> >> > > > > >> >>>
>> >> > > > > >> >>> What I am wondering is, if we should introduce an
>> exception
>> >> > > > > interface
>> >> > > > > >> >>> that non-fatal exception would implement instead of
>> >> creating a
>> >> > > new
>> >> > > > > >> class
>> >> > > > > >> >>> that will wrap non-fatal exceptions? What would be the
>> >> > pros/cons
>> >> > > > for
>> >> > > > > >> >>> both designs?
>> >> > > > > >> >>>
>> >> > > > > >> >>>
>> >> > > > > >> >>> -Matthias
>> >> > > > > >> >>>
>> >> > > > > >> >>>
>> >> > > > > >> >>> On 12/2/20 11:35 AM, Boyang Chen wrote:
>> >> > > > > >> >>>> Hey there,
>> >> > > > > >> >>>>
>> >> > > > > >> >>>> I would like to start a discussion thread for KIP-691:
>> >> > > > > >> >>>>
>> >> > > > > >> >>>
>> >> > > > > >> >>
>> >> > > > > >>
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-691%3A+Enhance+Transactional+Producer+Exception+Handling
>> >> > > > > >> >>>>
>> >> > > > > >> >>>> The KIP is aiming to simplify the exception handling
>> logic
>> >> > for
>> >> > > > > >> >>>> transactional Producer users by classifying fatal and
>> >> > non-fatal
>> >> > > > > >> >>> exceptions
>> >> > > > > >> >>>> and throw them correspondingly for easier catch and
>> retry.
>> >> > Let
>> >> > > me
>> >> > > > > >> know
>> >> > > > > >> >>> what
>> >> > > > > >> >>>> you think.
>> >> > > > > >> >>>>
>> >> > > > > >> >>>> Best,
>> >> > > > > >> >>>> Boyang
>> >> > > > > >> >>>>
>> >> > > > > >> >>>
>> >> > > > > >> >>
>> >> > > > > >> >
>> >> > > > > >> >
>> >> > > > > >>
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > > >
>> >> > > > --
>> >> > > > -- Guozhang
>> >> > > >
>> >> > >
>> >> >
>> >> >
>> >> > --
>> >> > -- Guozhang
>> >> >
>> >>
>> >
>> >
>> > --
>> > -- Guozhang
>> >
>>
>>
>> --
>> -- Guozhang
>>
>

Re: [DISCUSS] KIP-691: Transactional Producer Exception Handling

Reply via email to