Thanks Lianet,

Updated KIP to include LM5

On Wed, Nov 13, 2024 at 9:14 PM Lianet M. <liane...@gmail.com> wrote:

> Hello, thanks for the updates. Just a minor comment left
>
> LM5. Should we add the change to the AuthorizationException parent to the
> public interfaces section? Same for the InvalidMetadataException.
>
> Thanks!
>
> Lianet
>
>
> On Tue, Nov 12, 2024 at 1:52 PM Kaushik Raina <kra...@confluent.io.invalid
> >
> wrote:
>
> > Thanks Lianet for review
> >
> > LM1 & LM2:
> > We will extend parent classes only to maintain the hierarchy
> >
> > For TopicAuthorizationException and GroupAuthorizationException, we will
> > extend parent class AuthorizationException. So new hierarchy will be
> > "AuthorizationException < InvalidConfigurationException < ApiException"
> >
> > For UnknownTopicOrPartitionException and NotLeaderOrFollowerException, we
> > will extend parent class InvalidMetadataException. So new hierarchy will
> be
> > "InvalidMetadataException < RefreshRetriableException <
> RetriableException
> > < ApiException"
> >
> >
> > LM3:
> > CommitFailedException will not be extended because its parent class
> > KafkaException is expected to be treated as application recoverable as
> > mentioned in KIP.
> >
> >
> > I have added a `comment` section in the KIP-1050 table to include LM1 &
> LM2
> > & LM3 details.
> >
> >
> > LM4: StaleMemberEpochException is thrown as IllegalGenerationException in
> > Txn
> >
> >
> https://github.com/apache/kafka/blob/trunk/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/OffsetMetadataManager.java#L367-L379
> > . So it need not be part of KIP
> >
> > On Thu, Nov 7, 2024 at 1:05 AM Lianet M. <liane...@gmail.com> wrote:
> >
> > > Hello, sorry for the late reply, took another pass after the latest
> > > updates. Most of the proposed groupings and handling seem good and very
> > > helpful indeed from the producer perspective. I just have some more
> > > comments related to the changes in the exception hierarchy, that are
> not
> > > scoped to the producer:
> > >
> > >
> > > LM1 - Regarding TopicAuthorizationException and
> > > GroupAuthorizationException. We’re proposing to change their parent
> class
> > > to InvalidConfigurationException, which extends ApiException (vs their
> > > current parent AuthorizationException). This would be a breaking change
> > for
> > > apps expecting/handling instanceOf AuthorizationException (and an
> > > unexpected one I would say, given that TopicAuthorizationException and
> > > GroupAuthorizationException are indeed “authorization errors”).
> > >
> > >
> > > LM2 - Similarly, regarding UnknownTopicOrPartitionException and
> > > NotLeaderOrFollowerException: we’re proposing to change parent class to
> > > RefreshRetriableException (which extends Retriable), but that would
> mean
> > > that these exceptions wouldn’t be instanceOf InvalidMetadataException
> > > anymore (as they are today). This would be a breaking change for client
> > > apps dealing with instanceOf InvalidMetadata right?
> > >
> > >
> > > LM3 - Regarding CommitFailedException, I’m simply concerned that the
> move
> > > may not be conceptually right? We’re proposing to change its parent
> class
> > > to ApplicationRecoverableException, which would make
> > CommitFailedException
> > > land under the ApiException umbrella. ApiException is intended for
> errors
> > > that are part of the protocol, and this CommitFailedException is not
> > (it’s
> > > just an exception generated on the client side based on different
> > > protocol-level errors). Not that it will have an impact on existing
> apps
> > > given that it would still be a KafkaException as it is today, but
> wanted
> > to
> > > point this out.
> > >
> > >
> > > LM4 - Under ApplicationRecoverableException, I wonder if we should also
> > > consider StaleMemberEpochException (just for consistency, alongside
> > > FencedInstaceIdException, UnknownMemberIdException,...)
> > >
> > >
> > > Thanks!
> > >
> > >
> > > Lianet
> > >
> > > On Thu, Oct 24, 2024 at 6:25 AM Kaushik Raina
> > <kra...@confluent.io.invalid
> > > >
> > > wrote:
> > >
> > > > Thanks Artem for valuable comments
> > > >
> > > > I have incorporated them into the updated KIP.
> > > >
> > > >
> > > > On Sat, Oct 19, 2024 at 3:31 AM Artem Livshits
> > > > <alivsh...@confluent.io.invalid> wrote:
> > > >
> > > > > Hi Kaushik,
> > > > >
> > > > > Thank you for the KIP!  I think it'll make writing transactional
> > > > > application easier and less error prone.  I have a couple comments:
> > > > >
> > > > > AL1.  The keep proposes changing the semantics
> > > > > of UnknownProducerIdException.  Currently, this error is never
> > returned
> > > > by
> > > > > the broker so we cannot validate whether changing semantics is
> going
> > to
> > > > be
> > > > > compatible with a future use (if we ever return it again) and the
> > > future
> > > > > broker wouldn't know which sematnics the client supports.  I think
> > for
> > > > now
> > > > > we could just add a comment in the code that this error is never
> > > returned
> > > > > by the broker and if we ever add it to the broker we'd need to make
> > > sure
> > > > > it's categorized properly and bump the API version so that the
> broker
> > > > knows
> > > > > if the client supports required semantics.
> > > > >
> > > > > AL2. The exception classification looks good to me from the
> producer
> > > > > perspective: we have retriable, abortable, etc. categories.  The
> > > > retriable
> > > > > errors can be retried within producer safely without losing
> > > idempotence,
> > > > > the messages would be retried with the same sequence numbers and we
> > > won't
> > > > > have duplicates.  However, if a retriable error (e.g.
> > TimeoutException)
> > > > is
> > > > > thrown to the application, it's not safe to retry the produce,
> > because
> > > > the
> > > > > messages will get new sequence numbers and if the original message
> in
> > > > fact
> > > > > succeeded, the new messages would become duplicates, losing the
> > exactly
> > > > > once semantics.  Thus, if a retriable exception is thrown from a
> > > > > transactional "produce" we should abort the transaction to
> guarantee
> > > that
> > > > > we won't have duplicates when we produce messages again.  To avoid
> > > > > confusion, I propose to change the transactional "produce" to never
> > > > return
> > > > > a retriable error, i.e. any retriable error would be translated
> > > > > into TransactionAbortableException (the error message could be
> copied
> > > > from
> > > > > the original error) before getting thrown from transactional
> > "produce".
> > > > > This would avoid bugs in the applicatoin -- even if the application
> > > > > developer gets confused and decides to retry on TimeoutException,
> the
> > > > buggy
> > > > > code would never run.
> > > > >
> > > > > Another change that we should make is to make sure
> > .abortTransaction()
> > > > > should never throw producer-abortable exception.
> > > > >
> > > > > The handling of retriable errors for other APIs can stay unchanged
> --
> > > > > .commitTransaction() and .abortTransaction() are idempotent and can
> > be
> > > > > safely retried, read-only APIs can always be safely retried, and
> for
> > > > > non-transactional producer we don't have a good "path back to
> normal"
> > > > > anyway -- we cannot reliably abort messages with unknown fate.
> > > > >
> > > > > -Artem
> > > > >
> > > > > On Fri, Oct 4, 2024 at 5:11 AM Lianet M. <liane...@gmail.com>
> wrote:
> > > > >
> > > > > > Hello, thanks for the KIP! After going through the KIP and
> > discussion
> > > > > here
> > > > > > are some initial comments.
> > > > > >
> > > > > > 107 - I understand we’re  proposing a new
> > > > > > ProducerRetriableTransactionException, and changing existing
> > > exceptions
> > > > > to
> > > > > > inherit from it (the ones on the table below it).  The existing
> > > > > exceptions
> > > > > > inherit from RetriableException today, but with this KIP, they
> > would
> > > > > > inherit from ProducerRetriableTransactionException which is not a
> > > > > > RetriableException ("*ProducerRetriableTransactionException
> extends
> > > > > > ApiException"*). Is my understanding correct? Wouldn’t this break
> > > > > > applications that could be handling/expecting RetriableExceptions
> > > > today?
> > > > > > (Ie. apps dealing with TimeoutException on send , if they have
> > > > > > catch(RetriableException) or checks in the form of instanceOf
> > > > > > RetriableException, would need to change to the new
> > > > > >  ProducerRetriableTransactionException or the specific
> > > > TimeoutException,
> > > > > > right?). I get this wouldn’t bring a problem for most of the
> > > retriable
> > > > > > exceptions on the table given that they end up being
> > handled/retried
> > > > > > internally, but TimeoutException is tricky.
> > > > > >
> > > > > >
> > > > > > 108 -  Regarding how we limit the scope of the change to the
> > > > > > producer/transactional API. TimeoutException is not only used in
> > the
> > > > > > transactional API, but also in the consumer API, propagated to
> the
> > > user
> > > > > in
> > > > > > multiple api calls. Not clear to me how with this proposal we
> > > wouldn’t
> > > > > end
> > > > > > up with a consumer throwing a TimeoutException instanceOf
> > > > > > ProducerRetriableTransactionException? (Instead of instanceOf
> > > > > > RetriableException like it is today)? Again, potentially breaking
> > > apps
> > > > > but
> > > > > > also with a conceptually wrong consumer error?
> > > > > >
> > > > > >
> > > > > > 109 - Similar to above, for exceptions like
> > > > > > UnknownTopicOrPartitionException, which are today instanceOf
> > > > > > RetriableException, if we’re saying they will be subclass of
> > > > > > ProducerRefreshRetriableTransactionException (ApiException) that
> > will
> > > > > > affect the consumer logic too, where we do handle
> > RetriableExceptions
> > > > > like
> > > > > > the unknownTopic, expecting RetriableException. This is all
> > internal
> > > > > logic
> > > > > > and could be updated as needed of course, but without leaking
> > > > > > producer-specific groupings into the consumer I would expect.
> > > > > >
> > > > > >
> > > > > > 110 - The KIP refers to the existing
> TransactionAbortableException
> > > > (from
> > > > > > KIP-890), but on the public changes it refers to class
> > > > > > AbortableTransactionException extends ApiException. So are we
> > > > proposing a
> > > > > > new exception type for this or reusing the existing one?
> > > > > >
> > > > > > 111 - I notice the proposed new exceptions, even though they seem
> > > like
> > > > > > abstract groupings, are not defined as "abstract". Is it
> > intentional
> > > to
> > > > > > allow creation of instances of those?
> > > > > >
> > > > > > Best!
> > > > > >
> > > > > > Lianet
> > > > > >
> > > > > >
> > > > > > On Thu, Sep 12, 2024 at 6:26 AM Kaushik Raina
> > > > > <kra...@confluent.io.invalid
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for review Matthias
> > > > > > >
> > > > > > > 100/101 - Updated in KIP
> > > > > > >
> > > > > > > 104 - Added explicitly
> > > > > > > `For Producer-Retriable errors, the producer handles retries
> > > > > internally,
> > > > > > > keeping the failure details hidden from the application.
> > > Conversely,
> > > > > > other
> > > > > > > types of exceptions will be surfaced to the application code
> for
> > > > > > handling.`
> > > > > > >
> > > > > > > 105 - Grouped default exceptions explicitly
> > > > > > > `We will handle all default exceptions as generic unknown
> errors,
> > > > which
> > > > > > > will be application recoverable. Below are few such
> exceptions:`
> > > > > > >
> > > > > > >
> > > > > > > On Sat, Aug 31, 2024 at 4:27 AM Matthias J. Sax <
> > mj...@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for updating the KIP. It's much clearer now what you
> > > > propose.
> > > > > I
> > > > > > > > have a bunch of question about the proposal:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > (100) nit (typo / missing word?):
> > > > > > > >
> > > > > > > > > We would new error types
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > (101) `TransactionAbortableException`,
> > `ProducerFencedException`,
> > > > and
> > > > > > > > `UnknownProducerIdException` are listed twice in the tables.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > (102) You introduce a new exception
> > > `AbortableTransactionException`
> > > > > > > > which will only be extended by
> `TransactionAbortableException`.
> > > > Given
> > > > > > > > that the existing TransactionAbortableException is not thrown
> > by
> > > > the
> > > > > > > > producer right now (no passed into the `Callback`), it seem
> if
> > > the
> > > > > > > > producer starts to throw/return the exiting
> > > > > > > > `TransactionAbortableException` or the new
> > > > > > > > `AbortableTransactionException` is would be an incompatible
> > > change?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > (103) It's unclear which method would throw the new
> > > > > > > > `AbortableTransactionException` and/or if this new exception
> > > might
> > > > be
> > > > > > > > passe into the producer's send `Callback`.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Btw: KIP-890 does not mention
> > `TransactionAbortableException`...
> > > > Does
> > > > > > > > KIP-890 need an update? KIP-890 only mentions a new error
> code
> > > > > > > > TRANSACTION_ABORTABLE -- or is this an implicit introduction
> of
> > > > > > > > TransactionAbortableException -- I am not familiar with the
> > > details
> > > > > how
> > > > > > > > core KIPs are written?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > (104) The KIP does not explicitly say, which of the new
> > > exceptions
> > > > > are
> > > > > > > > actually user facing? It seems only
> > > AbortableTransactionException,
> > > > > > > > ApplicationRecoverableTransactionException, and
> > > > > > > > InvalidConfiguationTransactionException are exception which
> > user
> > > > will
> > > > > > be
> > > > > > > > able to catch (or handle vie the `Callback`), while
> > > > > > > > ProducerRetriableTransactionException and
> > > > > > > > ProducerRefreshRetriableTransactionException won't be
> > > thrown/return
> > > > > by
> > > > > > > > the producer into the app code?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > (105) `IllegalStateException` and `RuntimeException` which
> are
> > > Java
> > > > > > > > exceptions are listed in the table of
> > > > > > > > `ApplicationRecoverableTransactionException`: I think it is
> not
> > > > valid
> > > > > > to
> > > > > > > > list them, as we cannot change their super-class.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > (106) `UnknownMemberIdException`,
> `IllegalGenerationException`,
> > > and
> > > > > > > > `CorrelationIdMismatchException` are listed in the table of
> > > > > > > > `ApplicationRecoverableTransactionException` but it seems
> they
> > > are
> > > > > not
> > > > > > > > thrown/returned by the producer atm. If we start to
> > throw/return
> > > > > either
> > > > > > > > of them it seem to be a backward incompatible change.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > (106) Similarly to 105, `InvalidRecordException` and
> > > > > > > > `InvalidRequiredAcksException` are listed in the table of
> > > > > > > > `InvalidConfiguationTransactionException` but they seem not
> to
> > be
> > > > > user
> > > > > > > > facing.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > -Matthias
> > > > > > > >
> > > > > > > >
> > > > > > > > On 7/25/24 8:50 AM, Kaushik Raina wrote:
> > > > > > > > > Additionally,
> > > > > > > > > - We will be depreciating KIP-691 in favour of KIP-1050.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jun 21, 2024 at 12:20 PM Kaushik Raina <
> > > > > kra...@confluent.io>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Thanks Matthias for feedback
> > > > > > > > >> - We have updates KIP and grouped exceptions
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable
> > > > > > > > >>
> > > > > > > > >> - Regarding compatibility,  all changes in KIP are
> expected
> > to
> > > > be
> > > > > > > > *backword
> > > > > > > > >> compatible*.  We have updated KIP to make it clear.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Tue, Jun 11, 2024 at 4:50 AM Matthias J. Sax <
> > > > mj...@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >>> Thanks for this KIP. Great to see it. I would assume it
> > will
> > > > make
> > > > > > > > >>> KIP-691 unnecessary?
> > > > > > > > >>>
> > > > > > > > >>> I don't think I fully understand the proposal yet. It's
> > > clear,
> > > > > that
> > > > > > > you
> > > > > > > > >>> propose to add new sub-classed to group existing
> > exceptions.
> > > > But
> > > > > > it's
> > > > > > > > >>> not clear to me, which of the existing exceptions (which
> > > > > implement
> > > > > > > > >>> ApiException directly right now) will get a new parent
> > class
> > > > and
> > > > > go
> > > > > > > > into
> > > > > > > > >>> the same group. You only list
> > `InvalidProducerEpochException`
> > > > > which
> > > > > > > > gets
> > > > > > > > >>> `AbortableTransactionException` as new parent. It would
> > help
> > > a
> > > > > lot,
> > > > > > > if
> > > > > > > > >>> you could list out explicitly, which existing exceptions
> > are
> > > > > > grouped
> > > > > > > > >>> together via which sub-class.
> > > > > > > > >>>
> > > > > > > > >>> It should be sufficient to just add a list for each
> group.
> > > For
> > > > > the
> > > > > > > > newly
> > > > > > > > >>> added exception classes, I would also omit all
> constructors
> > > etc
> > > > > and
> > > > > > > > just
> > > > > > > > >>> add a comment about it -- having constructors listed out
> > does
> > > > not
> > > > > > add
> > > > > > > > >>> much value to the KIP itself but makes it harder to read
> > > (it's
> > > > > > > > >>> effectively noise we can avoid IMHO).
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> I am also wondering about compatibility? If I read the
> > > section
> > > > > > > > >>> correctly, you actually propose to introduce a
> > > > > > > non-backward-compatible
> > > > > > > > >>> change?
> > > > > > > > >>>
> > > > > > > > >>>> Based on type of exception thrown, user needs to change
> > > their
> > > > > > > > exception
> > > > > > > > >>> catching logic to take actions against their exception
> > > > handling.
> > > > > > > > >>>
> > > > > > > > >>> Ie, an application cannot be upgrade w/o code changes? I
> am
> > > not
> > > > > > sure
> > > > > > > if
> > > > > > > > >>> this is acceptable?
> > > > > > > > >>>
> > > > > > > > >>> I think it would be much better (not sure if feasible) to
> > > keep
> > > > > the
> > > > > > > old
> > > > > > > > >>> behavior and let users opt-in / enable the new semantics
> > via
> > > a
> > > > > > > config.
> > > > > > > > >>> If the new behavior is disabled, we could log a WARN that
> > the
> > > > app
> > > > > > > > should
> > > > > > > > >>> upgrade to work with the new semantics, and we would only
> > > > enforce
> > > > > > the
> > > > > > > > >>> new behavior in a later major release.
> > > > > > > > >>>
> > > > > > > > >>> Thoughts?
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> -Matthias
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> On 6/7/24 4:06 AM, Kaushik Raina wrote:
> > > > > > > > >>>> Thank you Andrew for feedback
> > > > > > > > >>>>
> > > > > > > > >>>> 1. We are suggesting to only update subclasses of
> > > > > > > > >>>> o.a.k.common.errors.ApiException, which are used in
> > > > > transactions.
> > > > > > > All
> > > > > > > > >>> such
> > > > > > > > >>>> subclasses are mentioned in Exception table
> > > > > > > > >>>> <
> > > > > > > > >>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable
> > > > > > > > >>>>
> > > > > > > > >>>>
> > > > > > > > >>>> 2. "Producer-Recoverable" corresponds to the
> > > > > AbortableException. I
> > > > > > > > have
> > > > > > > > >>>> updated comments on each exception type.
> > > > > > > > >>>>
> > > > > > > > >>>> 3. Yes, it's correct that by adding a "Retriable"
> > exception,
> > > > it
> > > > > > > > >>> simplifies
> > > > > > > > >>>> the determination of which errors can be retried
> > internally.
> > > > In
> > > > > > the
> > > > > > > > >>> Exception
> > > > > > > > >>>> table
> > > > > > > > >>>> <
> > > > > > > > >>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable
> > > > > > > > >>>>
> > > > > > > > >>>> mentioned
> > > > > > > > >>>> in the "Proposed Changes" section, the "Expected
> Handling"
> > > > > column
> > > > > > > > >>> signifies
> > > > > > > > >>>> the handling for each error type. Please let me know if
> > any
> > > > > > further
> > > > > > > > >>>> clarification is needed.
> > > > > > > > >>>>
> > > > > > > > >>>> 4a. Yes, that is correct. For clarity, only one
> > constructor
> > > > has
> > > > > > been
> > > > > > > > >>>> mentioned in the KIP. An ellipsis has been added as a
> > > > > placeholder,
> > > > > > > > >>>> indicating that there are additional functions in the
> > class
> > > > but
> > > > > > they
> > > > > > > > are
> > > > > > > > >>>> not explicitly specified.
> > > > > > > > >>>> 4b. Updated in the KIP.
> > > > > > > > >>>>
> > > > > > > > >>>> 5. TopicAuthorizationException extends "Invalid
> > > > Configuration".
> > > > > > > > "Invalid
> > > > > > > > >>>> Configuration" type can be resolved either by
> dynamically
> > > > > updating
> > > > > > > the
> > > > > > > > >>>> configuration, which does not require a restart, or by
> > > > > statically
> > > > > > > > >>> updating
> > > > > > > > >>>> it by restarting the application. It is at the
> > application's
> > > > > > > > discretion
> > > > > > > > >>> how
> > > > > > > > >>>> they want to handle each "Invalid Configuration" type.
> > > > > > > > >>>>
> > > > > > > > >>>> I have added Client side handling example
> > > > > > > > >>>> <
> > > > > > > > >>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-Clientsidecodeexample
> > > > > > > > >>>>
> > > > > > > > >>>> in
> > > > > > > > >>>> KIP. Hope that helps.
> > > > > > > > >>>>
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to