Thanks Lianet, Updated KIP to include LM5
On Wed, Nov 13, 2024 at 9:14 PM Lianet M. <liane...@gmail.com> wrote: > Hello, thanks for the updates. Just a minor comment left > > LM5. Should we add the change to the AuthorizationException parent to the > public interfaces section? Same for the InvalidMetadataException. > > Thanks! > > Lianet > > > On Tue, Nov 12, 2024 at 1:52 PM Kaushik Raina <kra...@confluent.io.invalid > > > wrote: > > > Thanks Lianet for review > > > > LM1 & LM2: > > We will extend parent classes only to maintain the hierarchy > > > > For TopicAuthorizationException and GroupAuthorizationException, we will > > extend parent class AuthorizationException. So new hierarchy will be > > "AuthorizationException < InvalidConfigurationException < ApiException" > > > > For UnknownTopicOrPartitionException and NotLeaderOrFollowerException, we > > will extend parent class InvalidMetadataException. So new hierarchy will > be > > "InvalidMetadataException < RefreshRetriableException < > RetriableException > > < ApiException" > > > > > > LM3: > > CommitFailedException will not be extended because its parent class > > KafkaException is expected to be treated as application recoverable as > > mentioned in KIP. > > > > > > I have added a `comment` section in the KIP-1050 table to include LM1 & > LM2 > > & LM3 details. > > > > > > LM4: StaleMemberEpochException is thrown as IllegalGenerationException in > > Txn > > > > > https://github.com/apache/kafka/blob/trunk/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/OffsetMetadataManager.java#L367-L379 > > . So it need not be part of KIP > > > > On Thu, Nov 7, 2024 at 1:05 AM Lianet M. <liane...@gmail.com> wrote: > > > > > Hello, sorry for the late reply, took another pass after the latest > > > updates. Most of the proposed groupings and handling seem good and very > > > helpful indeed from the producer perspective. I just have some more > > > comments related to the changes in the exception hierarchy, that are > not > > > scoped to the producer: > > > > > > > > > LM1 - Regarding TopicAuthorizationException and > > > GroupAuthorizationException. We’re proposing to change their parent > class > > > to InvalidConfigurationException, which extends ApiException (vs their > > > current parent AuthorizationException). This would be a breaking change > > for > > > apps expecting/handling instanceOf AuthorizationException (and an > > > unexpected one I would say, given that TopicAuthorizationException and > > > GroupAuthorizationException are indeed “authorization errors”). > > > > > > > > > LM2 - Similarly, regarding UnknownTopicOrPartitionException and > > > NotLeaderOrFollowerException: we’re proposing to change parent class to > > > RefreshRetriableException (which extends Retriable), but that would > mean > > > that these exceptions wouldn’t be instanceOf InvalidMetadataException > > > anymore (as they are today). This would be a breaking change for client > > > apps dealing with instanceOf InvalidMetadata right? > > > > > > > > > LM3 - Regarding CommitFailedException, I’m simply concerned that the > move > > > may not be conceptually right? We’re proposing to change its parent > class > > > to ApplicationRecoverableException, which would make > > CommitFailedException > > > land under the ApiException umbrella. ApiException is intended for > errors > > > that are part of the protocol, and this CommitFailedException is not > > (it’s > > > just an exception generated on the client side based on different > > > protocol-level errors). Not that it will have an impact on existing > apps > > > given that it would still be a KafkaException as it is today, but > wanted > > to > > > point this out. > > > > > > > > > LM4 - Under ApplicationRecoverableException, I wonder if we should also > > > consider StaleMemberEpochException (just for consistency, alongside > > > FencedInstaceIdException, UnknownMemberIdException,...) > > > > > > > > > Thanks! > > > > > > > > > Lianet > > > > > > On Thu, Oct 24, 2024 at 6:25 AM Kaushik Raina > > <kra...@confluent.io.invalid > > > > > > > wrote: > > > > > > > Thanks Artem for valuable comments > > > > > > > > I have incorporated them into the updated KIP. > > > > > > > > > > > > On Sat, Oct 19, 2024 at 3:31 AM Artem Livshits > > > > <alivsh...@confluent.io.invalid> wrote: > > > > > > > > > Hi Kaushik, > > > > > > > > > > Thank you for the KIP! I think it'll make writing transactional > > > > > application easier and less error prone. I have a couple comments: > > > > > > > > > > AL1. The keep proposes changing the semantics > > > > > of UnknownProducerIdException. Currently, this error is never > > returned > > > > by > > > > > the broker so we cannot validate whether changing semantics is > going > > to > > > > be > > > > > compatible with a future use (if we ever return it again) and the > > > future > > > > > broker wouldn't know which sematnics the client supports. I think > > for > > > > now > > > > > we could just add a comment in the code that this error is never > > > returned > > > > > by the broker and if we ever add it to the broker we'd need to make > > > sure > > > > > it's categorized properly and bump the API version so that the > broker > > > > knows > > > > > if the client supports required semantics. > > > > > > > > > > AL2. The exception classification looks good to me from the > producer > > > > > perspective: we have retriable, abortable, etc. categories. The > > > > retriable > > > > > errors can be retried within producer safely without losing > > > idempotence, > > > > > the messages would be retried with the same sequence numbers and we > > > won't > > > > > have duplicates. However, if a retriable error (e.g. > > TimeoutException) > > > > is > > > > > thrown to the application, it's not safe to retry the produce, > > because > > > > the > > > > > messages will get new sequence numbers and if the original message > in > > > > fact > > > > > succeeded, the new messages would become duplicates, losing the > > exactly > > > > > once semantics. Thus, if a retriable exception is thrown from a > > > > > transactional "produce" we should abort the transaction to > guarantee > > > that > > > > > we won't have duplicates when we produce messages again. To avoid > > > > > confusion, I propose to change the transactional "produce" to never > > > > return > > > > > a retriable error, i.e. any retriable error would be translated > > > > > into TransactionAbortableException (the error message could be > copied > > > > from > > > > > the original error) before getting thrown from transactional > > "produce". > > > > > This would avoid bugs in the applicatoin -- even if the application > > > > > developer gets confused and decides to retry on TimeoutException, > the > > > > buggy > > > > > code would never run. > > > > > > > > > > Another change that we should make is to make sure > > .abortTransaction() > > > > > should never throw producer-abortable exception. > > > > > > > > > > The handling of retriable errors for other APIs can stay unchanged > -- > > > > > .commitTransaction() and .abortTransaction() are idempotent and can > > be > > > > > safely retried, read-only APIs can always be safely retried, and > for > > > > > non-transactional producer we don't have a good "path back to > normal" > > > > > anyway -- we cannot reliably abort messages with unknown fate. > > > > > > > > > > -Artem > > > > > > > > > > On Fri, Oct 4, 2024 at 5:11 AM Lianet M. <liane...@gmail.com> > wrote: > > > > > > > > > > > Hello, thanks for the KIP! After going through the KIP and > > discussion > > > > > here > > > > > > are some initial comments. > > > > > > > > > > > > 107 - I understand we’re proposing a new > > > > > > ProducerRetriableTransactionException, and changing existing > > > exceptions > > > > > to > > > > > > inherit from it (the ones on the table below it). The existing > > > > > exceptions > > > > > > inherit from RetriableException today, but with this KIP, they > > would > > > > > > inherit from ProducerRetriableTransactionException which is not a > > > > > > RetriableException ("*ProducerRetriableTransactionException > extends > > > > > > ApiException"*). Is my understanding correct? Wouldn’t this break > > > > > > applications that could be handling/expecting RetriableExceptions > > > > today? > > > > > > (Ie. apps dealing with TimeoutException on send , if they have > > > > > > catch(RetriableException) or checks in the form of instanceOf > > > > > > RetriableException, would need to change to the new > > > > > > ProducerRetriableTransactionException or the specific > > > > TimeoutException, > > > > > > right?). I get this wouldn’t bring a problem for most of the > > > retriable > > > > > > exceptions on the table given that they end up being > > handled/retried > > > > > > internally, but TimeoutException is tricky. > > > > > > > > > > > > > > > > > > 108 - Regarding how we limit the scope of the change to the > > > > > > producer/transactional API. TimeoutException is not only used in > > the > > > > > > transactional API, but also in the consumer API, propagated to > the > > > user > > > > > in > > > > > > multiple api calls. Not clear to me how with this proposal we > > > wouldn’t > > > > > end > > > > > > up with a consumer throwing a TimeoutException instanceOf > > > > > > ProducerRetriableTransactionException? (Instead of instanceOf > > > > > > RetriableException like it is today)? Again, potentially breaking > > > apps > > > > > but > > > > > > also with a conceptually wrong consumer error? > > > > > > > > > > > > > > > > > > 109 - Similar to above, for exceptions like > > > > > > UnknownTopicOrPartitionException, which are today instanceOf > > > > > > RetriableException, if we’re saying they will be subclass of > > > > > > ProducerRefreshRetriableTransactionException (ApiException) that > > will > > > > > > affect the consumer logic too, where we do handle > > RetriableExceptions > > > > > like > > > > > > the unknownTopic, expecting RetriableException. This is all > > internal > > > > > logic > > > > > > and could be updated as needed of course, but without leaking > > > > > > producer-specific groupings into the consumer I would expect. > > > > > > > > > > > > > > > > > > 110 - The KIP refers to the existing > TransactionAbortableException > > > > (from > > > > > > KIP-890), but on the public changes it refers to class > > > > > > AbortableTransactionException extends ApiException. So are we > > > > proposing a > > > > > > new exception type for this or reusing the existing one? > > > > > > > > > > > > 111 - I notice the proposed new exceptions, even though they seem > > > like > > > > > > abstract groupings, are not defined as "abstract". Is it > > intentional > > > to > > > > > > allow creation of instances of those? > > > > > > > > > > > > Best! > > > > > > > > > > > > Lianet > > > > > > > > > > > > > > > > > > On Thu, Sep 12, 2024 at 6:26 AM Kaushik Raina > > > > > <kra...@confluent.io.invalid > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > Thanks for review Matthias > > > > > > > > > > > > > > 100/101 - Updated in KIP > > > > > > > > > > > > > > 104 - Added explicitly > > > > > > > `For Producer-Retriable errors, the producer handles retries > > > > > internally, > > > > > > > keeping the failure details hidden from the application. > > > Conversely, > > > > > > other > > > > > > > types of exceptions will be surfaced to the application code > for > > > > > > handling.` > > > > > > > > > > > > > > 105 - Grouped default exceptions explicitly > > > > > > > `We will handle all default exceptions as generic unknown > errors, > > > > which > > > > > > > will be application recoverable. Below are few such > exceptions:` > > > > > > > > > > > > > > > > > > > > > On Sat, Aug 31, 2024 at 4:27 AM Matthias J. Sax < > > mj...@apache.org> > > > > > > wrote: > > > > > > > > > > > > > > > Thanks for updating the KIP. It's much clearer now what you > > > > propose. > > > > > I > > > > > > > > have a bunch of question about the proposal: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (100) nit (typo / missing word?): > > > > > > > > > > > > > > > > > We would new error types > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (101) `TransactionAbortableException`, > > `ProducerFencedException`, > > > > and > > > > > > > > `UnknownProducerIdException` are listed twice in the tables. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (102) You introduce a new exception > > > `AbortableTransactionException` > > > > > > > > which will only be extended by > `TransactionAbortableException`. > > > > Given > > > > > > > > that the existing TransactionAbortableException is not thrown > > by > > > > the > > > > > > > > producer right now (no passed into the `Callback`), it seem > if > > > the > > > > > > > > producer starts to throw/return the exiting > > > > > > > > `TransactionAbortableException` or the new > > > > > > > > `AbortableTransactionException` is would be an incompatible > > > change? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (103) It's unclear which method would throw the new > > > > > > > > `AbortableTransactionException` and/or if this new exception > > > might > > > > be > > > > > > > > passe into the producer's send `Callback`. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Btw: KIP-890 does not mention > > `TransactionAbortableException`... > > > > Does > > > > > > > > KIP-890 need an update? KIP-890 only mentions a new error > code > > > > > > > > TRANSACTION_ABORTABLE -- or is this an implicit introduction > of > > > > > > > > TransactionAbortableException -- I am not familiar with the > > > details > > > > > how > > > > > > > > core KIPs are written? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (104) The KIP does not explicitly say, which of the new > > > exceptions > > > > > are > > > > > > > > actually user facing? It seems only > > > AbortableTransactionException, > > > > > > > > ApplicationRecoverableTransactionException, and > > > > > > > > InvalidConfiguationTransactionException are exception which > > user > > > > will > > > > > > be > > > > > > > > able to catch (or handle vie the `Callback`), while > > > > > > > > ProducerRetriableTransactionException and > > > > > > > > ProducerRefreshRetriableTransactionException won't be > > > thrown/return > > > > > by > > > > > > > > the producer into the app code? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (105) `IllegalStateException` and `RuntimeException` which > are > > > Java > > > > > > > > exceptions are listed in the table of > > > > > > > > `ApplicationRecoverableTransactionException`: I think it is > not > > > > valid > > > > > > to > > > > > > > > list them, as we cannot change their super-class. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (106) `UnknownMemberIdException`, > `IllegalGenerationException`, > > > and > > > > > > > > `CorrelationIdMismatchException` are listed in the table of > > > > > > > > `ApplicationRecoverableTransactionException` but it seems > they > > > are > > > > > not > > > > > > > > thrown/returned by the producer atm. If we start to > > throw/return > > > > > either > > > > > > > > of them it seem to be a backward incompatible change. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (106) Similarly to 105, `InvalidRecordException` and > > > > > > > > `InvalidRequiredAcksException` are listed in the table of > > > > > > > > `InvalidConfiguationTransactionException` but they seem not > to > > be > > > > > user > > > > > > > > facing. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -Matthias > > > > > > > > > > > > > > > > > > > > > > > > On 7/25/24 8:50 AM, Kaushik Raina wrote: > > > > > > > > > Additionally, > > > > > > > > > - We will be depreciating KIP-691 in favour of KIP-1050. > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 21, 2024 at 12:20 PM Kaushik Raina < > > > > > kra...@confluent.io> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > >> Thanks Matthias for feedback > > > > > > > > >> - We have updates KIP and grouped exceptions > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable > > > > > > > > >> > > > > > > > > >> - Regarding compatibility, all changes in KIP are > expected > > to > > > > be > > > > > > > > *backword > > > > > > > > >> compatible*. We have updated KIP to make it clear. > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> On Tue, Jun 11, 2024 at 4:50 AM Matthias J. Sax < > > > > mj...@apache.org > > > > > > > > > > > > > > wrote: > > > > > > > > >> > > > > > > > > >>> Thanks for this KIP. Great to see it. I would assume it > > will > > > > make > > > > > > > > >>> KIP-691 unnecessary? > > > > > > > > >>> > > > > > > > > >>> I don't think I fully understand the proposal yet. It's > > > clear, > > > > > that > > > > > > > you > > > > > > > > >>> propose to add new sub-classed to group existing > > exceptions. > > > > But > > > > > > it's > > > > > > > > >>> not clear to me, which of the existing exceptions (which > > > > > implement > > > > > > > > >>> ApiException directly right now) will get a new parent > > class > > > > and > > > > > go > > > > > > > > into > > > > > > > > >>> the same group. You only list > > `InvalidProducerEpochException` > > > > > which > > > > > > > > gets > > > > > > > > >>> `AbortableTransactionException` as new parent. It would > > help > > > a > > > > > lot, > > > > > > > if > > > > > > > > >>> you could list out explicitly, which existing exceptions > > are > > > > > > grouped > > > > > > > > >>> together via which sub-class. > > > > > > > > >>> > > > > > > > > >>> It should be sufficient to just add a list for each > group. > > > For > > > > > the > > > > > > > > newly > > > > > > > > >>> added exception classes, I would also omit all > constructors > > > etc > > > > > and > > > > > > > > just > > > > > > > > >>> add a comment about it -- having constructors listed out > > does > > > > not > > > > > > add > > > > > > > > >>> much value to the KIP itself but makes it harder to read > > > (it's > > > > > > > > >>> effectively noise we can avoid IMHO). > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> I am also wondering about compatibility? If I read the > > > section > > > > > > > > >>> correctly, you actually propose to introduce a > > > > > > > non-backward-compatible > > > > > > > > >>> change? > > > > > > > > >>> > > > > > > > > >>>> Based on type of exception thrown, user needs to change > > > their > > > > > > > > exception > > > > > > > > >>> catching logic to take actions against their exception > > > > handling. > > > > > > > > >>> > > > > > > > > >>> Ie, an application cannot be upgrade w/o code changes? I > am > > > not > > > > > > sure > > > > > > > if > > > > > > > > >>> this is acceptable? > > > > > > > > >>> > > > > > > > > >>> I think it would be much better (not sure if feasible) to > > > keep > > > > > the > > > > > > > old > > > > > > > > >>> behavior and let users opt-in / enable the new semantics > > via > > > a > > > > > > > config. > > > > > > > > >>> If the new behavior is disabled, we could log a WARN that > > the > > > > app > > > > > > > > should > > > > > > > > >>> upgrade to work with the new semantics, and we would only > > > > enforce > > > > > > the > > > > > > > > >>> new behavior in a later major release. > > > > > > > > >>> > > > > > > > > >>> Thoughts? > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> -Matthias > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> On 6/7/24 4:06 AM, Kaushik Raina wrote: > > > > > > > > >>>> Thank you Andrew for feedback > > > > > > > > >>>> > > > > > > > > >>>> 1. We are suggesting to only update subclasses of > > > > > > > > >>>> o.a.k.common.errors.ApiException, which are used in > > > > > transactions. > > > > > > > All > > > > > > > > >>> such > > > > > > > > >>>> subclasses are mentioned in Exception table > > > > > > > > >>>> < > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable > > > > > > > > >>>> > > > > > > > > >>>> > > > > > > > > >>>> 2. "Producer-Recoverable" corresponds to the > > > > > AbortableException. I > > > > > > > > have > > > > > > > > >>>> updated comments on each exception type. > > > > > > > > >>>> > > > > > > > > >>>> 3. Yes, it's correct that by adding a "Retriable" > > exception, > > > > it > > > > > > > > >>> simplifies > > > > > > > > >>>> the determination of which errors can be retried > > internally. > > > > In > > > > > > the > > > > > > > > >>> Exception > > > > > > > > >>>> table > > > > > > > > >>>> < > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable > > > > > > > > >>>> > > > > > > > > >>>> mentioned > > > > > > > > >>>> in the "Proposed Changes" section, the "Expected > Handling" > > > > > column > > > > > > > > >>> signifies > > > > > > > > >>>> the handling for each error type. Please let me know if > > any > > > > > > further > > > > > > > > >>>> clarification is needed. > > > > > > > > >>>> > > > > > > > > >>>> 4a. Yes, that is correct. For clarity, only one > > constructor > > > > has > > > > > > been > > > > > > > > >>>> mentioned in the KIP. An ellipsis has been added as a > > > > > placeholder, > > > > > > > > >>>> indicating that there are additional functions in the > > class > > > > but > > > > > > they > > > > > > > > are > > > > > > > > >>>> not explicitly specified. > > > > > > > > >>>> 4b. Updated in the KIP. > > > > > > > > >>>> > > > > > > > > >>>> 5. TopicAuthorizationException extends "Invalid > > > > Configuration". > > > > > > > > "Invalid > > > > > > > > >>>> Configuration" type can be resolved either by > dynamically > > > > > updating > > > > > > > the > > > > > > > > >>>> configuration, which does not require a restart, or by > > > > > statically > > > > > > > > >>> updating > > > > > > > > >>>> it by restarting the application. It is at the > > application's > > > > > > > > discretion > > > > > > > > >>> how > > > > > > > > >>>> they want to handle each "Invalid Configuration" type. > > > > > > > > >>>> > > > > > > > > >>>> I have added Client side handling example > > > > > > > > >>>> < > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-Clientsidecodeexample > > > > > > > > >>>> > > > > > > > > >>>> in > > > > > > > > >>>> KIP. Hope that helps. > > > > > > > > >>>> > > > > > > > > >>> > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >