Hey Lianet, Thanks for raising these issues. I think that the parent classes (AuthorizationException and InvalidMetadataException) should extend our new groupings (InvalidConfig and RefreshRetriable). We can confirm this works as expected. I'm not as familiar with CommitFailedException, but we can look into that one and if it makes sense to remove. For StaleMemberEpochException -- I wonder if this exception was not originally returned by transactional requests when the KIP was designed. If it is now, makes sense to include it.
Will let Kaushik confirm all these, but wanted to share my 2c :) Justine On Wed, Nov 6, 2024 at 11:36 AM Lianet M. <liane...@gmail.com> wrote: > Hello, sorry for the late reply, took another pass after the latest > updates. Most of the proposed groupings and handling seem good and very > helpful indeed from the producer perspective. I just have some more > comments related to the changes in the exception hierarchy, that are not > scoped to the producer: > > > LM1 - Regarding TopicAuthorizationException and > GroupAuthorizationException. We’re proposing to change their parent class > to InvalidConfigurationException, which extends ApiException (vs their > current parent AuthorizationException). This would be a breaking change for > apps expecting/handling instanceOf AuthorizationException (and an > unexpected one I would say, given that TopicAuthorizationException and > GroupAuthorizationException are indeed “authorization errors”). > > > LM2 - Similarly, regarding UnknownTopicOrPartitionException and > NotLeaderOrFollowerException: we’re proposing to change parent class to > RefreshRetriableException (which extends Retriable), but that would mean > that these exceptions wouldn’t be instanceOf InvalidMetadataException > anymore (as they are today). This would be a breaking change for client > apps dealing with instanceOf InvalidMetadata right? > > > LM3 - Regarding CommitFailedException, I’m simply concerned that the move > may not be conceptually right? We’re proposing to change its parent class > to ApplicationRecoverableException, which would make CommitFailedException > land under the ApiException umbrella. ApiException is intended for errors > that are part of the protocol, and this CommitFailedException is not (it’s > just an exception generated on the client side based on different > protocol-level errors). Not that it will have an impact on existing apps > given that it would still be a KafkaException as it is today, but wanted to > point this out. > > > LM4 - Under ApplicationRecoverableException, I wonder if we should also > consider StaleMemberEpochException (just for consistency, alongside > FencedInstaceIdException, UnknownMemberIdException,...) > > > Thanks! > > > Lianet > > On Thu, Oct 24, 2024 at 6:25 AM Kaushik Raina <kra...@confluent.io.invalid > > > wrote: > > > Thanks Artem for valuable comments > > > > I have incorporated them into the updated KIP. > > > > > > On Sat, Oct 19, 2024 at 3:31 AM Artem Livshits > > <alivsh...@confluent.io.invalid> wrote: > > > > > Hi Kaushik, > > > > > > Thank you for the KIP! I think it'll make writing transactional > > > application easier and less error prone. I have a couple comments: > > > > > > AL1. The keep proposes changing the semantics > > > of UnknownProducerIdException. Currently, this error is never returned > > by > > > the broker so we cannot validate whether changing semantics is going to > > be > > > compatible with a future use (if we ever return it again) and the > future > > > broker wouldn't know which sematnics the client supports. I think for > > now > > > we could just add a comment in the code that this error is never > returned > > > by the broker and if we ever add it to the broker we'd need to make > sure > > > it's categorized properly and bump the API version so that the broker > > knows > > > if the client supports required semantics. > > > > > > AL2. The exception classification looks good to me from the producer > > > perspective: we have retriable, abortable, etc. categories. The > > retriable > > > errors can be retried within producer safely without losing > idempotence, > > > the messages would be retried with the same sequence numbers and we > won't > > > have duplicates. However, if a retriable error (e.g. TimeoutException) > > is > > > thrown to the application, it's not safe to retry the produce, because > > the > > > messages will get new sequence numbers and if the original message in > > fact > > > succeeded, the new messages would become duplicates, losing the exactly > > > once semantics. Thus, if a retriable exception is thrown from a > > > transactional "produce" we should abort the transaction to guarantee > that > > > we won't have duplicates when we produce messages again. To avoid > > > confusion, I propose to change the transactional "produce" to never > > return > > > a retriable error, i.e. any retriable error would be translated > > > into TransactionAbortableException (the error message could be copied > > from > > > the original error) before getting thrown from transactional "produce". > > > This would avoid bugs in the applicatoin -- even if the application > > > developer gets confused and decides to retry on TimeoutException, the > > buggy > > > code would never run. > > > > > > Another change that we should make is to make sure .abortTransaction() > > > should never throw producer-abortable exception. > > > > > > The handling of retriable errors for other APIs can stay unchanged -- > > > .commitTransaction() and .abortTransaction() are idempotent and can be > > > safely retried, read-only APIs can always be safely retried, and for > > > non-transactional producer we don't have a good "path back to normal" > > > anyway -- we cannot reliably abort messages with unknown fate. > > > > > > -Artem > > > > > > On Fri, Oct 4, 2024 at 5:11 AM Lianet M. <liane...@gmail.com> wrote: > > > > > > > Hello, thanks for the KIP! After going through the KIP and discussion > > > here > > > > are some initial comments. > > > > > > > > 107 - I understand we’re proposing a new > > > > ProducerRetriableTransactionException, and changing existing > exceptions > > > to > > > > inherit from it (the ones on the table below it). The existing > > > exceptions > > > > inherit from RetriableException today, but with this KIP, they would > > > > inherit from ProducerRetriableTransactionException which is not a > > > > RetriableException ("*ProducerRetriableTransactionException extends > > > > ApiException"*). Is my understanding correct? Wouldn’t this break > > > > applications that could be handling/expecting RetriableExceptions > > today? > > > > (Ie. apps dealing with TimeoutException on send , if they have > > > > catch(RetriableException) or checks in the form of instanceOf > > > > RetriableException, would need to change to the new > > > > ProducerRetriableTransactionException or the specific > > TimeoutException, > > > > right?). I get this wouldn’t bring a problem for most of the > retriable > > > > exceptions on the table given that they end up being handled/retried > > > > internally, but TimeoutException is tricky. > > > > > > > > > > > > 108 - Regarding how we limit the scope of the change to the > > > > producer/transactional API. TimeoutException is not only used in the > > > > transactional API, but also in the consumer API, propagated to the > user > > > in > > > > multiple api calls. Not clear to me how with this proposal we > wouldn’t > > > end > > > > up with a consumer throwing a TimeoutException instanceOf > > > > ProducerRetriableTransactionException? (Instead of instanceOf > > > > RetriableException like it is today)? Again, potentially breaking > apps > > > but > > > > also with a conceptually wrong consumer error? > > > > > > > > > > > > 109 - Similar to above, for exceptions like > > > > UnknownTopicOrPartitionException, which are today instanceOf > > > > RetriableException, if we’re saying they will be subclass of > > > > ProducerRefreshRetriableTransactionException (ApiException) that will > > > > affect the consumer logic too, where we do handle RetriableExceptions > > > like > > > > the unknownTopic, expecting RetriableException. This is all internal > > > logic > > > > and could be updated as needed of course, but without leaking > > > > producer-specific groupings into the consumer I would expect. > > > > > > > > > > > > 110 - The KIP refers to the existing TransactionAbortableException > > (from > > > > KIP-890), but on the public changes it refers to class > > > > AbortableTransactionException extends ApiException. So are we > > proposing a > > > > new exception type for this or reusing the existing one? > > > > > > > > 111 - I notice the proposed new exceptions, even though they seem > like > > > > abstract groupings, are not defined as "abstract". Is it intentional > to > > > > allow creation of instances of those? > > > > > > > > Best! > > > > > > > > Lianet > > > > > > > > > > > > On Thu, Sep 12, 2024 at 6:26 AM Kaushik Raina > > > <kra...@confluent.io.invalid > > > > > > > > > wrote: > > > > > > > > > Thanks for review Matthias > > > > > > > > > > 100/101 - Updated in KIP > > > > > > > > > > 104 - Added explicitly > > > > > `For Producer-Retriable errors, the producer handles retries > > > internally, > > > > > keeping the failure details hidden from the application. > Conversely, > > > > other > > > > > types of exceptions will be surfaced to the application code for > > > > handling.` > > > > > > > > > > 105 - Grouped default exceptions explicitly > > > > > `We will handle all default exceptions as generic unknown errors, > > which > > > > > will be application recoverable. Below are few such exceptions:` > > > > > > > > > > > > > > > On Sat, Aug 31, 2024 at 4:27 AM Matthias J. Sax <mj...@apache.org> > > > > wrote: > > > > > > > > > > > Thanks for updating the KIP. It's much clearer now what you > > propose. > > > I > > > > > > have a bunch of question about the proposal: > > > > > > > > > > > > > > > > > > > > > > > > (100) nit (typo / missing word?): > > > > > > > > > > > > > We would new error types > > > > > > > > > > > > > > > > > > > > > > > > (101) `TransactionAbortableException`, `ProducerFencedException`, > > and > > > > > > `UnknownProducerIdException` are listed twice in the tables. > > > > > > > > > > > > > > > > > > > > > > > > (102) You introduce a new exception > `AbortableTransactionException` > > > > > > which will only be extended by `TransactionAbortableException`. > > Given > > > > > > that the existing TransactionAbortableException is not thrown by > > the > > > > > > producer right now (no passed into the `Callback`), it seem if > the > > > > > > producer starts to throw/return the exiting > > > > > > `TransactionAbortableException` or the new > > > > > > `AbortableTransactionException` is would be an incompatible > change? > > > > > > > > > > > > > > > > > > > > > > > > (103) It's unclear which method would throw the new > > > > > > `AbortableTransactionException` and/or if this new exception > might > > be > > > > > > passe into the producer's send `Callback`. > > > > > > > > > > > > > > > > > > > > > > > > Btw: KIP-890 does not mention `TransactionAbortableException`... > > Does > > > > > > KIP-890 need an update? KIP-890 only mentions a new error code > > > > > > TRANSACTION_ABORTABLE -- or is this an implicit introduction of > > > > > > TransactionAbortableException -- I am not familiar with the > details > > > how > > > > > > core KIPs are written? > > > > > > > > > > > > > > > > > > > > > > > > (104) The KIP does not explicitly say, which of the new > exceptions > > > are > > > > > > actually user facing? It seems only > AbortableTransactionException, > > > > > > ApplicationRecoverableTransactionException, and > > > > > > InvalidConfiguationTransactionException are exception which user > > will > > > > be > > > > > > able to catch (or handle vie the `Callback`), while > > > > > > ProducerRetriableTransactionException and > > > > > > ProducerRefreshRetriableTransactionException won't be > thrown/return > > > by > > > > > > the producer into the app code? > > > > > > > > > > > > > > > > > > > > > > > > (105) `IllegalStateException` and `RuntimeException` which are > Java > > > > > > exceptions are listed in the table of > > > > > > `ApplicationRecoverableTransactionException`: I think it is not > > valid > > > > to > > > > > > list them, as we cannot change their super-class. > > > > > > > > > > > > > > > > > > > > > > > > (106) `UnknownMemberIdException`, `IllegalGenerationException`, > and > > > > > > `CorrelationIdMismatchException` are listed in the table of > > > > > > `ApplicationRecoverableTransactionException` but it seems they > are > > > not > > > > > > thrown/returned by the producer atm. If we start to throw/return > > > either > > > > > > of them it seem to be a backward incompatible change. > > > > > > > > > > > > > > > > > > > > > > > > (106) Similarly to 105, `InvalidRecordException` and > > > > > > `InvalidRequiredAcksException` are listed in the table of > > > > > > `InvalidConfiguationTransactionException` but they seem not to be > > > user > > > > > > facing. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -Matthias > > > > > > > > > > > > > > > > > > On 7/25/24 8:50 AM, Kaushik Raina wrote: > > > > > > > Additionally, > > > > > > > - We will be depreciating KIP-691 in favour of KIP-1050. > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 21, 2024 at 12:20 PM Kaushik Raina < > > > kra...@confluent.io> > > > > > > wrote: > > > > > > > > > > > > > >> Thanks Matthias for feedback > > > > > > >> - We have updates KIP and grouped exceptions > > > > > > >> > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable > > > > > > >> > > > > > > >> - Regarding compatibility, all changes in KIP are expected to > > be > > > > > > *backword > > > > > > >> compatible*. We have updated KIP to make it clear. > > > > > > >> > > > > > > >> > > > > > > >> On Tue, Jun 11, 2024 at 4:50 AM Matthias J. Sax < > > mj...@apache.org > > > > > > > > > > wrote: > > > > > > >> > > > > > > >>> Thanks for this KIP. Great to see it. I would assume it will > > make > > > > > > >>> KIP-691 unnecessary? > > > > > > >>> > > > > > > >>> I don't think I fully understand the proposal yet. It's > clear, > > > that > > > > > you > > > > > > >>> propose to add new sub-classed to group existing exceptions. > > But > > > > it's > > > > > > >>> not clear to me, which of the existing exceptions (which > > > implement > > > > > > >>> ApiException directly right now) will get a new parent class > > and > > > go > > > > > > into > > > > > > >>> the same group. You only list `InvalidProducerEpochException` > > > which > > > > > > gets > > > > > > >>> `AbortableTransactionException` as new parent. It would help > a > > > lot, > > > > > if > > > > > > >>> you could list out explicitly, which existing exceptions are > > > > grouped > > > > > > >>> together via which sub-class. > > > > > > >>> > > > > > > >>> It should be sufficient to just add a list for each group. > For > > > the > > > > > > newly > > > > > > >>> added exception classes, I would also omit all constructors > etc > > > and > > > > > > just > > > > > > >>> add a comment about it -- having constructors listed out does > > not > > > > add > > > > > > >>> much value to the KIP itself but makes it harder to read > (it's > > > > > > >>> effectively noise we can avoid IMHO). > > > > > > >>> > > > > > > >>> > > > > > > >>> > > > > > > >>> I am also wondering about compatibility? If I read the > section > > > > > > >>> correctly, you actually propose to introduce a > > > > > non-backward-compatible > > > > > > >>> change? > > > > > > >>> > > > > > > >>>> Based on type of exception thrown, user needs to change > their > > > > > > exception > > > > > > >>> catching logic to take actions against their exception > > handling. > > > > > > >>> > > > > > > >>> Ie, an application cannot be upgrade w/o code changes? I am > not > > > > sure > > > > > if > > > > > > >>> this is acceptable? > > > > > > >>> > > > > > > >>> I think it would be much better (not sure if feasible) to > keep > > > the > > > > > old > > > > > > >>> behavior and let users opt-in / enable the new semantics via > a > > > > > config. > > > > > > >>> If the new behavior is disabled, we could log a WARN that the > > app > > > > > > should > > > > > > >>> upgrade to work with the new semantics, and we would only > > enforce > > > > the > > > > > > >>> new behavior in a later major release. > > > > > > >>> > > > > > > >>> Thoughts? > > > > > > >>> > > > > > > >>> > > > > > > >>> > > > > > > >>> -Matthias > > > > > > >>> > > > > > > >>> > > > > > > >>> > > > > > > >>> > > > > > > >>> > > > > > > >>> > > > > > > >>> On 6/7/24 4:06 AM, Kaushik Raina wrote: > > > > > > >>>> Thank you Andrew for feedback > > > > > > >>>> > > > > > > >>>> 1. We are suggesting to only update subclasses of > > > > > > >>>> o.a.k.common.errors.ApiException, which are used in > > > transactions. > > > > > All > > > > > > >>> such > > > > > > >>>> subclasses are mentioned in Exception table > > > > > > >>>> < > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable > > > > > > >>>> > > > > > > >>>> > > > > > > >>>> 2. "Producer-Recoverable" corresponds to the > > > AbortableException. I > > > > > > have > > > > > > >>>> updated comments on each exception type. > > > > > > >>>> > > > > > > >>>> 3. Yes, it's correct that by adding a "Retriable" exception, > > it > > > > > > >>> simplifies > > > > > > >>>> the determination of which errors can be retried internally. > > In > > > > the > > > > > > >>> Exception > > > > > > >>>> table > > > > > > >>>> < > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable > > > > > > >>>> > > > > > > >>>> mentioned > > > > > > >>>> in the "Proposed Changes" section, the "Expected Handling" > > > column > > > > > > >>> signifies > > > > > > >>>> the handling for each error type. Please let me know if any > > > > further > > > > > > >>>> clarification is needed. > > > > > > >>>> > > > > > > >>>> 4a. Yes, that is correct. For clarity, only one constructor > > has > > > > been > > > > > > >>>> mentioned in the KIP. An ellipsis has been added as a > > > placeholder, > > > > > > >>>> indicating that there are additional functions in the class > > but > > > > they > > > > > > are > > > > > > >>>> not explicitly specified. > > > > > > >>>> 4b. Updated in the KIP. > > > > > > >>>> > > > > > > >>>> 5. TopicAuthorizationException extends "Invalid > > Configuration". > > > > > > "Invalid > > > > > > >>>> Configuration" type can be resolved either by dynamically > > > updating > > > > > the > > > > > > >>>> configuration, which does not require a restart, or by > > > statically > > > > > > >>> updating > > > > > > >>>> it by restarting the application. It is at the application's > > > > > > discretion > > > > > > >>> how > > > > > > >>>> they want to handle each "Invalid Configuration" type. > > > > > > >>>> > > > > > > >>>> I have added Client side handling example > > > > > > >>>> < > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-Clientsidecodeexample > > > > > > >>>> > > > > > > >>>> in > > > > > > >>>> KIP. Hope that helps. > > > > > > >>>> > > > > > > >>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > >