Hello, thanks for the updates. Just a minor comment left LM5. Should we add the change to the AuthorizationException parent to the public interfaces section? Same for the InvalidMetadataException.
Thanks! Lianet On Tue, Nov 12, 2024 at 1:52 PM Kaushik Raina <kra...@confluent.io.invalid> wrote: > Thanks Lianet for review > > LM1 & LM2: > We will extend parent classes only to maintain the hierarchy > > For TopicAuthorizationException and GroupAuthorizationException, we will > extend parent class AuthorizationException. So new hierarchy will be > "AuthorizationException < InvalidConfigurationException < ApiException" > > For UnknownTopicOrPartitionException and NotLeaderOrFollowerException, we > will extend parent class InvalidMetadataException. So new hierarchy will be > "InvalidMetadataException < RefreshRetriableException < RetriableException > < ApiException" > > > LM3: > CommitFailedException will not be extended because its parent class > KafkaException is expected to be treated as application recoverable as > mentioned in KIP. > > > I have added a `comment` section in the KIP-1050 table to include LM1 & LM2 > & LM3 details. > > > LM4: StaleMemberEpochException is thrown as IllegalGenerationException in > Txn > > https://github.com/apache/kafka/blob/trunk/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/OffsetMetadataManager.java#L367-L379 > . So it need not be part of KIP > > On Thu, Nov 7, 2024 at 1:05 AM Lianet M. <liane...@gmail.com> wrote: > > > Hello, sorry for the late reply, took another pass after the latest > > updates. Most of the proposed groupings and handling seem good and very > > helpful indeed from the producer perspective. I just have some more > > comments related to the changes in the exception hierarchy, that are not > > scoped to the producer: > > > > > > LM1 - Regarding TopicAuthorizationException and > > GroupAuthorizationException. We’re proposing to change their parent class > > to InvalidConfigurationException, which extends ApiException (vs their > > current parent AuthorizationException). This would be a breaking change > for > > apps expecting/handling instanceOf AuthorizationException (and an > > unexpected one I would say, given that TopicAuthorizationException and > > GroupAuthorizationException are indeed “authorization errors”). > > > > > > LM2 - Similarly, regarding UnknownTopicOrPartitionException and > > NotLeaderOrFollowerException: we’re proposing to change parent class to > > RefreshRetriableException (which extends Retriable), but that would mean > > that these exceptions wouldn’t be instanceOf InvalidMetadataException > > anymore (as they are today). This would be a breaking change for client > > apps dealing with instanceOf InvalidMetadata right? > > > > > > LM3 - Regarding CommitFailedException, I’m simply concerned that the move > > may not be conceptually right? We’re proposing to change its parent class > > to ApplicationRecoverableException, which would make > CommitFailedException > > land under the ApiException umbrella. ApiException is intended for errors > > that are part of the protocol, and this CommitFailedException is not > (it’s > > just an exception generated on the client side based on different > > protocol-level errors). Not that it will have an impact on existing apps > > given that it would still be a KafkaException as it is today, but wanted > to > > point this out. > > > > > > LM4 - Under ApplicationRecoverableException, I wonder if we should also > > consider StaleMemberEpochException (just for consistency, alongside > > FencedInstaceIdException, UnknownMemberIdException,...) > > > > > > Thanks! > > > > > > Lianet > > > > On Thu, Oct 24, 2024 at 6:25 AM Kaushik Raina > <kra...@confluent.io.invalid > > > > > wrote: > > > > > Thanks Artem for valuable comments > > > > > > I have incorporated them into the updated KIP. > > > > > > > > > On Sat, Oct 19, 2024 at 3:31 AM Artem Livshits > > > <alivsh...@confluent.io.invalid> wrote: > > > > > > > Hi Kaushik, > > > > > > > > Thank you for the KIP! I think it'll make writing transactional > > > > application easier and less error prone. I have a couple comments: > > > > > > > > AL1. The keep proposes changing the semantics > > > > of UnknownProducerIdException. Currently, this error is never > returned > > > by > > > > the broker so we cannot validate whether changing semantics is going > to > > > be > > > > compatible with a future use (if we ever return it again) and the > > future > > > > broker wouldn't know which sematnics the client supports. I think > for > > > now > > > > we could just add a comment in the code that this error is never > > returned > > > > by the broker and if we ever add it to the broker we'd need to make > > sure > > > > it's categorized properly and bump the API version so that the broker > > > knows > > > > if the client supports required semantics. > > > > > > > > AL2. The exception classification looks good to me from the producer > > > > perspective: we have retriable, abortable, etc. categories. The > > > retriable > > > > errors can be retried within producer safely without losing > > idempotence, > > > > the messages would be retried with the same sequence numbers and we > > won't > > > > have duplicates. However, if a retriable error (e.g. > TimeoutException) > > > is > > > > thrown to the application, it's not safe to retry the produce, > because > > > the > > > > messages will get new sequence numbers and if the original message in > > > fact > > > > succeeded, the new messages would become duplicates, losing the > exactly > > > > once semantics. Thus, if a retriable exception is thrown from a > > > > transactional "produce" we should abort the transaction to guarantee > > that > > > > we won't have duplicates when we produce messages again. To avoid > > > > confusion, I propose to change the transactional "produce" to never > > > return > > > > a retriable error, i.e. any retriable error would be translated > > > > into TransactionAbortableException (the error message could be copied > > > from > > > > the original error) before getting thrown from transactional > "produce". > > > > This would avoid bugs in the applicatoin -- even if the application > > > > developer gets confused and decides to retry on TimeoutException, the > > > buggy > > > > code would never run. > > > > > > > > Another change that we should make is to make sure > .abortTransaction() > > > > should never throw producer-abortable exception. > > > > > > > > The handling of retriable errors for other APIs can stay unchanged -- > > > > .commitTransaction() and .abortTransaction() are idempotent and can > be > > > > safely retried, read-only APIs can always be safely retried, and for > > > > non-transactional producer we don't have a good "path back to normal" > > > > anyway -- we cannot reliably abort messages with unknown fate. > > > > > > > > -Artem > > > > > > > > On Fri, Oct 4, 2024 at 5:11 AM Lianet M. <liane...@gmail.com> wrote: > > > > > > > > > Hello, thanks for the KIP! After going through the KIP and > discussion > > > > here > > > > > are some initial comments. > > > > > > > > > > 107 - I understand we’re proposing a new > > > > > ProducerRetriableTransactionException, and changing existing > > exceptions > > > > to > > > > > inherit from it (the ones on the table below it). The existing > > > > exceptions > > > > > inherit from RetriableException today, but with this KIP, they > would > > > > > inherit from ProducerRetriableTransactionException which is not a > > > > > RetriableException ("*ProducerRetriableTransactionException extends > > > > > ApiException"*). Is my understanding correct? Wouldn’t this break > > > > > applications that could be handling/expecting RetriableExceptions > > > today? > > > > > (Ie. apps dealing with TimeoutException on send , if they have > > > > > catch(RetriableException) or checks in the form of instanceOf > > > > > RetriableException, would need to change to the new > > > > > ProducerRetriableTransactionException or the specific > > > TimeoutException, > > > > > right?). I get this wouldn’t bring a problem for most of the > > retriable > > > > > exceptions on the table given that they end up being > handled/retried > > > > > internally, but TimeoutException is tricky. > > > > > > > > > > > > > > > 108 - Regarding how we limit the scope of the change to the > > > > > producer/transactional API. TimeoutException is not only used in > the > > > > > transactional API, but also in the consumer API, propagated to the > > user > > > > in > > > > > multiple api calls. Not clear to me how with this proposal we > > wouldn’t > > > > end > > > > > up with a consumer throwing a TimeoutException instanceOf > > > > > ProducerRetriableTransactionException? (Instead of instanceOf > > > > > RetriableException like it is today)? Again, potentially breaking > > apps > > > > but > > > > > also with a conceptually wrong consumer error? > > > > > > > > > > > > > > > 109 - Similar to above, for exceptions like > > > > > UnknownTopicOrPartitionException, which are today instanceOf > > > > > RetriableException, if we’re saying they will be subclass of > > > > > ProducerRefreshRetriableTransactionException (ApiException) that > will > > > > > affect the consumer logic too, where we do handle > RetriableExceptions > > > > like > > > > > the unknownTopic, expecting RetriableException. This is all > internal > > > > logic > > > > > and could be updated as needed of course, but without leaking > > > > > producer-specific groupings into the consumer I would expect. > > > > > > > > > > > > > > > 110 - The KIP refers to the existing TransactionAbortableException > > > (from > > > > > KIP-890), but on the public changes it refers to class > > > > > AbortableTransactionException extends ApiException. So are we > > > proposing a > > > > > new exception type for this or reusing the existing one? > > > > > > > > > > 111 - I notice the proposed new exceptions, even though they seem > > like > > > > > abstract groupings, are not defined as "abstract". Is it > intentional > > to > > > > > allow creation of instances of those? > > > > > > > > > > Best! > > > > > > > > > > Lianet > > > > > > > > > > > > > > > On Thu, Sep 12, 2024 at 6:26 AM Kaushik Raina > > > > <kra...@confluent.io.invalid > > > > > > > > > > > wrote: > > > > > > > > > > > Thanks for review Matthias > > > > > > > > > > > > 100/101 - Updated in KIP > > > > > > > > > > > > 104 - Added explicitly > > > > > > `For Producer-Retriable errors, the producer handles retries > > > > internally, > > > > > > keeping the failure details hidden from the application. > > Conversely, > > > > > other > > > > > > types of exceptions will be surfaced to the application code for > > > > > handling.` > > > > > > > > > > > > 105 - Grouped default exceptions explicitly > > > > > > `We will handle all default exceptions as generic unknown errors, > > > which > > > > > > will be application recoverable. Below are few such exceptions:` > > > > > > > > > > > > > > > > > > On Sat, Aug 31, 2024 at 4:27 AM Matthias J. Sax < > mj...@apache.org> > > > > > wrote: > > > > > > > > > > > > > Thanks for updating the KIP. It's much clearer now what you > > > propose. > > > > I > > > > > > > have a bunch of question about the proposal: > > > > > > > > > > > > > > > > > > > > > > > > > > > > (100) nit (typo / missing word?): > > > > > > > > > > > > > > > We would new error types > > > > > > > > > > > > > > > > > > > > > > > > > > > > (101) `TransactionAbortableException`, > `ProducerFencedException`, > > > and > > > > > > > `UnknownProducerIdException` are listed twice in the tables. > > > > > > > > > > > > > > > > > > > > > > > > > > > > (102) You introduce a new exception > > `AbortableTransactionException` > > > > > > > which will only be extended by `TransactionAbortableException`. > > > Given > > > > > > > that the existing TransactionAbortableException is not thrown > by > > > the > > > > > > > producer right now (no passed into the `Callback`), it seem if > > the > > > > > > > producer starts to throw/return the exiting > > > > > > > `TransactionAbortableException` or the new > > > > > > > `AbortableTransactionException` is would be an incompatible > > change? > > > > > > > > > > > > > > > > > > > > > > > > > > > > (103) It's unclear which method would throw the new > > > > > > > `AbortableTransactionException` and/or if this new exception > > might > > > be > > > > > > > passe into the producer's send `Callback`. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Btw: KIP-890 does not mention > `TransactionAbortableException`... > > > Does > > > > > > > KIP-890 need an update? KIP-890 only mentions a new error code > > > > > > > TRANSACTION_ABORTABLE -- or is this an implicit introduction of > > > > > > > TransactionAbortableException -- I am not familiar with the > > details > > > > how > > > > > > > core KIPs are written? > > > > > > > > > > > > > > > > > > > > > > > > > > > > (104) The KIP does not explicitly say, which of the new > > exceptions > > > > are > > > > > > > actually user facing? It seems only > > AbortableTransactionException, > > > > > > > ApplicationRecoverableTransactionException, and > > > > > > > InvalidConfiguationTransactionException are exception which > user > > > will > > > > > be > > > > > > > able to catch (or handle vie the `Callback`), while > > > > > > > ProducerRetriableTransactionException and > > > > > > > ProducerRefreshRetriableTransactionException won't be > > thrown/return > > > > by > > > > > > > the producer into the app code? > > > > > > > > > > > > > > > > > > > > > > > > > > > > (105) `IllegalStateException` and `RuntimeException` which are > > Java > > > > > > > exceptions are listed in the table of > > > > > > > `ApplicationRecoverableTransactionException`: I think it is not > > > valid > > > > > to > > > > > > > list them, as we cannot change their super-class. > > > > > > > > > > > > > > > > > > > > > > > > > > > > (106) `UnknownMemberIdException`, `IllegalGenerationException`, > > and > > > > > > > `CorrelationIdMismatchException` are listed in the table of > > > > > > > `ApplicationRecoverableTransactionException` but it seems they > > are > > > > not > > > > > > > thrown/returned by the producer atm. If we start to > throw/return > > > > either > > > > > > > of them it seem to be a backward incompatible change. > > > > > > > > > > > > > > > > > > > > > > > > > > > > (106) Similarly to 105, `InvalidRecordException` and > > > > > > > `InvalidRequiredAcksException` are listed in the table of > > > > > > > `InvalidConfiguationTransactionException` but they seem not to > be > > > > user > > > > > > > facing. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -Matthias > > > > > > > > > > > > > > > > > > > > > On 7/25/24 8:50 AM, Kaushik Raina wrote: > > > > > > > > Additionally, > > > > > > > > - We will be depreciating KIP-691 in favour of KIP-1050. > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 21, 2024 at 12:20 PM Kaushik Raina < > > > > kra...@confluent.io> > > > > > > > wrote: > > > > > > > > > > > > > > > >> Thanks Matthias for feedback > > > > > > > >> - We have updates KIP and grouped exceptions > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable > > > > > > > >> > > > > > > > >> - Regarding compatibility, all changes in KIP are expected > to > > > be > > > > > > > *backword > > > > > > > >> compatible*. We have updated KIP to make it clear. > > > > > > > >> > > > > > > > >> > > > > > > > >> On Tue, Jun 11, 2024 at 4:50 AM Matthias J. Sax < > > > mj...@apache.org > > > > > > > > > > > > wrote: > > > > > > > >> > > > > > > > >>> Thanks for this KIP. Great to see it. I would assume it > will > > > make > > > > > > > >>> KIP-691 unnecessary? > > > > > > > >>> > > > > > > > >>> I don't think I fully understand the proposal yet. It's > > clear, > > > > that > > > > > > you > > > > > > > >>> propose to add new sub-classed to group existing > exceptions. > > > But > > > > > it's > > > > > > > >>> not clear to me, which of the existing exceptions (which > > > > implement > > > > > > > >>> ApiException directly right now) will get a new parent > class > > > and > > > > go > > > > > > > into > > > > > > > >>> the same group. You only list > `InvalidProducerEpochException` > > > > which > > > > > > > gets > > > > > > > >>> `AbortableTransactionException` as new parent. It would > help > > a > > > > lot, > > > > > > if > > > > > > > >>> you could list out explicitly, which existing exceptions > are > > > > > grouped > > > > > > > >>> together via which sub-class. > > > > > > > >>> > > > > > > > >>> It should be sufficient to just add a list for each group. > > For > > > > the > > > > > > > newly > > > > > > > >>> added exception classes, I would also omit all constructors > > etc > > > > and > > > > > > > just > > > > > > > >>> add a comment about it -- having constructors listed out > does > > > not > > > > > add > > > > > > > >>> much value to the KIP itself but makes it harder to read > > (it's > > > > > > > >>> effectively noise we can avoid IMHO). > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> I am also wondering about compatibility? If I read the > > section > > > > > > > >>> correctly, you actually propose to introduce a > > > > > > non-backward-compatible > > > > > > > >>> change? > > > > > > > >>> > > > > > > > >>>> Based on type of exception thrown, user needs to change > > their > > > > > > > exception > > > > > > > >>> catching logic to take actions against their exception > > > handling. > > > > > > > >>> > > > > > > > >>> Ie, an application cannot be upgrade w/o code changes? I am > > not > > > > > sure > > > > > > if > > > > > > > >>> this is acceptable? > > > > > > > >>> > > > > > > > >>> I think it would be much better (not sure if feasible) to > > keep > > > > the > > > > > > old > > > > > > > >>> behavior and let users opt-in / enable the new semantics > via > > a > > > > > > config. > > > > > > > >>> If the new behavior is disabled, we could log a WARN that > the > > > app > > > > > > > should > > > > > > > >>> upgrade to work with the new semantics, and we would only > > > enforce > > > > > the > > > > > > > >>> new behavior in a later major release. > > > > > > > >>> > > > > > > > >>> Thoughts? > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> -Matthias > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> On 6/7/24 4:06 AM, Kaushik Raina wrote: > > > > > > > >>>> Thank you Andrew for feedback > > > > > > > >>>> > > > > > > > >>>> 1. We are suggesting to only update subclasses of > > > > > > > >>>> o.a.k.common.errors.ApiException, which are used in > > > > transactions. > > > > > > All > > > > > > > >>> such > > > > > > > >>>> subclasses are mentioned in Exception table > > > > > > > >>>> < > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable > > > > > > > >>>> > > > > > > > >>>> > > > > > > > >>>> 2. "Producer-Recoverable" corresponds to the > > > > AbortableException. I > > > > > > > have > > > > > > > >>>> updated comments on each exception type. > > > > > > > >>>> > > > > > > > >>>> 3. Yes, it's correct that by adding a "Retriable" > exception, > > > it > > > > > > > >>> simplifies > > > > > > > >>>> the determination of which errors can be retried > internally. > > > In > > > > > the > > > > > > > >>> Exception > > > > > > > >>>> table > > > > > > > >>>> < > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable > > > > > > > >>>> > > > > > > > >>>> mentioned > > > > > > > >>>> in the "Proposed Changes" section, the "Expected Handling" > > > > column > > > > > > > >>> signifies > > > > > > > >>>> the handling for each error type. Please let me know if > any > > > > > further > > > > > > > >>>> clarification is needed. > > > > > > > >>>> > > > > > > > >>>> 4a. Yes, that is correct. For clarity, only one > constructor > > > has > > > > > been > > > > > > > >>>> mentioned in the KIP. An ellipsis has been added as a > > > > placeholder, > > > > > > > >>>> indicating that there are additional functions in the > class > > > but > > > > > they > > > > > > > are > > > > > > > >>>> not explicitly specified. > > > > > > > >>>> 4b. Updated in the KIP. > > > > > > > >>>> > > > > > > > >>>> 5. TopicAuthorizationException extends "Invalid > > > Configuration". > > > > > > > "Invalid > > > > > > > >>>> Configuration" type can be resolved either by dynamically > > > > updating > > > > > > the > > > > > > > >>>> configuration, which does not require a restart, or by > > > > statically > > > > > > > >>> updating > > > > > > > >>>> it by restarting the application. It is at the > application's > > > > > > > discretion > > > > > > > >>> how > > > > > > > >>>> they want to handle each "Invalid Configuration" type. > > > > > > > >>>> > > > > > > > >>>> I have added Client side handling example > > > > > > > >>>> < > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-Clientsidecodeexample > > > > > > > >>>> > > > > > > > >>>> in > > > > > > > >>>> KIP. Hope that helps. > > > > > > > >>>> > > > > > > > >>> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >