Thanks Artem for valuable comments I have incorporated them into the updated KIP.
On Sat, Oct 19, 2024 at 3:31 AM Artem Livshits <alivsh...@confluent.io.invalid> wrote: > Hi Kaushik, > > Thank you for the KIP! I think it'll make writing transactional > application easier and less error prone. I have a couple comments: > > AL1. The keep proposes changing the semantics > of UnknownProducerIdException. Currently, this error is never returned by > the broker so we cannot validate whether changing semantics is going to be > compatible with a future use (if we ever return it again) and the future > broker wouldn't know which sematnics the client supports. I think for now > we could just add a comment in the code that this error is never returned > by the broker and if we ever add it to the broker we'd need to make sure > it's categorized properly and bump the API version so that the broker knows > if the client supports required semantics. > > AL2. The exception classification looks good to me from the producer > perspective: we have retriable, abortable, etc. categories. The retriable > errors can be retried within producer safely without losing idempotence, > the messages would be retried with the same sequence numbers and we won't > have duplicates. However, if a retriable error (e.g. TimeoutException) is > thrown to the application, it's not safe to retry the produce, because the > messages will get new sequence numbers and if the original message in fact > succeeded, the new messages would become duplicates, losing the exactly > once semantics. Thus, if a retriable exception is thrown from a > transactional "produce" we should abort the transaction to guarantee that > we won't have duplicates when we produce messages again. To avoid > confusion, I propose to change the transactional "produce" to never return > a retriable error, i.e. any retriable error would be translated > into TransactionAbortableException (the error message could be copied from > the original error) before getting thrown from transactional "produce". > This would avoid bugs in the applicatoin -- even if the application > developer gets confused and decides to retry on TimeoutException, the buggy > code would never run. > > Another change that we should make is to make sure .abortTransaction() > should never throw producer-abortable exception. > > The handling of retriable errors for other APIs can stay unchanged -- > .commitTransaction() and .abortTransaction() are idempotent and can be > safely retried, read-only APIs can always be safely retried, and for > non-transactional producer we don't have a good "path back to normal" > anyway -- we cannot reliably abort messages with unknown fate. > > -Artem > > On Fri, Oct 4, 2024 at 5:11 AM Lianet M. <liane...@gmail.com> wrote: > > > Hello, thanks for the KIP! After going through the KIP and discussion > here > > are some initial comments. > > > > 107 - I understand we’re proposing a new > > ProducerRetriableTransactionException, and changing existing exceptions > to > > inherit from it (the ones on the table below it). The existing > exceptions > > inherit from RetriableException today, but with this KIP, they would > > inherit from ProducerRetriableTransactionException which is not a > > RetriableException ("*ProducerRetriableTransactionException extends > > ApiException"*). Is my understanding correct? Wouldn’t this break > > applications that could be handling/expecting RetriableExceptions today? > > (Ie. apps dealing with TimeoutException on send , if they have > > catch(RetriableException) or checks in the form of instanceOf > > RetriableException, would need to change to the new > > ProducerRetriableTransactionException or the specific TimeoutException, > > right?). I get this wouldn’t bring a problem for most of the retriable > > exceptions on the table given that they end up being handled/retried > > internally, but TimeoutException is tricky. > > > > > > 108 - Regarding how we limit the scope of the change to the > > producer/transactional API. TimeoutException is not only used in the > > transactional API, but also in the consumer API, propagated to the user > in > > multiple api calls. Not clear to me how with this proposal we wouldn’t > end > > up with a consumer throwing a TimeoutException instanceOf > > ProducerRetriableTransactionException? (Instead of instanceOf > > RetriableException like it is today)? Again, potentially breaking apps > but > > also with a conceptually wrong consumer error? > > > > > > 109 - Similar to above, for exceptions like > > UnknownTopicOrPartitionException, which are today instanceOf > > RetriableException, if we’re saying they will be subclass of > > ProducerRefreshRetriableTransactionException (ApiException) that will > > affect the consumer logic too, where we do handle RetriableExceptions > like > > the unknownTopic, expecting RetriableException. This is all internal > logic > > and could be updated as needed of course, but without leaking > > producer-specific groupings into the consumer I would expect. > > > > > > 110 - The KIP refers to the existing TransactionAbortableException (from > > KIP-890), but on the public changes it refers to class > > AbortableTransactionException extends ApiException. So are we proposing a > > new exception type for this or reusing the existing one? > > > > 111 - I notice the proposed new exceptions, even though they seem like > > abstract groupings, are not defined as "abstract". Is it intentional to > > allow creation of instances of those? > > > > Best! > > > > Lianet > > > > > > On Thu, Sep 12, 2024 at 6:26 AM Kaushik Raina > <kra...@confluent.io.invalid > > > > > wrote: > > > > > Thanks for review Matthias > > > > > > 100/101 - Updated in KIP > > > > > > 104 - Added explicitly > > > `For Producer-Retriable errors, the producer handles retries > internally, > > > keeping the failure details hidden from the application. Conversely, > > other > > > types of exceptions will be surfaced to the application code for > > handling.` > > > > > > 105 - Grouped default exceptions explicitly > > > `We will handle all default exceptions as generic unknown errors, which > > > will be application recoverable. Below are few such exceptions:` > > > > > > > > > On Sat, Aug 31, 2024 at 4:27 AM Matthias J. Sax <mj...@apache.org> > > wrote: > > > > > > > Thanks for updating the KIP. It's much clearer now what you propose. > I > > > > have a bunch of question about the proposal: > > > > > > > > > > > > > > > > (100) nit (typo / missing word?): > > > > > > > > > We would new error types > > > > > > > > > > > > > > > > (101) `TransactionAbortableException`, `ProducerFencedException`, and > > > > `UnknownProducerIdException` are listed twice in the tables. > > > > > > > > > > > > > > > > (102) You introduce a new exception `AbortableTransactionException` > > > > which will only be extended by `TransactionAbortableException`. Given > > > > that the existing TransactionAbortableException is not thrown by the > > > > producer right now (no passed into the `Callback`), it seem if the > > > > producer starts to throw/return the exiting > > > > `TransactionAbortableException` or the new > > > > `AbortableTransactionException` is would be an incompatible change? > > > > > > > > > > > > > > > > (103) It's unclear which method would throw the new > > > > `AbortableTransactionException` and/or if this new exception might be > > > > passe into the producer's send `Callback`. > > > > > > > > > > > > > > > > Btw: KIP-890 does not mention `TransactionAbortableException`... Does > > > > KIP-890 need an update? KIP-890 only mentions a new error code > > > > TRANSACTION_ABORTABLE -- or is this an implicit introduction of > > > > TransactionAbortableException -- I am not familiar with the details > how > > > > core KIPs are written? > > > > > > > > > > > > > > > > (104) The KIP does not explicitly say, which of the new exceptions > are > > > > actually user facing? It seems only AbortableTransactionException, > > > > ApplicationRecoverableTransactionException, and > > > > InvalidConfiguationTransactionException are exception which user will > > be > > > > able to catch (or handle vie the `Callback`), while > > > > ProducerRetriableTransactionException and > > > > ProducerRefreshRetriableTransactionException won't be thrown/return > by > > > > the producer into the app code? > > > > > > > > > > > > > > > > (105) `IllegalStateException` and `RuntimeException` which are Java > > > > exceptions are listed in the table of > > > > `ApplicationRecoverableTransactionException`: I think it is not valid > > to > > > > list them, as we cannot change their super-class. > > > > > > > > > > > > > > > > (106) `UnknownMemberIdException`, `IllegalGenerationException`, and > > > > `CorrelationIdMismatchException` are listed in the table of > > > > `ApplicationRecoverableTransactionException` but it seems they are > not > > > > thrown/returned by the producer atm. If we start to throw/return > either > > > > of them it seem to be a backward incompatible change. > > > > > > > > > > > > > > > > (106) Similarly to 105, `InvalidRecordException` and > > > > `InvalidRequiredAcksException` are listed in the table of > > > > `InvalidConfiguationTransactionException` but they seem not to be > user > > > > facing. > > > > > > > > > > > > > > > > > > > > -Matthias > > > > > > > > > > > > On 7/25/24 8:50 AM, Kaushik Raina wrote: > > > > > Additionally, > > > > > - We will be depreciating KIP-691 in favour of KIP-1050. > > > > > > > > > > > > > > > On Fri, Jun 21, 2024 at 12:20 PM Kaushik Raina < > kra...@confluent.io> > > > > wrote: > > > > > > > > > >> Thanks Matthias for feedback > > > > >> - We have updates KIP and grouped exceptions > > > > >> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable > > > > >> > > > > >> - Regarding compatibility, all changes in KIP are expected to be > > > > *backword > > > > >> compatible*. We have updated KIP to make it clear. > > > > >> > > > > >> > > > > >> On Tue, Jun 11, 2024 at 4:50 AM Matthias J. Sax <mj...@apache.org > > > > > > wrote: > > > > >> > > > > >>> Thanks for this KIP. Great to see it. I would assume it will make > > > > >>> KIP-691 unnecessary? > > > > >>> > > > > >>> I don't think I fully understand the proposal yet. It's clear, > that > > > you > > > > >>> propose to add new sub-classed to group existing exceptions. But > > it's > > > > >>> not clear to me, which of the existing exceptions (which > implement > > > > >>> ApiException directly right now) will get a new parent class and > go > > > > into > > > > >>> the same group. You only list `InvalidProducerEpochException` > which > > > > gets > > > > >>> `AbortableTransactionException` as new parent. It would help a > lot, > > > if > > > > >>> you could list out explicitly, which existing exceptions are > > grouped > > > > >>> together via which sub-class. > > > > >>> > > > > >>> It should be sufficient to just add a list for each group. For > the > > > > newly > > > > >>> added exception classes, I would also omit all constructors etc > and > > > > just > > > > >>> add a comment about it -- having constructors listed out does not > > add > > > > >>> much value to the KIP itself but makes it harder to read (it's > > > > >>> effectively noise we can avoid IMHO). > > > > >>> > > > > >>> > > > > >>> > > > > >>> I am also wondering about compatibility? If I read the section > > > > >>> correctly, you actually propose to introduce a > > > non-backward-compatible > > > > >>> change? > > > > >>> > > > > >>>> Based on type of exception thrown, user needs to change their > > > > exception > > > > >>> catching logic to take actions against their exception handling. > > > > >>> > > > > >>> Ie, an application cannot be upgrade w/o code changes? I am not > > sure > > > if > > > > >>> this is acceptable? > > > > >>> > > > > >>> I think it would be much better (not sure if feasible) to keep > the > > > old > > > > >>> behavior and let users opt-in / enable the new semantics via a > > > config. > > > > >>> If the new behavior is disabled, we could log a WARN that the app > > > > should > > > > >>> upgrade to work with the new semantics, and we would only enforce > > the > > > > >>> new behavior in a later major release. > > > > >>> > > > > >>> Thoughts? > > > > >>> > > > > >>> > > > > >>> > > > > >>> -Matthias > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> On 6/7/24 4:06 AM, Kaushik Raina wrote: > > > > >>>> Thank you Andrew for feedback > > > > >>>> > > > > >>>> 1. We are suggesting to only update subclasses of > > > > >>>> o.a.k.common.errors.ApiException, which are used in > transactions. > > > All > > > > >>> such > > > > >>>> subclasses are mentioned in Exception table > > > > >>>> < > > > > >>> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable > > > > >>>> > > > > >>>> > > > > >>>> 2. "Producer-Recoverable" corresponds to the > AbortableException. I > > > > have > > > > >>>> updated comments on each exception type. > > > > >>>> > > > > >>>> 3. Yes, it's correct that by adding a "Retriable" exception, it > > > > >>> simplifies > > > > >>>> the determination of which errors can be retried internally. In > > the > > > > >>> Exception > > > > >>>> table > > > > >>>> < > > > > >>> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-ExceptionTable > > > > >>>> > > > > >>>> mentioned > > > > >>>> in the "Proposed Changes" section, the "Expected Handling" > column > > > > >>> signifies > > > > >>>> the handling for each error type. Please let me know if any > > further > > > > >>>> clarification is needed. > > > > >>>> > > > > >>>> 4a. Yes, that is correct. For clarity, only one constructor has > > been > > > > >>>> mentioned in the KIP. An ellipsis has been added as a > placeholder, > > > > >>>> indicating that there are additional functions in the class but > > they > > > > are > > > > >>>> not explicitly specified. > > > > >>>> 4b. Updated in the KIP. > > > > >>>> > > > > >>>> 5. TopicAuthorizationException extends "Invalid Configuration". > > > > "Invalid > > > > >>>> Configuration" type can be resolved either by dynamically > updating > > > the > > > > >>>> configuration, which does not require a restart, or by > statically > > > > >>> updating > > > > >>>> it by restarting the application. It is at the application's > > > > discretion > > > > >>> how > > > > >>>> they want to handle each "Invalid Configuration" type. > > > > >>>> > > > > >>>> I have added Client side handling example > > > > >>>> < > > > > >>> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions#KIP1050:ConsistenterrorhandlingforTransactions-Clientsidecodeexample > > > > >>>> > > > > >>>> in > > > > >>>> KIP. Hope that helps. > > > > >>>> > > > > >>> > > > > >> > > > > > > > > > > > > > > >