Thanks, Matthias. I changed it to `custom.exception.handler`.

Alieh
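For illustration, wiring a handler in through the new config could look roughly like the sketch below. The config key `custom.exception.handler` is the name stated above; the `com.example.MyProducerExceptionHandler` class name and the exact config semantics are assumptions, since the KIP has not finalized the API. Only plain `java.util.Properties` is used here so the sketch stays self-contained.

```java
import java.util.Properties;

public class ConfigSketch {
    public static void main(String[] args) {
        // Standard producer configs (values are placeholders).
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        // Proposed new config from the KIP discussion: absent (null) by
        // default, meaning no custom handler and unchanged producer behavior.
        props.put("custom.exception.handler",
                  "com.example.MyProducerExceptionHandler");

        System.out.println(props.getProperty("custom.exception.handler"));
    }
}
```

When the config is left unset, the producer would keep today's behavior, which matches the KIP's stated default of "no custom handler, config parameter set to null."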
On Tue, Apr 23, 2024 at 8:47 AM Matthias J. Sax <mj...@apache.org> wrote:

> Thanks Alieh!
>
> A few nits:
>
> 1) The new config we add for the producer should be mentioned in the
> "Public Interfaces" section.
>
> 2) Why do we use `producer.` prefix for a *producer* config? Should it
> be `exception.handler` only?
>
>
> -Matthias
>
> On 4/22/24 7:38 AM, Alieh Saeedi wrote:
> > Thank you all for the feedback!
> >
> > Addressing the main concern: The KIP is about giving the user the
> > ability to handle producer exceptions, but to be more conservative
> > and avoid future issues, we decided to limit it to a short list of
> > exceptions. I included *RecordTooLargeException* and
> > *UnknownTopicOrPartitionException*. Open to suggestions for adding
> > some more ;-)
> >
> > KIP updates:
> > - clarified the way the user should configure the producer to use
> > the custom handler. I think adding a producer config property is
> > the cleanest option.
> > - changed *ClientExceptionHandler* to *ProducerExceptionHandler* to
> > be closer to what we are changing.
> > - added the ProducerRecord as an input parameter of the handle()
> > method as well.
> > - increased the response types to 3 to have fail and two types of
> > continue.
> > - The default behaviour is having no custom handler, with the
> > corresponding config parameter set to null. Therefore, the KIP
> > provides no default implementation of the interface.
> > - We follow the interface solution as described in the Rejected
> > Alternatives section.
> >
> >
> > Cheers,
> > Alieh
> >
> >
> > On Thu, Apr 18, 2024 at 8:11 PM Matthias J. Sax <mj...@apache.org> wrote:
> >
> >> Thanks for the KIP Alieh! It addresses an important case for error
> >> handling.
> >>
> >> I agree that using this handler would be an expert API, as
> >> mentioned by a few people. But I don't think it would be a reason
> >> to not add it.
> >> It's always a tricky tradeoff what to expose to users while
> >> avoiding foot guns, but we added similar handlers to Kafka
> >> Streams, and have good experience with them. Hence, I understand,
> >> but don't share, the concern raised.
> >>
> >> I also agree that there is some responsibility on the user to
> >> understand how such a handler should be implemented so as to not
> >> drop data by accident. But that seems unavoidable and acceptable.
> >>
> >> While I understand that a "simpler / reduced" API (eg via configs)
> >> might also work, I personally prefer a full handler. Configs have
> >> the same issue that they could be misused, potentially leading to
> >> incorrectly dropped data, but at the same time are less flexible
> >> (and thus maybe even harder to use correctly...?). Based on my
> >> experience, there are also often weird corner cases for which it
> >> makes sense to also drop records for other exceptions, and a full
> >> handler has the advantage of full flexibility and "absolute
> >> power!".
> >>
> >> To be fair: I don't know the exact code paths of the producer in
> >> detail, so please keep me honest. But my understanding is that the
> >> KIP aims to allow users to react to internal exceptions, and
> >> decide to keep retrying internally, swallow the error and drop the
> >> record, or raise the error?
> >>
> >> Maybe the KIP would need to be a little bit more precise about
> >> what errors we want to cover -- I don't think this list must be
> >> exhaustive, as we can always do follow-up KIPs to also apply the
> >> handler to other errors and expand the scope of the handler. The
> >> KIP does mention examples, but it might be good to explicitly
> >> state for what cases the handler gets applied?
> >>
> >> I am also not sure if CONTINUE and FAIL are enough options? Don't
> >> we need three options? Or would `CONTINUE` have a different
> >> meaning depending on the type of error?
> >> I.e., for a retryable error `CONTINUE` would mean keep retrying
> >> internally, but for a non-retryable error `CONTINUE` means swallow
> >> the error and drop the record? This semantic overload seems tricky
> >> for users to reason about, so it might be better to split
> >> `CONTINUE` into two cases -> `RETRY` and `SWALLOW` (or some better
> >> names).
> >>
> >> Additionally, should we just ship a
> >> `DefaultClientExceptionHandler` which would return `FAIL`, for
> >> backward compatibility? Or not have any default handler to begin
> >> with and allow it to be `null`? I don't see the need for a
> >> specific `TransactionExceptionHandler`. To me, the goal should be
> >> to not modify the default behavior at all, but to just allow users
> >> to change the default behavior if there is a need.
> >>
> >> What is missing in the KIP though is how the handler is passed
> >> into the producer. Would we need a new config which allows setting
> >> a custom handler? And/or would we allow passing in an instance via
> >> the constructor, or add a new method to set a handler?
> >>
> >>
> >> -Matthias
> >>
> >> On 4/18/24 10:02 AM, Andrew Schofield wrote:
> >>> Hi Alieh,
> >>> Thanks for the KIP.
> >>>
> >>> Exception handling in the Kafka producer and consumer is really
> >>> not ideal. It's even harder working out what's going on with the
> >>> consumer.
> >>>
> >>> I'm a bit nervous about this KIP and I agree with Chris that it
> >>> could do with additional motivation. This would be an
> >>> expert-level interface given how complicated the exception
> >>> handling for Kafka has become.
> >>>
> >>> 7. The application is not really aware of the batching being done
> >>> on its behalf. The ProduceResponse can actually return an array
> >>> of records which failed per batch. If you get
> >>> RecordTooLargeException, and want to retry, you probably need to
> >>> remove the offending records from the batch and retry it. This is
> >>> getting fiddly.
> >>>
> >>> 8.
> >>> There is already o.a.k.clients.producer.Callback. I wonder
> >>> whether an alternative might be to add a method to the existing
> >>> Callback interface, such as:
> >>>
> >>> ClientExceptionResponse onException(Exception exception)
> >>>
> >>> It would be called when a ProduceResponse contains an error, but
> >>> the producer is going to retry. It tells the producer whether to
> >>> go ahead with the retry or not. The default implementation would
> >>> be to CONTINUE, because that's just continuing to retry as
> >>> planned. Note that this is a per-record callback, so the
> >>> application would be able to understand which records failed.
> >>>
> >>> By using an existing interface, we already know how to configure
> >>> it and we know about the threading model for calling it.
> >>>
> >>>
> >>> Thanks,
> >>> Andrew
> >>>
> >>>
> >>>
> >>>> On 17 Apr 2024, at 18:17, Chris Egerton <chr...@aiven.io.INVALID> wrote:
> >>>>
> >>>> Hi Alieh,
> >>>>
> >>>> Thanks for the KIP! The issue with writing to non-existent
> >>>> topics is particularly frustrating for users of Kafka Connect
> >>>> and has been the source of a handful of Jira tickets over the
> >>>> years. My thoughts:
> >>>>
> >>>> 1. An additional detail we can add to the motivation (or
> >>>> possibly rejected alternatives) section is that this kind of
> >>>> custom retry logic can't be implemented by hand by, e.g.,
> >>>> setting retries to 0 in the producer config and handling
> >>>> exceptions at the application level. Or rather, it can, but 1)
> >>>> it's a bit painful to have to reimplement at every call-site for
> >>>> Producer::send (and any code that awaits the returned Future)
> >>>> and 2) it's impossible to do this without losing idempotency on
> >>>> retries.
> >>>>
> >>>> 2. That said, I wonder if a pluggable interface is really the
> >>>> right call here.
> >>>> Documenting the interactions of a producer with a
> >>>> ClientExceptionHandler instance will be tricky, and implementing
> >>>> them will also be a fair amount of work. I believe that there
> >>>> needs to be some more granularity for how writes to non-existent
> >>>> topics (or really, UNKNOWN_TOPIC_OR_PARTITION and related errors
> >>>> from the broker) are handled, but I'm torn between keeping it
> >>>> simple with maybe one or two new producer config properties, or
> >>>> a full-blown pluggable interface. If there are more cases that
> >>>> would benefit from a pluggable interface, it would be nice to
> >>>> identify these and add them to the KIP to strengthen the
> >>>> motivation. Right now, I'm not sure the two that are mentioned
> >>>> in the motivation are sufficient.
> >>>>
> >>>> 3. Alternatively, a possible compromise is for this KIP to
> >>>> introduce new properties that dictate how to handle
> >>>> unknown-topic-partition and record-too-large errors, with the
> >>>> thinking that if we introduce a pluggable interface later on,
> >>>> these properties will be recognized by the default
> >>>> implementation of that interface but could be completely ignored
> >>>> or replaced by alternative implementations.
> >>>>
> >>>> 4. (Nit) You can remove the "This page is meant as a template
> >>>> for writing a KIP..." part from the KIP. It's not a template
> >>>> anymore :)
> >>>>
> >>>> 5. If we do go the pluggable interface route, wouldn't we want
> >>>> to add the possibility for retry logic? The simplest version of
> >>>> this could be to add a RETRY value to the
> >>>> ClientExceptionHandlerResponse enum.
> >>>>
> >>>> 6. I think "SKIP" or "DROP" might be clearer instead of
> >>>> "CONTINUE" for the ClientExceptionHandlerResponse enum, since
> >>>> they cause records to be dropped.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Chris
> >>>>
> >>>> On Wed, Apr 17, 2024 at 12:25 PM Justine Olshan
> >>>> <jols...@confluent.io.invalid> wrote:
> >>>>
> >>>>> Hey Alieh,
> >>>>>
> >>>>> I echo what Omnia says; I'm not sure I understand the
> >>>>> implications of the change and I think more detail is needed.
> >>>>>
> >>>>> This comment also confused me a bit:
> >>>>> * {@code ClientExceptionHandler} that continues the transaction
> >>>>> even if a record is too large.
> >>>>> * Otherwise, it makes the transaction to fail.
> >>>>>
> >>>>> Relatedly, I've been working with some folks on a KIP for
> >>>>> transaction errors and how they are handled. Specifically for
> >>>>> the RecordTooLargeException (and a few other errors), we want
> >>>>> to give a new error category for this error that allows the
> >>>>> application to choose how it is handled. Maybe this KIP is
> >>>>> something that you are looking for? Stay tuned :)
> >>>>>
> >>>>> Justine
> >>>>>
> >>>>>
> >>>>> On Wed, Apr 17, 2024 at 8:03 AM Omnia Ibrahim
> >>>>> <o.g.h.ibra...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Alieh,
> >>>>>> Thanks for the KIP! I have a couple of comments.
> >>>>>> - You mentioned in the KIP motivation:
> >>>>>>> Another example for which a production exception handler
> >>>>>>> could be useful is if a user tries to write into a
> >>>>>>> non-existing topic, which returns a retryable error code;
> >>>>>>> with infinite retries, the producer would hang retrying
> >>>>>>> forever. A handler could help to break the infinite retry
> >>>>>>> loop.
> >>>>>>
> >>>>>> How can the handler differentiate between something that is
> >>>>>> temporary and should keep being retried, and something
> >>>>>> permanent like forgetting to create the topic?
> >>>>>> Temporary here could be:
> >>>>>> - the producer getting deployed before the topic creation
> >>>>>> finishes (especially if the topic creation is handled via IaC)
> >>>>>> - temporarily offline partitions
> >>>>>> - leadership changing
> >>>>>> Isn't this putting the producer at risk of dropping records
> >>>>>> unintentionally?
> >>>>>> - Can you elaborate more on what is written in the
> >>>>>> compatibility / migration plan section, please, by explaining
> >>>>>> in a bit more detail what the changing behaviour is and how
> >>>>>> this will impact clients who are upgrading?
> >>>>>> - In the proposed changes, can you elaborate in the KIP on
> >>>>>> where in the producer lifecycle ClientExceptionHandler and
> >>>>>> TransactionExceptionHandler get triggered, and how the
> >>>>>> producer will be configured to point to a custom
> >>>>>> implementation?
> >>>>>>
> >>>>>> Thanks
> >>>>>> Omnia
> >>>>>>
> >>>>>>> On 17 Apr 2024, at 13:13, Alieh Saeedi <asae...@confluent.io.INVALID> wrote:
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>> Here is the KIP-1038: Add Custom Error Handler to Producer.
> >>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1038%3A+Add+Custom+Error+Handler+to+Producer>
> >>>>>>> I look forward to your feedback!
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Alieh
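The FAIL / RETRY / SWALLOW split discussed in this thread (FAIL plus two flavors of "continue") can be sketched in plain Java. Every name below is illustrative only: the interface shape, the Response enum, and the stand-in exception classes are assumptions about an API that is still under discussion, and a plain Object replaces the real ProducerRecord so the sketch stays self-contained.

```java
public class HandlerSketch {

    // Stand-in for the proposed ProducerExceptionHandler interface.
    interface ProducerExceptionHandler {
        enum Response { FAIL, RETRY, SWALLOW }
        // The KIP adds the ProducerRecord as a parameter; Object stands in
        // for it here to avoid depending on kafka-clients.
        Response handle(Object record, Exception exception);
    }

    // Local stand-ins for the real o.a.k.common.errors exception classes.
    static class RecordTooLargeException extends RuntimeException {}
    static class UnknownTopicOrPartitionException extends RuntimeException {}

    // Example handler covering the two exceptions the KIP currently lists:
    // drop oversized records, retry unknown-topic errors (e.g. the topic is
    // still being created), and fail on anything else, which preserves the
    // default producer behavior for all other errors.
    static class ExampleHandler implements ProducerExceptionHandler {
        @Override
        public Response handle(Object record, Exception e) {
            if (e instanceof RecordTooLargeException) {
                return Response.SWALLOW;
            }
            if (e instanceof UnknownTopicOrPartitionException) {
                return Response.RETRY;
            }
            return Response.FAIL;
        }
    }

    public static void main(String[] args) {
        ProducerExceptionHandler h = new ExampleHandler();
        System.out.println(h.handle(null, new RecordTooLargeException()));          // SWALLOW
        System.out.println(h.handle(null, new UnknownTopicOrPartitionException())); // RETRY
        System.out.println(h.handle(null, new IllegalStateException()));            // FAIL
    }
}
```

Splitting CONTINUE into RETRY and SWALLOW, as suggested above, avoids the semantic overload of one return value meaning "keep retrying" for retryable errors but "drop the record" for non-retryable ones.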