Thanks, Matthias. I changed it to `custom.exception.handler`.

Alieh
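For illustration, wiring a handler in through the new config could look roughly like the sketch below. The config key `custom.exception.handler` is the name stated above; the `com.example.MyProducerExceptionHandler` class name and the exact config semantics are assumptions, since the KIP has not finalized the API. Only plain `java.util.Properties` is used here so the sketch stays self-contained.

```java
import java.util.Properties;

public class ConfigSketch {
    public static void main(String[] args) {
        // Standard producer configs (values are placeholders).
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        // Proposed new config from the KIP discussion: absent (null) by
        // default, meaning no custom handler and unchanged producer behavior.
        props.put("custom.exception.handler",
                  "com.example.MyProducerExceptionHandler");

        System.out.println(props.getProperty("custom.exception.handler"));
    }
}
```

When the config is left unset, the producer would keep today's behavior, which matches the KIP's stated default of "no custom handler, config parameter set to null."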
On Tue, Apr 23, 2024 at 8:47 AM Matthias J. Sax <mj...@apache.org> wrote:

> Thanks Alieh!
>
> A few nits:
>
> 1) The new config we add for the producer should be mentioned in the
> "Public Interfaces" section.
>
> 2) Why do we use `producer.` prefix for a *producer* config? Should it
> be `exception.handler` only?
>
>
> -Matthias
>
> On 4/22/24 7:38 AM, Alieh Saeedi wrote:
> > Thank you all for the feedback!
> >
> > Addressing the main concern: The KIP is about giving the user the
> > ability to handle producer exceptions, but to be more conservative
> > and avoid future issues, we decided to limit it to a short list of
> > exceptions. I included *RecordTooLargeException* and
> > *UnknownTopicOrPartitionException*. Open to suggestions for adding
> > some more ;-)
> >
> > KIP updates:
> > - clarified the way the user should configure the producer to use
> > the custom handler. I think adding a producer config property is
> > the cleanest option.
> > - changed *ClientExceptionHandler* to *ProducerExceptionHandler* to
> > be closer to what we are changing.
> > - added the ProducerRecord as an input parameter of the handle()
> > method as well.
> > - increased the response types to 3 to have fail and two types of
> > continue.
> > - The default behaviour is having no custom handler, with the
> > corresponding config parameter set to null. Therefore, the KIP
> > provides no default implementation of the interface.
> > - We follow the interface solution as described in the Rejected
> > Alternatives section.
> >
> >
> > Cheers,
> > Alieh
> >
> >
> > On Thu, Apr 18, 2024 at 8:11 PM Matthias J. Sax <mj...@apache.org> wrote:
> >
> >> Thanks for the KIP Alieh! It addresses an important case for error
> >> handling.
> >>
> >> I agree that using this handler would be an expert API, as
> >> mentioned by a few people. But I don't think it would be a reason
> >> to not add it.
> >> It's always a tricky tradeoff what to expose to users while
> >> avoiding foot guns, but we added similar handlers to Kafka
> >> Streams, and have good experience with them. Hence, I understand,
> >> but don't share, the concern raised.
> >>
> >> I also agree that there is some responsibility on the user to
> >> understand how such a handler should be implemented so as to not
> >> drop data by accident. But that seems unavoidable and acceptable.
> >>
> >> While I understand that a "simpler / reduced" API (eg via configs)
> >> might also work, I personally prefer a full handler. Configs have
> >> the same issue that they could be misused, potentially leading to
> >> incorrectly dropped data, but at the same time are less flexible
> >> (and thus maybe even harder to use correctly...?). Based on my
> >> experience, there are also often weird corner cases for which it
> >> makes sense to also drop records for other exceptions, and a full
> >> handler has the advantage of full flexibility and "absolute
> >> power!".
> >>
> >> To be fair: I don't know the exact code paths of the producer in
> >> detail, so please keep me honest. But my understanding is that the
> >> KIP aims to allow users to react to internal exceptions, and
> >> decide to keep retrying internally, swallow the error and drop the
> >> record, or raise the error?
> >>
> >> Maybe the KIP would need to be a little bit more precise about
> >> what errors we want to cover -- I don't think this list must be
> >> exhaustive, as we can always do follow-up KIPs to also apply the
> >> handler to other errors and expand the scope of the handler. The
> >> KIP does mention examples, but it might be good to explicitly
> >> state for what cases the handler gets applied?
> >>
> >> I am also not sure if CONTINUE and FAIL are enough options? Don't
> >> we need three options? Or would `CONTINUE` have a different
> >> meaning depending on the type of error?
> >> I.e., for a retryable error `CONTINUE` would mean keep retrying
> >> internally, but for a non-retryable error `CONTINUE` means swallow
> >> the error and drop the record? This semantic overload seems tricky
> >> for users to reason about, so it might be better to split
> >> `CONTINUE` into two cases -> `RETRY` and `SWALLOW` (or some better
> >> names).
> >>
> >> Additionally, should we just ship a
> >> `DefaultClientExceptionHandler` which would return `FAIL`, for
> >> backward compatibility? Or not have any default handler to begin
> >> with and allow it to be `null`? I don't see the need for a
> >> specific `TransactionExceptionHandler`. To me, the goal should be
> >> to not modify the default behavior at all, but to just allow users
> >> to change the default behavior if there is a need.
> >>
> >> What is missing in the KIP though is how the handler is passed
> >> into the producer. Would we need a new config which allows setting
> >> a custom handler? And/or would we allow passing in an instance via
> >> the constructor, or add a new method to set a handler?
> >>
> >>
> >> -Matthias
> >>
> >> On 4/18/24 10:02 AM, Andrew Schofield wrote:
> >>> Hi Alieh,
> >>> Thanks for the KIP.
> >>>
> >>> Exception handling in the Kafka producer and consumer is really
> >>> not ideal. It's even harder working out what's going on with the
> >>> consumer.
> >>>
> >>> I'm a bit nervous about this KIP and I agree with Chris that it
> >>> could do with additional motivation. This would be an
> >>> expert-level interface given how complicated the exception
> >>> handling for Kafka has become.
> >>>
> >>> 7. The application is not really aware of the batching being done
> >>> on its behalf. The ProduceResponse can actually return an array
> >>> of records which failed per batch. If you get
> >>> RecordTooLargeException, and want to retry, you probably need to
> >>> remove the offending records from the batch and retry it. This is
> >>> getting fiddly.
> >>>
> >>> 8.
> >>> There is already o.a.k.clients.producer.Callback. I wonder
> >>> whether an alternative might be to add a method to the existing
> >>> Callback interface, such as:
> >>>
> >>> ClientExceptionResponse onException(Exception exception)
> >>>
> >>> It would be called when a ProduceResponse contains an error, but
> >>> the producer is going to retry. It tells the producer whether to
> >>> go ahead with the retry or not. The default implementation would
> >>> be to CONTINUE, because that's just continuing to retry as
> >>> planned. Note that this is a per-record callback, so the
> >>> application would be able to understand which records failed.
> >>>
> >>> By using an existing interface, we already know how to configure
> >>> it and we know about the threading model for calling it.
> >>>
> >>>
> >>> Thanks,
> >>> Andrew
> >>>
> >>>
> >>>
> >>>> On 17 Apr 2024, at 18:17, Chris Egerton <chr...@aiven.io.INVALID> wrote:
> >>>>
> >>>> Hi Alieh,
> >>>>
> >>>> Thanks for the KIP! The issue with writing to non-existent
> >>>> topics is particularly frustrating for users of Kafka Connect
> >>>> and has been the source of a handful of Jira tickets over the
> >>>> years. My thoughts:
> >>>>
> >>>> 1. An additional detail we can add to the motivation (or
> >>>> possibly rejected alternatives) section is that this kind of
> >>>> custom retry logic can't be implemented by hand by, e.g.,
> >>>> setting retries to 0 in the producer config and handling
> >>>> exceptions at the application level. Or rather, it can, but 1)
> >>>> it's a bit painful to have to reimplement at every call-site for
> >>>> Producer::send (and any code that awaits the returned Future)
> >>>> and 2) it's impossible to do this without losing idempotency on
> >>>> retries.
> >>>>
> >>>> 2. That said, I wonder if a pluggable interface is really the
> >>>> right call here.
> >>>> Documenting the interactions of a producer with a
> >>>> ClientExceptionHandler instance will be tricky, and implementing
> >>>> them will also be a fair amount of work. I believe that there
> >>>> needs to be some more granularity for how writes to non-existent
> >>>> topics (or really, UNKNOWN_TOPIC_OR_PARTITION and related errors
> >>>> from the broker) are handled, but I'm torn between keeping it
> >>>> simple with maybe one or two new producer config properties, or
> >>>> a full-blown pluggable interface. If there are more cases that
> >>>> would benefit from a pluggable interface, it would be nice to
> >>>> identify these and add them to the KIP to strengthen the
> >>>> motivation. Right now, I'm not sure the two that are mentioned
> >>>> in the motivation are sufficient.
> >>>>
> >>>> 3. Alternatively, a possible compromise is for this KIP to
> >>>> introduce new properties that dictate how to handle
> >>>> unknown-topic-partition and record-too-large errors, with the
> >>>> thinking that if we introduce a pluggable interface later on,
> >>>> these properties will be recognized by the default
> >>>> implementation of that interface but could be completely ignored
> >>>> or replaced by alternative implementations.
> >>>>
> >>>> 4. (Nit) You can remove the "This page is meant as a template
> >>>> for writing a KIP..." part from the KIP. It's not a template
> >>>> anymore :)
> >>>>
> >>>> 5. If we do go the pluggable interface route, wouldn't we want
> >>>> to add the possibility for retry logic? The simplest version of
> >>>> this could be to add a RETRY value to the
> >>>> ClientExceptionHandlerResponse enum.
> >>>>
> >>>> 6. I think "SKIP" or "DROP" might be clearer instead of
> >>>> "CONTINUE" for the ClientExceptionHandlerResponse enum, since
> >>>> they cause records to be dropped.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Chris
> >>>>
> >>>> On Wed, Apr 17, 2024 at 12:25 PM Justine Olshan
> >>>> <jols...@confluent.io.invalid> wrote:
> >>>>
> >>>>> Hey Alieh,
> >>>>>
> >>>>> I echo what Omnia says; I'm not sure I understand the
> >>>>> implications of the change and I think more detail is needed.
> >>>>>
> >>>>> This comment also confused me a bit:
> >>>>> * {@code ClientExceptionHandler} that continues the transaction
> >>>>> even if a record is too large.
> >>>>> * Otherwise, it makes the transaction to fail.
> >>>>>
> >>>>> Relatedly, I've been working with some folks on a KIP for
> >>>>> transaction errors and how they are handled. Specifically for
> >>>>> the RecordTooLargeException (and a few other errors), we want
> >>>>> to give a new error category for this error that allows the
> >>>>> application to choose how it is handled. Maybe this KIP is
> >>>>> something that you are looking for? Stay tuned :)
> >>>>>
> >>>>> Justine
> >>>>>
> >>>>>
> >>>>> On Wed, Apr 17, 2024 at 8:03 AM Omnia Ibrahim
> >>>>> <o.g.h.ibra...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Alieh,
> >>>>>> Thanks for the KIP! I have a couple of comments.
> >>>>>> - You mentioned in the KIP motivation:
> >>>>>>> Another example for which a production exception handler
> >>>>>>> could be useful is if a user tries to write into a
> >>>>>>> non-existing topic, which returns a retryable error code;
> >>>>>>> with infinite retries, the producer would hang retrying
> >>>>>>> forever. A handler could help to break the infinite retry
> >>>>>>> loop.
> >>>>>>
> >>>>>> How can the handler differentiate between something that is
> >>>>>> temporary and should keep being retried, and something
> >>>>>> permanent like forgetting to create the topic?
> >>>>>> Temporary here could be:
> >>>>>> - the producer getting deployed before the topic creation
> >>>>>> finishes (especially if the topic creation is handled via IaC)
> >>>>>> - temporarily offline partitions
> >>>>>> - leadership changing
> >>>>>> Isn't this putting the producer at risk of dropping records
> >>>>>> unintentionally?
> >>>>>> - Can you elaborate more on what is written in the
> >>>>>> compatibility / migration plan section, please, by explaining
> >>>>>> in a bit more detail what the changing behaviour is and how
> >>>>>> this will impact clients who are upgrading?
> >>>>>> - In the proposed changes, can you elaborate in the KIP on
> >>>>>> where in the producer lifecycle ClientExceptionHandler and
> >>>>>> TransactionExceptionHandler get triggered, and how the
> >>>>>> producer will be configured to point to a custom
> >>>>>> implementation?
> >>>>>>
> >>>>>> Thanks
> >>>>>> Omnia
> >>>>>>
> >>>>>>> On 17 Apr 2024, at 13:13, Alieh Saeedi <asae...@confluent.io.INVALID> wrote:
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>> Here is the KIP-1038: Add Custom Error Handler to Producer.
> >>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1038%3A+Add+Custom+Error+Handler+to+Producer>
> >>>>>>> I look forward to your feedback!
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Alieh
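The FAIL / RETRY / SWALLOW split discussed in this thread (FAIL plus two flavors of "continue") can be sketched in plain Java. Every name below is illustrative only: the interface shape, the Response enum, and the stand-in exception classes are assumptions about an API that is still under discussion, and a plain Object replaces the real ProducerRecord so the sketch stays self-contained.

```java
public class HandlerSketch {

    // Stand-in for the proposed ProducerExceptionHandler interface.
    interface ProducerExceptionHandler {
        enum Response { FAIL, RETRY, SWALLOW }
        // The KIP adds the ProducerRecord as a parameter; Object stands in
        // for it here to avoid depending on kafka-clients.
        Response handle(Object record, Exception exception);
    }

    // Local stand-ins for the real o.a.k.common.errors exception classes.
    static class RecordTooLargeException extends RuntimeException {}
    static class UnknownTopicOrPartitionException extends RuntimeException {}

    // Example handler covering the two exceptions the KIP currently lists:
    // drop oversized records, retry unknown-topic errors (e.g. the topic is
    // still being created), and fail on anything else, which preserves the
    // default producer behavior for all other errors.
    static class ExampleHandler implements ProducerExceptionHandler {
        @Override
        public Response handle(Object record, Exception e) {
            if (e instanceof RecordTooLargeException) {
                return Response.SWALLOW;
            }
            if (e instanceof UnknownTopicOrPartitionException) {
                return Response.RETRY;
            }
            return Response.FAIL;
        }
    }

    public static void main(String[] args) {
        ProducerExceptionHandler h = new ExampleHandler();
        System.out.println(h.handle(null, new RecordTooLargeException()));          // SWALLOW
        System.out.println(h.handle(null, new UnknownTopicOrPartitionException())); // RETRY
        System.out.println(h.handle(null, new IllegalStateException()));            // FAIL
    }
}
```

Splitting CONTINUE into RETRY and SWALLOW, as suggested above, avoids the semantic overload of one return value meaning "keep retrying" for retryable errors but "drop the record" for non-retryable ones.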