Hi all,

Thanks for the KIP, Matthias. Yes, adding the `RETRY` option to
`ProductionExceptionhandler` enables us to have a further fix in order to
make KAFKA-16508 backward compatible.


@Sophie:
Regarding other built-in handlers: Do you mean 1) adding `RETRY` to
`DeserializationExceptionHandler`
interface and implementing `LogAndRetryExceptionHandler` class
subsequently, and 2) adding `RETRY` to the `ProcessingExceptionHandler`
interface and implementing `LogAndRetryProcessingExceptionHandler` class
subsequently? If yes, could these changes happen in a follow-up KIP?

I think having `RETRY` in both above-mentioned interfaces makes sense but
they are completely separated from `ProductionExceptionHandler` (? am I
right?🤔) which makes it possible to vote, and close this KIP and
corresponding PRs asap and go for the follow-up KIPs.

Thanks,

Alieh

On Sat, Jul 13, 2024 at 1:54 AM Matthias J. Sax <mj...@apache.org> wrote:

> After talking to a few people in-person, it seems there was concerns
> that KAFKA-16508 should be considered a backward incompatible change.
>
> While this KIP should address these incompatibility issue, the
> suggestion was to keep KAFKA-16508 in `trunk` for now, but revert it in
> the 3.9 branch after the branch was cut to not ship KAFKA-16508 in 3.9.
>
> We aim to get both, KAFKA-16508 and this KIP into 4.0 and thus would
> have a backward compatible solution. (If this KIP does not make it into
> 4.0, we would revert KAFKA-16508 in 4.0 branch, too.)
>
>
>
> For backward compatibility, there is still the question about the two
> built-in handlers and also custom handlers.
>
> Thinking about it more, I am not use if
> `LogAndFailExceptionProductionHandler` would actually need a change?
> With KAFKA-16508 we would just start fail for a unknown output topic
> instead of doing an infinite retry, what seems safe, and a desired
> change in behavior?
>
> For `LogAndContinueProcessingExceptionHandler` it's a little bit
> different? So far, we would have an infinite retry loop for an missing
> output topic. W/o a change for this handler, KAFKA-16508 would imply
> that we would start to drop record on the floor for an output topic
> which does not exist. -- I am frankly not sure if this change in
> behavior is desirable or not (I can see it both ways:
>
>   - The handler is designed to ignore errors and drop record on the
> floor (it still does this), so maybe its good to not change the handler
> (otherwise it would not "log and continue" any longer...) -- let it
> return RETRY does not really sound right, given the handlers name
>
>   - On the other hand, if somebody uses this handler in production, we
> would start to drop record for a new error case, and user might not be
> aware of it -- if an output topic is deleted by accident, it could
> result into severe data loss using this handler is use in prod
>
> If we don't change the handlers impl, it might be good to check if this
> handler is configured and log a WARN highlighting this change (this
> might be a good compromise)? -- Given that we aim for 4.0 release, such
> a change in behavior should actually be ok.
>
>
>
> For custom handlers, I think we cannot do much, but as we aim for 4.0
> release, it might be ok and we might not need to worry about it too
> much. The handler will be called for a new case, with a new exception we
> pass in. So it totally depends on the custom logic if the handler would
> return FAIL or CONTINUE, or maybe even error out...
>
>
>
> -Matthias
>
>
> On 7/8/24 10:22 PM, Matthias J. Sax wrote:
> > Thanks for the feedback!
> >
> > To answer your questions:
> >
> > In general, yes, user topics should exist, however, we support "dynamic
> > routing" allowing users to extract an output topic name dynamically from
> > the message content. A poison pill record could contain an invalid topic
> > a name, and thus, it think it's useful to allow users to handle this
> > case gracefully, by dropping such message on the floor.
> >
> > I think that (not) writing into an output topic and dropping a result
> > records, is of a different quality than (not) writing into a changelog
> > topic (for which state store and changelog would diverge and we would
> > pollute state...) -- regular repartitions topics are a little bit
> > in-between but FK-join subscription/response topics are again more on
> > the side of changelogs IMHO with regard of impact if we drop a write...
> >
> > But it was meant as a consideration to be more strict with internal
> > topics. Happy to just drop the sentence from the KIP -- in the end,
> > users can check the topic name which is passed into the handler and
> > exclude internal topic themselves and always FAIL for them.
> >
> >
> >
> > For processing exception handler, I would rather alter KIP-1033 which is
> > not shipped yet.
> >
> >
> >
> > For deserialization exceptions: while Kafka does not ship with schema
> > registry, we have seen this issue in practice that RETRY could be
> > useful. However, the question is if the deserializer should just retry
> > internally? While I am overall not against it, it seems to be related
> > but orthogonal, and I would like to keep the scope of this KIP focused.
> > Maybe just file a ticket for it so we can track it.
> >
> >
> >
> > -Matthias
> >
> >
> >
> > On 7/2/24 6:50 AM, jiang dou wrote:
> >> Thanks for the KIP
> >>
> >> I think adding ProductionExceptionHandlerResponse.RETRY is very valuable
> >>
> >> But I agree with Sophie Blee-Goldman: all topics in the same way.
> >>
> >> The caller can determine whether the topic exists and take appropriate
> >> action.
> >> The judgment logic of the ProductionExceptionHandler will also be
> >> simpler.
> >>
> >> Sophie Blee-Goldman <sop...@responsive.dev> 于2024年7月2日周二 07:24写
> >> 道:
> >>
> >>> Thanks for the KIP -- definitely agree with this proposal, just have
> >>> a few
> >>> suggestions:
> >>>
> >>> 1. In the KIP, you mention
> >>>
> >>> We might also consider to not calling the handler when writing into
> >>>> internal topics, as those must exist.
> >>>
> >>>
> >>> Personally I would vote to consider all topics the same in this
> >>> regard, and
> >>> don't fully
> >>> understand why we would treat internal topics differently.
> >>> Presumably, both
> >>> internal
> >>> and user topics "must exist" by the time we attempt to write anything
> >>> into
> >>> them, no?
> >>> Can you maybe clarify this part a bit further?
> >>>
> >>> 2.
> >>> Although I understand the stated purpose of this KIP is specifically
> >>> around
> >>> the
> >>> ProductionExceptionHandler and the case of missing topics, a RETRY
> >>> option
> >>> seems
> >>> like it would be useful in general and could also make sense for the
> new
> >>> ProcessingExceptionHandler. WDYT about adding RETRY to the processing
> >>> exception
> >>> handler as well?
> >>> I guess that also begs the question, what about the
> >>> DeserializationExceptionHandler?
> >>> Most serdes are probably (hopefully!) deterministic, but could
> >>> deserialization fail due
> >>> to network errors or other transient issues with schema registry?
> >>>
> >>> On Sun, Jun 30, 2024 at 12:35 PM Matthias J. Sax <mj...@apache.org>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> as a follow up to https://issues.apache.org/jira/browse/KAFKA-16508
> >>>> which is related to KIP-1038, I would like to prose adding a RETRY
> >>>> option to production error handler responses:
> >>>>
> >>>>
> >>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311627309
> >>>>
> >>>> Looking forward to your feedback.
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>
> >>
>

Reply via email to