After talking to a few people in person, it seems there were concerns
that KAFKA-16508 should be considered a backward incompatible change.
While this KIP should address this incompatibility issue, the
suggestion was to keep KAFKA-16508 in `trunk` for now, but to revert it in
the 3.9 branch after the branch was cut, so that KAFKA-16508 does not ship in 3.9.
We aim to get both KAFKA-16508 and this KIP into 4.0 and thus would
have a backward compatible solution. (If this KIP does not make it into
4.0, we would revert KAFKA-16508 in the 4.0 branch, too.)
For backward compatibility, there is still the question about the two
built-in handlers and also custom handlers.

Thinking about it more, I am not sure if
`LogAndFailExceptionProductionHandler` would actually need a change?
With KAFKA-16508 we would just start to fail for an unknown output topic
instead of doing an infinite retry, which seems safe, and a desirable
change in behavior?
For `LogAndContinueProcessingExceptionHandler` it's a little bit
different. So far, we would have an infinite retry loop for a missing
output topic. W/o a change to this handler, KAFKA-16508 would imply
that we start to drop records on the floor for an output topic
which does not exist. -- I am frankly not sure if this change in
behavior is desirable or not; I can see it both ways:
- The handler is designed to ignore errors and drop records on the
floor (it still does this), so maybe it's good to not change the handler
(otherwise it would not "log and continue" any longer...) -- letting it
return RETRY does not really sound right, given the handler's name.
- On the other hand, if somebody uses this handler in production, we
would start to drop records for a new error case, and users might not be
aware of it -- if an output topic is deleted by accident, it could
result in severe data loss if this handler is used in prod.
If we don't change the handler's implementation, it might be good to check if this
handler is configured and log a WARN highlighting this change (this
might be a good compromise)? -- Given that we aim for the 4.0 release, such
a change in behavior should actually be ok.
For custom handlers, I think we cannot do much, but as we aim for the 4.0
release, it might be ok and we might not need to worry about it too
much. The handler will be called for a new case, with a new exception we
pass in. So it totally depends on the custom logic whether the handler would
return FAIL or CONTINUE, or maybe even error out...
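To make this a bit more concrete, here is a minimal sketch of what a custom handler could look like once the proposed RETRY option exists. It is illustrative only: it uses the current two-argument `handle()` signature, and it assumes (not confirmed by the KIP) that the missing-output-topic case surfaces as an `UnknownTopicOrPartitionException`:

```java
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.UnknownTopicOrPartitionException;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

import java.util.Map;

public class RetryOnMissingTopicHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        // Hypothetical: assumes the "output topic does not exist" case is passed in
        // as an UnknownTopicOrPartitionException; RETRY is the new option this KIP proposes.
        if (exception instanceof UnknownTopicOrPartitionException) {
            return ProductionExceptionHandlerResponse.RETRY;
        }
        // Everything else keeps the safe default of failing.
        return ProductionExceptionHandlerResponse.FAIL;
    }

    @Override
    public void configure(final Map<String, ?> configs) {
        // no configuration needed for this sketch
    }
}
```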
-Matthias
On 7/8/24 10:22 PM, Matthias J. Sax wrote:
Thanks for the feedback!
To answer your questions:
In general, yes, user topics should exist; however, we support "dynamic
routing", allowing users to extract an output topic name dynamically from
the message content. A poison pill record could contain an invalid topic
name, and thus, I think it's useful to allow users to handle this
case gracefully, by dropping such a message on the floor.
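For illustration, dynamic routing on the user side looks roughly like the sketch below (the "topic name encoded in the value" convention is made up for this example); a malformed record would be routed to a topic that does not exist, which is exactly the case the handler should be able to deal with:

```java
import org.apache.kafka.streams.processor.RecordContext;
import org.apache.kafka.streams.processor.TopicNameExtractor;

// Minimal sketch of dynamic routing: the output topic is derived from the record itself.
// A poison pill record that does not follow the assumed "topic:payload" convention would
// yield a topic name that (most likely) does not exist.
public class PayloadBasedTopicExtractor implements TopicNameExtractor<String, String> {

    @Override
    public String extract(final String key, final String value, final RecordContext recordContext) {
        // Hypothetical convention: everything before the first ':' names the target topic.
        return value.split(":", 2)[0];
    }
}
```

Such an extractor would be used via `stream.to(new PayloadBasedTopicExtractor());`, so Kafka Streams only learns the target topic at write time.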
I think that (not) writing into an output topic and dropping a result
record is of a different quality than (not) writing into a changelog
topic (for which state store and changelog would diverge and we would
pollute state...) -- regular repartition topics are a little bit
in-between, but FK-join subscription/response topics are again more on
the side of changelogs IMHO with regard to the impact if we drop a write...
But it was meant as a consideration to be more strict with internal
topics. Happy to just drop the sentence from the KIP -- in the end,
users can check the topic name which is passed into the handler,
exclude internal topics themselves, and always FAIL for them.
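As a rough sketch of that last point (assuming the usual `<application.id>-...-changelog` / `-repartition` naming convention for internal topics; FK-join subscription/response topics would need their own check):

```java
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

import java.util.Map;

public class StrictOnInternalTopicsHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        // Internal topics follow a naming convention, so the handler can always fail for them...
        final String topic = record.topic();
        if (topic.endsWith("-changelog") || topic.endsWith("-repartition")) {
            return ProductionExceptionHandlerResponse.FAIL;
        }
        // ...and be more lenient for user output topics.
        return ProductionExceptionHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(final Map<String, ?> configs) {
        // no configuration needed for this sketch
    }
}
```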
For the processing exception handler, I would rather alter KIP-1033, which is
not shipped yet.

For deserialization exceptions: while Kafka does not ship with a schema
registry, we have seen cases in practice where RETRY could be
useful. However, the question is whether the deserializer should just retry
internally? While I am overall not against it, it seems related
but orthogonal, and I would like to keep the scope of this KIP focused.
Maybe just file a ticket for it so we can track it.
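"Retry internally" could be as simple as the following sketch, i.e., a thin wrapper around a delegate deserializer, so that transient failures (e.g., a schema registry hiccup) never reach the DeserializationExceptionHandler in the first place. This is only meant to illustrate the idea, not a concrete proposal:

```java
import org.apache.kafka.common.serialization.Deserializer;

// Sketch of a deserializer that retries a delegate a few times before giving up.
public class RetryingDeserializer<T> implements Deserializer<T> {

    private final Deserializer<T> delegate;
    private final int maxAttempts;

    public RetryingDeserializer(final Deserializer<T> delegate, final int maxAttempts) {
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public T deserialize(final String topic, final byte[] data) {
        RuntimeException lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return delegate.deserialize(topic, data);
            } catch (final RuntimeException error) {
                // A real implementation would only retry errors known to be transient
                // and probably back off between attempts.
                lastError = error;
            }
        }
        throw lastError;
    }
}
```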
-Matthias
On 7/2/24 6:50 AM, jiang dou wrote:
Thanks for the KIP
I think adding ProductionExceptionHandlerResponse.RETRY is very valuable.
But I agree with Sophie Blee-Goldman: all topics should be treated in the same way.
The caller can determine whether the topic exists and take appropriate
action. The judgment logic of the ProductionExceptionHandler will also be
simpler.
On Tue, Jul 2, 2024 at 07:24 Sophie Blee-Goldman <sop...@responsive.dev> wrote:
Thanks for the KIP -- definitely agree with this proposal, just have a few suggestions:
1. In the KIP, you mention:

"We might also consider not calling the handler when writing into internal topics, as those must exist."

Personally I would vote to consider all topics the same in this regard, and I don't fully understand why we would treat internal topics differently. Presumably, both internal and user topics "must exist" by the time we attempt to write anything into them, no?
Can you maybe clarify this part a bit further?
2. Although I understand the stated purpose of this KIP is specifically around the ProductionExceptionHandler and the case of missing topics, a RETRY option seems like it would be useful in general and could also make sense for the new ProcessingExceptionHandler. WDYT about adding RETRY to the processing exception handler as well?

I guess that also begs the question, what about the DeserializationExceptionHandler? Most serdes are probably (hopefully!) deterministic, but could deserialization fail due to network errors or other transient issues with schema registry?
On Sun, Jun 30, 2024 at 12:35 PM Matthias J. Sax <mj...@apache.org> wrote:
Hi,
as a follow-up to https://issues.apache.org/jira/browse/KAFKA-16508,
which is related to KIP-1038, I would like to propose adding a RETRY
option to production error handler responses:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311627309
Looking forward to your feedback.
-Matthias