Hi all, Thanks for the KIP, Matthias. Yes, adding the `RETRY` option to `ProductionExceptionhandler` enables us to have a further fix in order to make KAFKA-16508 backward compatible.
@Sophie: Regarding other built-in handlers: Do you mean 1) adding `RETRY` to `DeserializationExceptionHandler` interface and implementing `LogAndRetryExceptionHandler` class subsequently, and 2) adding `RETRY` to the `ProcessingExceptionHandler` interface and implementing `LogAndRetryProcessingExceptionHandler` class subsequently? If yes, could these changes happen in a follow-up KIP? I think having `RETRY` in both above-mentioned interfaces makes sense but they are completely separated from `ProductionExceptionHandler` (? am I right?🤔) which makes it possible to vote, and close this KIP and corresponding PRs asap and go for the follow-up KIPs. Thanks, Alieh On Sat, Jul 13, 2024 at 1:54 AM Matthias J. Sax <mj...@apache.org> wrote: > After talking to a few people in-person, it seems there was concerns > that KAFKA-16508 should be considered a backward incompatible change. > > While this KIP should address these incompatibility issue, the > suggestion was to keep KAFKA-16508 in `trunk` for now, but revert it in > the 3.9 branch after the branch was cut to not ship KAFKA-16508 in 3.9. > > We aim to get both, KAFKA-16508 and this KIP into 4.0 and thus would > have a backward compatible solution. (If this KIP does not make it into > 4.0, we would revert KAFKA-16508 in 4.0 branch, too.) > > > > For backward compatibility, there is still the question about the two > built-in handlers and also custom handlers. > > Thinking about it more, I am not use if > `LogAndFailExceptionProductionHandler` would actually need a change? > With KAFKA-16508 we would just start fail for a unknown output topic > instead of doing an infinite retry, what seems safe, and a desired > change in behavior? > > For `LogAndContinueProcessingExceptionHandler` it's a little bit > different? So far, we would have an infinite retry loop for an missing > output topic. W/o a change for this handler, KAFKA-16508 would imply > that we would start to drop record on the floor for an output topic > which does not exist. -- I am frankly not sure if this change in > behavior is desirable or not (I can see it both ways: > > - The handler is designed to ignore errors and drop record on the > floor (it still does this), so maybe its good to not change the handler > (otherwise it would not "log and continue" any longer...) -- let it > return RETRY does not really sound right, given the handlers name > > - On the other hand, if somebody uses this handler in production, we > would start to drop record for a new error case, and user might not be > aware of it -- if an output topic is deleted by accident, it could > result into severe data loss using this handler is use in prod > > If we don't change the handlers impl, it might be good to check if this > handler is configured and log a WARN highlighting this change (this > might be a good compromise)? -- Given that we aim for 4.0 release, such > a change in behavior should actually be ok. > > > > For custom handlers, I think we cannot do much, but as we aim for 4.0 > release, it might be ok and we might not need to worry about it too > much. The handler will be called for a new case, with a new exception we > pass in. So it totally depends on the custom logic if the handler would > return FAIL or CONTINUE, or maybe even error out... > > > > -Matthias > > > On 7/8/24 10:22 PM, Matthias J. Sax wrote: > > Thanks for the feedback! > > > > To answer your questions: > > > > In general, yes, user topics should exist, however, we support "dynamic > > routing" allowing users to extract an output topic name dynamically from > > the message content. A poison pill record could contain an invalid topic > > a name, and thus, it think it's useful to allow users to handle this > > case gracefully, by dropping such message on the floor. > > > > I think that (not) writing into an output topic and dropping a result > > records, is of a different quality than (not) writing into a changelog > > topic (for which state store and changelog would diverge and we would > > pollute state...) -- regular repartitions topics are a little bit > > in-between but FK-join subscription/response topics are again more on > > the side of changelogs IMHO with regard of impact if we drop a write... > > > > But it was meant as a consideration to be more strict with internal > > topics. Happy to just drop the sentence from the KIP -- in the end, > > users can check the topic name which is passed into the handler and > > exclude internal topic themselves and always FAIL for them. > > > > > > > > For processing exception handler, I would rather alter KIP-1033 which is > > not shipped yet. > > > > > > > > For deserialization exceptions: while Kafka does not ship with schema > > registry, we have seen this issue in practice that RETRY could be > > useful. However, the question is if the deserializer should just retry > > internally? While I am overall not against it, it seems to be related > > but orthogonal, and I would like to keep the scope of this KIP focused. > > Maybe just file a ticket for it so we can track it. > > > > > > > > -Matthias > > > > > > > > On 7/2/24 6:50 AM, jiang dou wrote: > >> Thanks for the KIP > >> > >> I think adding ProductionExceptionHandlerResponse.RETRY is very valuable > >> > >> But I agree with Sophie Blee-Goldman: all topics in the same way. > >> > >> The caller can determine whether the topic exists and take appropriate > >> action. > >> The judgment logic of the ProductionExceptionHandler will also be > >> simpler. > >> > >> Sophie Blee-Goldman <sop...@responsive.dev> 于2024年7月2日周二 07:24写 > >> 道: > >> > >>> Thanks for the KIP -- definitely agree with this proposal, just have > >>> a few > >>> suggestions: > >>> > >>> 1. In the KIP, you mention > >>> > >>> We might also consider to not calling the handler when writing into > >>>> internal topics, as those must exist. > >>> > >>> > >>> Personally I would vote to consider all topics the same in this > >>> regard, and > >>> don't fully > >>> understand why we would treat internal topics differently. > >>> Presumably, both > >>> internal > >>> and user topics "must exist" by the time we attempt to write anything > >>> into > >>> them, no? > >>> Can you maybe clarify this part a bit further? > >>> > >>> 2. > >>> Although I understand the stated purpose of this KIP is specifically > >>> around > >>> the > >>> ProductionExceptionHandler and the case of missing topics, a RETRY > >>> option > >>> seems > >>> like it would be useful in general and could also make sense for the > new > >>> ProcessingExceptionHandler. WDYT about adding RETRY to the processing > >>> exception > >>> handler as well? > >>> I guess that also begs the question, what about the > >>> DeserializationExceptionHandler? > >>> Most serdes are probably (hopefully!) deterministic, but could > >>> deserialization fail due > >>> to network errors or other transient issues with schema registry? > >>> > >>> On Sun, Jun 30, 2024 at 12:35 PM Matthias J. Sax <mj...@apache.org> > >>> wrote: > >>> > >>>> Hi, > >>>> > >>>> as a follow up to https://issues.apache.org/jira/browse/KAFKA-16508 > >>>> which is related to KIP-1038, I would like to prose adding a RETRY > >>>> option to production error handler responses: > >>>> > >>>> > >>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311627309 > >>>> > >>>> Looking forward to your feedback. > >>>> > >>>> > >>>> -Matthias > >>>> > >>> > >> >