Just catching up on this KIP. One tiny comment: I would prefer to remove the "Always" from the handler implementation names -- it sounds "cleaner" to me without it.
-Matthias On 11/5/17 12:57 PM, Matt Farmer wrote: > It is agreed, then. I've updated the pull request. I'm trying to also > update the KIP accordingly, but cwiki is being slow and dropping > connections..... I'll try again a bit later but please consider the KIP > updated for all intents and purposes. heh. > > On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wangg...@gmail.com> wrote: > >> That makes sense. >> >> >> Guozhang >> >> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <m...@frmr.me> wrote: >> >>> Interesting. I'm not sure I agree. I've been bitten many times by >>> unintentionally shipping code that fails to properly implement logging. I >>> always discover this at the exact *worst* moment, too. (Normally at 3 AM >>> during an on-call shift. Hah.) However, if others feel the same way I >> could >>> probably be convinced to remove it. >>> >>> We could also meet halfway and say that when a customized >>> ProductionExceptionHandler instructs Streams to CONTINUE, we log at DEBUG >>> level instead of WARN level. Would that alternative be appealing to you? >>> >>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <wangg...@gmail.com> >> wrote: >>> >>>> Thanks for the updates. I made a pass over the wiki again and it looks >>>> good. >>>> >>>> About whether record collector should still internally log the error in >>>> addition to what the customized ProductionExceptionHandler does. I >>>> personally would prefer only to log if the returned value is FAIL to >>>> indicate that this thread is going to shutdown and trigger the >> exception >>>> handler. >>>> >>>> >>>> Guozhang >>>> >>>> >>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <m...@frmr.me> wrote: >>>> >>>>> Hello, a bit later than I'd anticipated, but I've updated this KIP as >>>>> outlined above. The updated KIP is now ready for review again! >>>>> >>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <m...@frmr.me> wrote: >>>>> >>>>>> Ah. I actually created both of those in the PR and forgot to >> mention >>>> them >>>>>> by name in the KIP! Thanks for pointing out the oversight. >>>>>> >>>>>> I’ll revise the KIP this afternoon accordingly. >>>>>> >>>>>> The logging is actually provided for in the record collector. >>> Whenever >>>> a >>>>>> handler continues it’ll log a warning to ensure that it’s >>> *impossible* >>>> to >>>>>> write a handler that totally suppresses production exceptions from >>> the >>>>> log. >>>>>> As such, the default continue handler just returns the continue >>> value. >>>> I >>>>>> can add details about those semantics to the KIP as well. >>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax < >>> matth...@confluent.io >>>>> >>>>>> wrote: >>>>>> >>>>>>> One more comment. >>>>>>> >>>>>>> You mention a default implementation for the handler that fails. I >>>>>>> think, this should be part of the public API and thus should have >> a >>>>>>> proper defined named that is mentioned in the KIP. >>>>>>> >>>>>>> We could also add a second implementation for the log-and-move-on >>>>>>> strategy, as both are the two most common cases. This class should >>>> also >>>>>>> be part of public API (so users can just set in the config) with a >>>>>>> proper name. >>>>>>> >>>>>>> >>>>>>> Otherwise, I like the KIP a lot! Thanks. >>>>>>> >>>>>>> >>>>>>> -Matthias >>>>>>> >>>>>>> >>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote: >>>>>>>> Thanks for the heads up. Yes, I think my changes are compatible >>> with >>>>>>> that >>>>>>>> PR, but there will be a merge conflict that happens whenever one >>> of >>>>> the >>>>>>> PRs >>>>>>>> is merged. Happy to reconcile the changes in my PR if 4148 goes >> in >>>>>>> first. :) >>>>>>>> >>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang < >> wangg...@gmail.com >>>> >>>>>>> wrote: >>>>>>>> >>>>>>>>> That sounds reasonable, thanks Matt. >>>>>>>>> >>>>>>>>> As for the implementation, please note that there is another >>>> ongoing >>>>> PR >>>>>>>>> that may touch the same classes that you are working on: >>>>>>>>> https://github.com/apache/kafka/pull/4148 >>>>>>>>> >>>>>>>>> So it may help if you can also take a look at that PR and see >> if >>> it >>>>> is >>>>>>>>> compatible with your changes. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Guozhang >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <m...@frmr.me> >>>> wrote: >>>>>>>>> >>>>>>>>>> I've opened this pull request to implement the KIP as >> currently >>>>>>> written: >>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still needs >> some >>>>> tests >>>>>>>>>> added, >>>>>>>>>> but largely represents the shape I was going for. >>>>>>>>>> >>>>>>>>>> If there are more points that folks would like to discuss, >>> please >>>>> let >>>>>>> me >>>>>>>>>> know. If I don't hear anything by tomorrow afternoon I'll >>> probably >>>>>>> start >>>>>>>>> a >>>>>>>>>> [VOTE] thread. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <m...@frmr.me> >>> wrote: >>>>>>>>>> >>>>>>>>>>> I can’t think of a reason that would be problematic. >>>>>>>>>>> >>>>>>>>>>> Most of the time I would write a handler like this, I either >>> want >>>>> to >>>>>>>>>>> ignore the error or fail and bring everything down so that I >>> can >>>>> spin >>>>>>>>> it >>>>>>>>>>> back up later and resume from earlier offsets. When we start >> up >>>>> after >>>>>>>>>>> crashing we’ll eventually try to process the message we >> failed >>> to >>>>>>>>> produce >>>>>>>>>>> again. >>>>>>>>>>> >>>>>>>>>>> I’m concerned that “putting in a queue for later” opens you >> up >>> to >>>>>>>>> putting >>>>>>>>>>> messages into the destination topic in an unexpected order. >>>> However >>>>>>> if >>>>>>>>>>> others feel differently, I’m happy to talk about it. >>>>>>>>>>> >>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang < >>>> wangg...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is >> that >>>> the >>>>>>>>>> record >>>>>>>>>>>>> metadata is always null if an exception occurred while >> trying >>>> to >>>>>>>>>>>> produce. >>>>>>>>>>>> >>>>>>>>>>>> That is right. Thanks. >>>>>>>>>>>> >>>>>>>>>>>> I looked at the example code, and one thing I realized that >>>> since >>>>> we >>>>>>>>> are >>>>>>>>>>>> not passing the context in the handle function, we may not >> be >>>>>>>>> implement >>>>>>>>>>>> the >>>>>>>>>>>> logic to send the fail records into another queue for future >>>>>>>>> processing. >>>>>>>>>>>> Would people think that would be a big issue? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Guozhang >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <m...@frmr.me >>> >>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hello all, >>>>>>>>>>>>> >>>>>>>>>>>>> I've updated the KIP based on this conversation, and made >> it >>> so >>>>>>> that >>>>>>>>>> its >>>>>>>>>>>>> interface, config setting, and parameters line up more >>> closely >>>>> with >>>>>>>>>> the >>>>>>>>>>>>> interface in KIP-161 (deserialization handler). >>>>>>>>>>>>> >>>>>>>>>>>>> I believe there are a few specific questions I need to >> reply >>>>>>> to..... >>>>>>>>>>>>> >>>>>>>>>>>>>> The question I had about then handle parameters are around >>> the >>>>>>>>>> record, >>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be >>> generics >>>> of >>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or >>>> `ProducerRecord<? >>>>>>>>>>>> extends >>>>>>>>>>>>>> Object, ? extends Object>`? >>>>>>>>>>>>> >>>>>>>>>>>>> At this point in the code we're guaranteed that this is a >>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics would just >>> make >>>>> it >>>>>>>>>>>> harder >>>>>>>>>>>>> to work with the key and value. >>>>>>>>>>>>> >>>>>>>>>>>>>> Also, should the handle function include the >>> `RecordMetadata` >>>> as >>>>>>>>>> well >>>>>>>>>>>> in >>>>>>>>>>>>>> case it is not null? >>>>>>>>>>>>> >>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is >> that >>>> the >>>>>>>>>> record >>>>>>>>>>>>> metadata is always null if an exception occurred while >> trying >>>> to >>>>>>>>>>>> produce. >>>>>>>>>>>>> >>>>>>>>>>>>>> We may probably try to write down at least the following >>>>> handling >>>>>>>>>>>> logic >>>>>>>>>>>>> and >>>>>>>>>>>>>> see if the given API is sufficient for it >>>>>>>>>>>>> >>>>>>>>>>>>> I've added some examples to the KIP. Let me know what you >>>> think. >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> Matt >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <m...@frmr.me> >>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference right now >> and >>> am >>>>>>>>>>>> planning >>>>>>>>>>>>> on >>>>>>>>>>>>>> updating the KIP again with details from this conversation >>>> later >>>>>>>>>> this >>>>>>>>>>>>> week. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :) >>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang < >>>>> wangg...@gmail.com >>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for the KIP Matt. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regarding the handle interface of >>>>> ProductionExceptionHandlerResp >>>>>>>>>> onse, >>>>>>>>>>>>>>> could >>>>>>>>>>>>>>> you write it on the wiki also, along with the actual >> added >>>>> config >>>>>>>>>>>> names >>>>>>>>>>>>>>> (e.g. what >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- >>> 161%3A+streams+ >>>>>>>>>>>>> deserialization+exception+handlers >>>>>>>>>>>>>>> described). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The question I had about then handle parameters are >> around >>>> the >>>>>>>>>>>> record, >>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be >>> generics >>>>> of >>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or >>>> `ProducerRecord<? >>>>>>>>>>>> extends >>>>>>>>>>>>>>> Object, ? extends Object>`? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Also, should the handle function include the >>> `RecordMetadata` >>>>> as >>>>>>>>>>>> well in >>>>>>>>>>>>>>> case it is not null? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We may probably try to write down at least the following >>>>> handling >>>>>>>>>>>> logic >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> see if the given API is sufficient for it: 1) throw >>> exception >>>>>>>>>>>>> immediately >>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the error and >> drop >>>>> record >>>>>>>>>> and >>>>>>>>>>>>>>> proceed silently, 3) send such errors to a specific >> "error" >>>>> Kafka >>>>>>>>>>>> topic, >>>>>>>>>>>>>>> or >>>>>>>>>>>>>>> record it as an app-level metrics ( >>>>>>>>>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_ >>>>> monitoring >>>>>>>>> ) >>>>>>>>>>>> for >>>>>>>>>>>>>>> monitoring. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Guozhang >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer < >> m...@frmr.me >>>> >>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I did some more digging tonight. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler uses >>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for the >>> config >>>>>>>>>> name. No >>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this should >> use >>>>>>>>>>>>>>>> "default.production.exception.handler". >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer < >> m...@frmr.me >>>> >>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I think getting access to the serialized record is >>>> possible, >>>>>>>>>> and >>>>>>>>>>>>>>> changing >>>>>>>>>>>>>>>>> the naming and return type is certainly doable. >> However, >>>>>>>>>> because >>>>>>>>>>>>> we're >>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have no >>> guarantee >>>>>>>>>> that >>>>>>>>>>>> the >>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the time this >>>>>>>>>> particular >>>>>>>>>>>>>>> handler >>>>>>>>>>>>>>>>> runs. So I think the signature would change to >> something >>>>>>>>> like: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse handle(final >>>>>>>>>>>> ProducerRecord<..> >>>>>>>>>>>>>>>> record, >>>>>>>>>>>>>>>>> final Exception exception) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Would this be acceptable? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer < >>> m...@frmr.me> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and >> return >>>> type >>>>>>>>>> but >>>>>>>>>>>>> I’m >>>>>>>>>>>>>>> not >>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and the >> record >>>>>>>>>> itself >>>>>>>>>>>>>>> without >>>>>>>>>>>>>>>>>> other changes. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow. >>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax < >>>>>>>>>>>>>>> matth...@confluent.io> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks for the KIP. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- >>> 161%3A+streams+ >>>>>>>>>>>>>>>> deserialization+exception+handlers >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I thinks, we should align the design (parameter >> naming, >>>>>>>>>> return >>>>>>>>>>>>>>> types, >>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a >> unified >>>>>>>>> user >>>>>>>>>>>>>>>> experience. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote: >>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll update the >>> KIP >>>>>>>>>>>>>>> accordingly. >>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu < >>>>>>>>>> yuzhih...@gmail.com> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP ? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it >>>>>>>>>>>>>>>>>>>>> production.exception.processor.class >>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name should be >>>>>>>>>> specified. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Cheers >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer < >>>>>>>>>> m...@frmr.me> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hello everyone, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP that I >>> just >>>>>>>>>> filed >>>>>>>>>>>>>>> here: >>>>>>>>>>>>>>>>>>>>>> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- >>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+ >>> error+handling++when+Kafka+ >>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback from >> folks >>>>>>>>> about >>>>>>>>>>>> this >>>>>>>>>>>>>>> idea >>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>> working toward a solution we can contribute back. >> :) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>> Matt Farmer >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> -- Guozhang >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> -- Guozhang >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> -- Guozhang >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> -- Guozhang >>>> >>> >> >> >> >> -- >> -- Guozhang >> >
signature.asc
Description: OpenPGP digital signature