Hi all, sorry to do this again, but during review of the code to add the metrics proposed in this KIP, the reviewers and I noticed some inconsistencies and drawbacks of the metrics I proposed in the KIP.
Here's the diff: https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=87295409&selectedPageVersions=24&selectedPageVersions=23 * The proposed metrics were all INFO level, but they would be updated on every record, creating a performance concern. If we can refactor the metrics framework in the future to resolve this concern, we may move the metrics back to INFO level. * having separate metrics for memory and disk buffers is unnecessarily complex. The main utility is determining how close the buffer is to the configured limit, which makes a single metric more useful. I've combined them into one "suppression-buffer-size-*" metric. * The "intermediate-result-suppression-*" metric would be equivalent to the "process-*" metric which is already available on the ProcessorNode. I've removed it from the KIP. * The "suppression-mem-buffer-evict-*" metric had been proposed as a buffer metric, but it makes more sense as a processor node metric, since its counterpart is the "process-*" metric. I've replaced it with a processor node metric, "suppression-emit-*" Let me know if you want to recast votes in response to this change. -John On Thu, Oct 4, 2018 at 11:26 AM John Roesler <j...@confluent.io> wrote: > Update: Here's a link to the documented eviction behavior: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables#KIP-328:AbilitytosuppressupdatesforKTables-BufferEvictionBehavior(akaSuppressEmitBehavior) > > On Thu, Oct 4, 2018 at 11:12 AM John Roesler <j...@confluent.io> wrote: > >> Hello again, all, >> >> During review, we realized that there is a relationship between this >> (KIP-328) and KIP-372. >> >> KIP-372 proposed to allow naming *all* internal topics, and KIP-328 adds >> a new internal topic (the changelog for the suppression buffer). >> >> However, we didn't consider this relationship in either KIP discussion, >> possibly since they were discussed and accepted concurrently. >> >> I have updated KIP-328 to effectively "merge" the two KIPs by adding a >> `withName` builder to Suppressed in the style of the other builders added >> in KIP-372: >> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=87295409&selectedPageVersions=20&selectedPageVersions=19 >> . >> >> I think this should be uncontroversial, but as always, let me know of any >> objections you may have. >> >> >> Also, note that I'll be updating the KIP to document the exact buffer >> eviction behavior. I previously treated this as an internal implementation >> detail, but after consideration, I think users would want to know the >> eviction semantics, especially if they are debugging their applications and >> scrutinizing the sequence of emitted records. >> >> Thanks, >> -John >> >> On Thu, Sep 20, 2018 at 5:34 PM John Roesler <j...@confluent.io> wrote: >> >>> Hello all, >>> >>> During review of https://github.com/apache/kafka/pull/5567 for KIP-328, >>> the reviewers raised many good suggestions for the API. >>> >>> The basic design of the suppress operation remains the same, but the >>> config object is (in my opinion) far more ergonomic with their >>> suggestions. >>> >>> I have updated the KIP to reflect the new config ( >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables#KIP-328:AbilitytosuppressupdatesforKTables-NewSuppressOperator >>> ) >>> >>> Please let me know if anyone wishes to change their vote, and we call >>> for a recast. >>> >>> Thanks, >>> -John >>> >>> On Thu, Aug 23, 2018 at 12:54 PM Matthias J. Sax <matth...@confluent.io> >>> wrote: >>> >>>> It seems nobody has any objections against the change. >>>> >>>> That's for the KIP improvement. I'll go ahead and merge the PR. >>>> >>>> >>>> -Matthias >>>> >>>> On 8/21/18 2:44 PM, John Roesler wrote: >>>> > Hello again, all, >>>> > >>>> > I belatedly had a better idea for adding grace period to the Windows >>>> class >>>> > hierarchy (TimeWindows, UnlimitedWindows, JoinWindows). Instead of >>>> > providing the grace-setter in the abstract class and having to >>>> retract it >>>> > in UnlimitedWindows, I've made the getter abstract method in Windows >>>> and >>>> > only added setters to Time and Join windows. >>>> > >>>> > This should not only improve the ergonomics of grace period, but make >>>> the >>>> > whole class hierarchy more maintainable. >>>> > >>>> > See the PR for more details: >>>> https://github.com/apache/kafka/pull/5536 >>>> > >>>> > I've updated the KIP accordingly. Here's the diff: >>>> > >>>> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=87295409&selectedPageVersions=11&selectedPageVersions=9 >>>> > >>>> > Please let me know if this changes your vote. >>>> > >>>> > Thanks, >>>> > -John >>>> > >>>> > On Mon, Aug 13, 2018 at 5:20 PM John Roesler <j...@confluent.io> >>>> wrote: >>>> > >>>> >> Hey all, >>>> >> >>>> >> I just wanted to let you know that a few small issues surfaced during >>>> >> implementation and review. I've updated the KIP. Here's the diff: >>>> >> >>>> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=87295409&selectedPageVersions=9&selectedPageVersions=8 >>>> >> >>>> >> Basically: >>>> >> * the metrics named "*-event-*" are inconsistent with existing >>>> >> nomenclature, and will be "*-record-*" instead (late records instead >>>> of >>>> >> late events, for example) >>>> >> * the apis taking and returning Duration will use long millis >>>> instead. We >>>> >> do want to transition to Duration in the future, but we shouldn't do >>>> it >>>> >> piecemeal. >>>> >> >>>> >> Thanks, >>>> >> -John >>>> >> >>>> >> On Tue, Aug 7, 2018 at 12:07 PM John Roesler <j...@confluent.io> >>>> wrote: >>>> >> >>>> >>> Thanks everyone, KIP-328 has passed with 3 binding votes (Guozhang, >>>> >>> Damian, and Matthias) and 3 non-binding (Ted, Bill, and me). >>>> >>> >>>> >>> Thanks for your time, >>>> >>> -John >>>> >>> >>>> >>> On Mon, Aug 6, 2018 at 6:35 PM Matthias J. Sax < >>>> matth...@confluent.io> >>>> >>> wrote: >>>> >>> >>>> >>>> +1 (binding) >>>> >>>> >>>> >>>> Thanks for the KIP. >>>> >>>> >>>> >>>> >>>> >>>> -Matthias >>>> >>>> >>>> >>>> On 8/3/18 12:52 AM, Damian Guy wrote: >>>> >>>>> Thanks John! +1 >>>> >>>>> >>>> >>>>> On Mon, 30 Jul 2018 at 23:58 Guozhang Wang <wangg...@gmail.com> >>>> wrote: >>>> >>>>> >>>> >>>>>> Yes, the addendum lgtm as well. Thanks! >>>> >>>>>> >>>> >>>>>> On Mon, Jul 30, 2018 at 3:34 PM, John Roesler <j...@confluent.io >>>> > >>>> >>>> wrote: >>>> >>>>>> >>>> >>>>>>> Another thing that came up after I started working on an >>>> >>>> implementation >>>> >>>>>> is >>>> >>>>>>> that in addition to deprecating "retention" from the Windows >>>> >>>> interface, >>>> >>>>>> we >>>> >>>>>>> also need to deprecate "segmentInterval", for the same reasons. >>>> I >>>> >>>> simply >>>> >>>>>>> overlooked it previously. I've updated the KIP accordingly. >>>> >>>>>>> >>>> >>>>>>> Hopefully, this doesn't change anyone's vote. >>>> >>>>>>> >>>> >>>>>>> Thanks, >>>> >>>>>>> -John >>>> >>>>>>> >>>> >>>>>>> On Mon, Jul 30, 2018 at 5:31 PM John Roesler <j...@confluent.io >>>> > >>>> >>>> wrote: >>>> >>>>>>> >>>> >>>>>>>> Thanks Guozhang, >>>> >>>>>>>> >>>> >>>>>>>> Thanks for that catch. to clarify, currently, events are >>>> "late" only >>>> >>>>>> when >>>> >>>>>>>> they are older than the retention period. Currently, we detect >>>> this >>>> >>>> in >>>> >>>>>>> the >>>> >>>>>>>> processor and record it as a "skipped-record". We then do not >>>> >>>> attempt >>>> >>>>>> to >>>> >>>>>>>> store the event in the window store. If a user provided a >>>> >>>>>> pre-configured >>>> >>>>>>>> window store with a retention period smaller than the one they >>>> >>>> specify >>>> >>>>>>> via >>>> >>>>>>>> Windows#until, the segmented store will drop the update with no >>>> >>>> metric >>>> >>>>>>> and >>>> >>>>>>>> record a debug-level log. >>>> >>>>>>>> >>>> >>>>>>>> With KIP-328, with the introduction of "grace period" and >>>> moving >>>> >>>>>>> retention >>>> >>>>>>>> fully into the state store, we need to have metrics for both >>>> "late >>>> >>>>>>> events" >>>> >>>>>>>> (new records older than the grace period) and "expired window >>>> >>>> events" >>>> >>>>>>> (new >>>> >>>>>>>> records for windows that are no longer retained in the state >>>> >>>> store). I >>>> >>>>>>>> already proposed metrics for the late events, and I've just >>>> updated >>>> >>>> the >>>> >>>>>>> KIP >>>> >>>>>>>> with metrics for the expired window events. I also updated the >>>> KIP >>>> >>>> to >>>> >>>>>>> make >>>> >>>>>>>> it clear that neither late nor expired events will count as >>>> >>>>>>>> "skipped-records" any more. >>>> >>>>>>>> >>>> >>>>>>>> -John >>>> >>>>>>>> >>>> >>>>>>>> On Mon, Jul 30, 2018 at 4:22 PM Guozhang Wang < >>>> wangg...@gmail.com> >>>> >>>>>>> wrote: >>>> >>>>>>>> >>>> >>>>>>>>> Hi John, >>>> >>>>>>>>> >>>> >>>>>>>>> Thanks for the updated KIP, +1 from me, and one minor >>>> suggestion: >>>> >>>>>>>>> >>>> >>>>>>>>> Following your suggestion of the differentiation of >>>> >>>> `skipped-records` >>>> >>>>>>> v.s. >>>> >>>>>>>>> `late-event-drop`, we should probably consider moving the >>>> scenarios >>>> >>>>>>> where >>>> >>>>>>>>> records got ignored due the window not being available any >>>> more in >>>> >>>>>>>>> windowed >>>> >>>>>>>>> aggregation operators from the `skipped-records` metrics >>>> recording >>>> >>>> to >>>> >>>>>>> the >>>> >>>>>>>>> `late-event-drop` metrics recording. >>>> >>>>>>>>> >>>> >>>>>>>>> >>>> >>>>>>>>> >>>> >>>>>>>>> Guozhang >>>> >>>>>>>>> >>>> >>>>>>>>> >>>> >>>>>>>>> On Mon, Jul 30, 2018 at 1:36 PM, Bill Bejeck < >>>> bbej...@gmail.com> >>>> >>>>>> wrote: >>>> >>>>>>>>> >>>> >>>>>>>>>> Thanks for the KIP! >>>> >>>>>>>>>> >>>> >>>>>>>>>> +1 >>>> >>>>>>>>>> >>>> >>>>>>>>>> -Bill >>>> >>>>>>>>>> >>>> >>>>>>>>>> On Mon, Jul 30, 2018 at 3:42 PM Ted Yu <yuzhih...@gmail.com> >>>> >>>> wrote: >>>> >>>>>>>>>> >>>> >>>>>>>>>>> +1 >>>> >>>>>>>>>>> >>>> >>>>>>>>>>> On Mon, Jul 30, 2018 at 11:46 AM John Roesler < >>>> j...@confluent.io >>>> >>>>> >>>> >>>>>>>>> wrote: >>>> >>>>>>>>>>> >>>> >>>>>>>>>>>> Hello devs, >>>> >>>>>>>>>>>> >>>> >>>>>>>>>>>> The discussion of KIP-328 has gone some time with no new >>>> >>>>>> comments, >>>> >>>>>>>>> so I >>>> >>>>>>>>>>> am >>>> >>>>>>>>>>>> calling for a vote! >>>> >>>>>>>>>>>> >>>> >>>>>>>>>>>> Here's the KIP: >>>> https://cwiki.apache.org/confluence/x/sQU0BQ >>>> >>>>>>>>>>>> >>>> >>>>>>>>>>>> The basic idea is to provide: >>>> >>>>>>>>>>>> * more usable control over update rate (vs the current >>>> state >>>> >>>>>> store >>>> >>>>>>>>>>> caches) >>>> >>>>>>>>>>>> * the final-result-for-windowed-computations feature which >>>> >>>>>>> several >>>> >>>>>>>>>> people >>>> >>>>>>>>>>>> have requested >>>> >>>>>>>>>>>> >>>> >>>>>>>>>>>> Thanks, >>>> >>>>>>>>>>>> -John >>>> >>>>>>>>>>>> >>>> >>>>>>>>>>> >>>> >>>>>>>>>> >>>> >>>>>>>>> >>>> >>>>>>>>> >>>> >>>>>>>>> >>>> >>>>>>>>> -- >>>> >>>>>>>>> -- Guozhang >>>> >>>>>>>>> >>>> >>>>>>>> >>>> >>>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> -- >>>> >>>>>> -- Guozhang >>>> >>>>>> >>>> >>>>> >>>> >>>> >>>> >>>> >>>> > >>>> >>>>