Hi all, sorry to do this again, but during review of the code to add the
metrics proposed in this KIP, the reviewers and I noticed some
inconsistencies and drawbacks of the metrics I proposed in the KIP.

Here's the diff:
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=87295409&selectedPageVersions=24&selectedPageVersions=23

* The proposed metrics were all INFO level, but they would be updated on
every record, creating a performance concern. If we can refactor the
metrics framework in the future to resolve this concern, we may move the
metrics back to INFO level.
* having separate metrics for memory and disk buffers is unnecessarily
complex. The main utility is determining how close the buffer is to the
configured limit, which makes a single metric more useful. I've combined
them into one "suppression-buffer-size-*" metric.
* The "intermediate-result-suppression-*" metric would be equivalent to the
"process-*" metric which is already available on the ProcessorNode. I've
removed it from the KIP.
* The "suppression-mem-buffer-evict-*" metric had been proposed as a buffer
metric, but it makes more sense as a processor node metric, since its
counterpart is the "process-*" metric. I've replaced it with a processor
node metric, "suppression-emit-*"

Let me know if you want to recast votes in response to this change.

-John

On Thu, Oct 4, 2018 at 11:26 AM John Roesler <j...@confluent.io> wrote:

> Update: Here's a link to the documented eviction behavior:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables#KIP-328:AbilitytosuppressupdatesforKTables-BufferEvictionBehavior(akaSuppressEmitBehavior)
>
> On Thu, Oct 4, 2018 at 11:12 AM John Roesler <j...@confluent.io> wrote:
>
>> Hello again, all,
>>
>> During review, we realized that there is a relationship between this
>> (KIP-328) and KIP-372.
>>
>> KIP-372 proposed to allow naming *all* internal topics, and KIP-328 adds
>> a new internal topic (the changelog for the suppression buffer).
>>
>> However, we didn't consider this relationship in either KIP discussion,
>> possibly since they were discussed and accepted concurrently.
>>
>> I have updated KIP-328 to effectively "merge" the two KIPs by adding a
>> `withName` builder to Suppressed in the style of the other builders added
>> in KIP-372:
>> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=87295409&selectedPageVersions=20&selectedPageVersions=19
>> .
>>
>> I think this should be uncontroversial, but as always, let me know of any
>> objections you may have.
>>
>>
>> Also, note that I'll be updating the KIP to document the exact buffer
>> eviction behavior. I previously treated this as an internal implementation
>> detail, but after consideration, I think users would want to know the
>> eviction semantics, especially if they are debugging their applications and
>> scrutinizing the sequence of emitted records.
>>
>> Thanks,
>> -John
>>
>> On Thu, Sep 20, 2018 at 5:34 PM John Roesler <j...@confluent.io> wrote:
>>
>>> Hello all,
>>>
>>> During review of https://github.com/apache/kafka/pull/5567 for KIP-328,
>>> the reviewers raised many good suggestions for the API.
>>>
>>> The basic design of the suppress operation remains the same, but the
>>> config object is (in my opinion) far more ergonomic with their
>>> suggestions.
>>>
>>> I have updated the KIP to reflect the new config (
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables#KIP-328:AbilitytosuppressupdatesforKTables-NewSuppressOperator
>>> )
>>>
>>> Please let me know if anyone wishes to change their vote, and we call
>>> for a recast.
>>>
>>> Thanks,
>>> -John
>>>
>>> On Thu, Aug 23, 2018 at 12:54 PM Matthias J. Sax <matth...@confluent.io>
>>> wrote:
>>>
>>>> It seems nobody has any objections against the change.
>>>>
>>>> That's for the KIP improvement. I'll go ahead and merge the PR.
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 8/21/18 2:44 PM, John Roesler wrote:
>>>> > Hello again, all,
>>>> >
>>>> > I belatedly had a better idea for adding grace period to the Windows
>>>> class
>>>> > hierarchy (TimeWindows, UnlimitedWindows, JoinWindows). Instead of
>>>> > providing the grace-setter in the abstract class and having to
>>>> retract it
>>>> > in UnlimitedWindows, I've made the getter abstract method in Windows
>>>> and
>>>> > only added setters to Time and Join windows.
>>>> >
>>>> > This should not only improve the ergonomics of grace period, but make
>>>> the
>>>> > whole class hierarchy more maintainable.
>>>> >
>>>> > See the PR for more details:
>>>> https://github.com/apache/kafka/pull/5536
>>>> >
>>>> > I've updated the KIP accordingly. Here's the diff:
>>>> >
>>>> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=87295409&selectedPageVersions=11&selectedPageVersions=9
>>>> >
>>>> > Please let me know if this changes your vote.
>>>> >
>>>> > Thanks,
>>>> > -John
>>>> >
>>>> > On Mon, Aug 13, 2018 at 5:20 PM John Roesler <j...@confluent.io>
>>>> wrote:
>>>> >
>>>> >> Hey all,
>>>> >>
>>>> >> I just wanted to let you know that a few small issues surfaced during
>>>> >> implementation and review. I've updated the KIP. Here's the diff:
>>>> >>
>>>> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=87295409&selectedPageVersions=9&selectedPageVersions=8
>>>> >>
>>>> >> Basically:
>>>> >> * the metrics named "*-event-*" are inconsistent with existing
>>>> >> nomenclature, and will be "*-record-*" instead (late records instead
>>>> of
>>>> >> late events, for example)
>>>> >> * the apis taking and returning Duration will use long millis
>>>> instead. We
>>>> >> do want to transition to Duration in the future, but we shouldn't do
>>>> it
>>>> >> piecemeal.
>>>> >>
>>>> >> Thanks,
>>>> >> -John
>>>> >>
>>>> >> On Tue, Aug 7, 2018 at 12:07 PM John Roesler <j...@confluent.io>
>>>> wrote:
>>>> >>
>>>> >>> Thanks everyone, KIP-328 has passed with 3 binding votes (Guozhang,
>>>> >>> Damian, and Matthias) and 3 non-binding (Ted, Bill, and me).
>>>> >>>
>>>> >>> Thanks for your time,
>>>> >>> -John
>>>> >>>
>>>> >>> On Mon, Aug 6, 2018 at 6:35 PM Matthias J. Sax <
>>>> matth...@confluent.io>
>>>> >>> wrote:
>>>> >>>
>>>> >>>> +1 (binding)
>>>> >>>>
>>>> >>>> Thanks for the KIP.
>>>> >>>>
>>>> >>>>
>>>> >>>> -Matthias
>>>> >>>>
>>>> >>>> On 8/3/18 12:52 AM, Damian Guy wrote:
>>>> >>>>> Thanks John! +1
>>>> >>>>>
>>>> >>>>> On Mon, 30 Jul 2018 at 23:58 Guozhang Wang <wangg...@gmail.com>
>>>> wrote:
>>>> >>>>>
>>>> >>>>>> Yes, the addendum lgtm as well. Thanks!
>>>> >>>>>>
>>>> >>>>>> On Mon, Jul 30, 2018 at 3:34 PM, John Roesler <j...@confluent.io
>>>> >
>>>> >>>> wrote:
>>>> >>>>>>
>>>> >>>>>>> Another thing that came up after I started working on an
>>>> >>>> implementation
>>>> >>>>>> is
>>>> >>>>>>> that in addition to deprecating "retention" from the Windows
>>>> >>>> interface,
>>>> >>>>>> we
>>>> >>>>>>> also need to deprecate "segmentInterval", for the same reasons.
>>>> I
>>>> >>>> simply
>>>> >>>>>>> overlooked it previously. I've updated the KIP accordingly.
>>>> >>>>>>>
>>>> >>>>>>> Hopefully, this doesn't change anyone's vote.
>>>> >>>>>>>
>>>> >>>>>>> Thanks,
>>>> >>>>>>> -John
>>>> >>>>>>>
>>>> >>>>>>> On Mon, Jul 30, 2018 at 5:31 PM John Roesler <j...@confluent.io
>>>> >
>>>> >>>> wrote:
>>>> >>>>>>>
>>>> >>>>>>>> Thanks Guozhang,
>>>> >>>>>>>>
>>>> >>>>>>>> Thanks for that catch. to clarify, currently, events are
>>>> "late" only
>>>> >>>>>> when
>>>> >>>>>>>> they are older than the retention period. Currently, we detect
>>>> this
>>>> >>>> in
>>>> >>>>>>> the
>>>> >>>>>>>> processor and record it as a "skipped-record". We then do not
>>>> >>>> attempt
>>>> >>>>>> to
>>>> >>>>>>>> store the event in the window store. If a user provided a
>>>> >>>>>> pre-configured
>>>> >>>>>>>> window store with a retention period smaller than the one they
>>>> >>>> specify
>>>> >>>>>>> via
>>>> >>>>>>>> Windows#until, the segmented store will drop the update with no
>>>> >>>> metric
>>>> >>>>>>> and
>>>> >>>>>>>> record a debug-level log.
>>>> >>>>>>>>
>>>> >>>>>>>> With KIP-328, with the introduction of "grace period" and
>>>> moving
>>>> >>>>>>> retention
>>>> >>>>>>>> fully into the state store, we need to have metrics for both
>>>> "late
>>>> >>>>>>> events"
>>>> >>>>>>>> (new records older than the grace period) and "expired window
>>>> >>>> events"
>>>> >>>>>>> (new
>>>> >>>>>>>> records for windows that are no longer retained in the state
>>>> >>>> store). I
>>>> >>>>>>>> already proposed metrics for the late events, and I've just
>>>> updated
>>>> >>>> the
>>>> >>>>>>> KIP
>>>> >>>>>>>> with metrics for the expired window events. I also updated the
>>>> KIP
>>>> >>>> to
>>>> >>>>>>> make
>>>> >>>>>>>> it clear that neither late nor expired events will count as
>>>> >>>>>>>> "skipped-records" any more.
>>>> >>>>>>>>
>>>> >>>>>>>> -John
>>>> >>>>>>>>
>>>> >>>>>>>> On Mon, Jul 30, 2018 at 4:22 PM Guozhang Wang <
>>>> wangg...@gmail.com>
>>>> >>>>>>> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>>> Hi John,
>>>> >>>>>>>>>
>>>> >>>>>>>>> Thanks for the updated KIP, +1 from me, and one minor
>>>> suggestion:
>>>> >>>>>>>>>
>>>> >>>>>>>>> Following your suggestion of the differentiation of
>>>> >>>> `skipped-records`
>>>> >>>>>>> v.s.
>>>> >>>>>>>>> `late-event-drop`, we should probably consider moving the
>>>> scenarios
>>>> >>>>>>> where
>>>> >>>>>>>>> records got ignored due the window not being available any
>>>> more in
>>>> >>>>>>>>> windowed
>>>> >>>>>>>>> aggregation operators from the `skipped-records` metrics
>>>> recording
>>>> >>>> to
>>>> >>>>>>> the
>>>> >>>>>>>>> `late-event-drop` metrics recording.
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> Guozhang
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Mon, Jul 30, 2018 at 1:36 PM, Bill Bejeck <
>>>> bbej...@gmail.com>
>>>> >>>>>> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>>> Thanks for the KIP!
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> +1
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> -Bill
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> On Mon, Jul 30, 2018 at 3:42 PM Ted Yu <yuzhih...@gmail.com>
>>>> >>>> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>> +1
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> On Mon, Jul 30, 2018 at 11:46 AM John Roesler <
>>>> j...@confluent.io
>>>> >>>>>
>>>> >>>>>>>>> wrote:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>> Hello devs,
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> The discussion of KIP-328 has gone some time with no new
>>>> >>>>>> comments,
>>>> >>>>>>>>> so I
>>>> >>>>>>>>>>> am
>>>> >>>>>>>>>>>> calling for a vote!
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Here's the KIP:
>>>> https://cwiki.apache.org/confluence/x/sQU0BQ
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> The basic idea is to provide:
>>>> >>>>>>>>>>>> * more usable control over update rate (vs the current
>>>> state
>>>> >>>>>> store
>>>> >>>>>>>>>>> caches)
>>>> >>>>>>>>>>>> * the final-result-for-windowed-computations feature which
>>>> >>>>>>> several
>>>> >>>>>>>>>> people
>>>> >>>>>>>>>>>> have requested
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Thanks,
>>>> >>>>>>>>>>>> -John
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> --
>>>> >>>>>>>>> -- Guozhang
>>>> >>>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> --
>>>> >>>>>> -- Guozhang
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>>>
>>>> >
>>>>
>>>>

Reply via email to