Hi Bruno,

Thanks for your recommendations.

I've made those changes, and also updated the description for the new
config to read: "The minimum frequency in milliseconds with which to delete
fully consumed records from repartition topics. *Purging will occur after
at least this value since the last purge, but may be delayed until later in
order to meet the processing guarantee.* The default value is the same as
the default for commit.interval.ms (30000).  (Note, unlike
commit.interval.ms, the default for this value remains unchanged when
processing.guarantee is set to exactly_once_v2)."

This should make it clear that this is just a minimum interval, without
leaking too much detail in to the specification.

If there are no other issues, I'll put this to a vote.

Regards,

Nick Telford

On Tue, 11 Jan 2022 at 15:34, Bruno Cadonna <cado...@apache.org> wrote:

> Hi Nick,
>
> Sorry for the delay!
>
> Regarding point 7, I am fine with keeping the config as an interval and
> keeping it independently of the commit interval. However, I would then
> remove the following paragraph from the KIP:
>
> "We will still wait for a commit before explicitly deleting repartition
> records, but we will only do so if the time since the last record
> deletion is at least repartition.purge.interval.ms. This means the
> lower-bound for repartition.purge.interval.ms  is effectively capped by
> the value of commit.interval.ms."
>
> The reason is that in the previous paragraph you say that the configs
> can be modified separately and then in this paragraph you bind the purge
> interval to the commit interval. This seems a contradiction and
> indicates that you are leaking too much implementation details into the
> KIP. I think, just saying that the purge interval is a minimum and name
> it accordingly without talking about the actual implementation makes the
> KIP more robust against future implementation changes.
>
> My proposal for the config name is min.repartition.purge.interval.ms or
> even min.purge.interval.ms with a preference for the former.
>
> Best,
> Bruno
>
>
>
> On 04.01.22 17:21, Nick Telford wrote:
> >>
> >> You missed one "delete.interval.ms" in the last paragraph in section
> >> "Proposed Changes".
> >>
> >
> > I've fixed these now, including the title that I somehow missed!
> >
> > I am afraid, I again need to comment on point 7. IMO, it does not make
> >
> > sense to be able to tune repartition.purge.interval.ms and
> >
> > commit.interval.ms separately when the purge can only happen during a
> >
> > commit. For example, if I set commit.interval.ms to 30000 ms and
> >
> > repartition.purge.interval.ms to 35000 ms, the records will be purged at
> >
> > every second commit, i.e., every 60000 ms. What benefit do users have to
> >
> > set repartition.purge.interval.ms separately from commit.interval.ms?
> >
> > The rate of purging will never be 1 / 35000, the rate will be 1 /
> >
> > 2*commit.interval.ms..
> >
> >
> > Could we address this by chaning the name of the configuration to
> something
> > like "repartition.purge.min.interval.ms", to indicate that the
> repartition
> > purge interval will be *at least* this value?
> >
> > If that's still not suitable, are there any other existing configurations
> > that behave in a similar way, i.e. dictate a multiple of another
> interval,
> > that we could use as a basis for a new name for this configuration?
> >
> > Additionally, I have a new point.
> >
> > 8. If user code has access to the processor context (e.g. in the
> >
> > processor API), a commit can also be requested on demand by user code.
> >
> > The KIP should clarify if purges might also happen during requested
> >
> > commits or if purges only happen during automatic commits.
> >
> >
> > Good point. I'm holding off on amending this for now until we agree on an
> > outcome for point 7 above, because I suspect it may at least influence
> the
> > wording of this section.
> >
> > Thanks for the feedback, and sorry for the delay. Now that the holidays
> are
> > over I'm able to focus on this again.
> >
> > Regards,
> >
> > Nick Telford
> >
> > On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <cado...@apache.org> wrote:
> >
> >> Hi Nick,
> >>
> >> Thanks for the updates!
> >>
> >> The motivation section and the description of the new config is much
> >> clearer and informative now!
> >>
> >> You missed one "delete.interval.ms" in the last paragraph in section
> >> "Proposed Changes".
> >>
> >> I am afraid, I again need to comment on point 7. IMO, it does not make
> >> sense to be able to tune repartition.purge.interval.ms and
> >> commit.interval.ms separately when the purge can only happen during a
> >> commit. For example, if I set commit.interval.ms to 30000 ms and
> >> repartition.purge.interval.ms to 35000 ms, the records will be purged
> at
> >> every second commit, i.e., every 60000 ms. What benefit do users have to
> >> set repartition.purge.interval.ms separately from commit.interval.ms?
> >> The rate of purging will never be 1 / 35000, the rate will be 1 /
> >> 2*commit.interval.ms..
> >>
> >> Additionally, I have a new point.
> >> 8. If user code has access to the processor context (e.g. in the
> >> processor API), a commit can also be requested on demand by user code.
> >> The KIP should clarify if purges might also happen during requested
> >> commits or if purges only happen during automatic commits.
> >>
> >> Best,
> >> Bruno
> >>
> >> On 21.12.21 20:40, Nick Telford wrote:
> >>> Hi everyone,
> >>>
> >>> Thanks for your feedback. I've made the suggested changes to the KIP
> (1,
> >> 2,
> >>> 3 and 5).
> >>>
> >>> For the new name, I've chosen repartition.purge.interval.ms, as I felt
> >> it
> >>> struck a good balance between being self-descriptive and concise.
> Please
> >>> let me know if you'd prefer something else.
> >>>
> >>> On point 6: My PR has basic validation to ensure the value is positive,
> >> but
> >>> I don't think it's necessary to have dynamic validation, to ensure it's
> >> not
> >>> less than commit.interval.ms. The reason is that it will be implicitly
> >>> limited to that value anyway, and won't break anything. But I can add
> it
> >> if
> >>> you'd prefer it.
> >>>
> >>> On point 7: I worry that defaulting it to follow the value of
> >>> commit.interval.ms may confuse users, who will likely expect the
> >> default to
> >>> not be affected by changes to other configuration options. I can see
> the
> >>> appeal of retaining the existing behaviour (following the commit
> >> interval)
> >>> by default, but I believe that the majority of users who customize
> >>> commit.interval.ms do not desire a different frequency of repartition
> >>> record purging as well.
> >>>
> >>> As for multiples of commit interval: I think the argument against that
> is
> >>> that an interval is more intuitive when given as a time, rather than
> as a
> >>> multiple of some other operation. Users configuring this should not
> need
> >> to
> >>> break out a calculator to work out how frequently the records are going
> >> to
> >>> be purged!
> >>>
> >>> I've also updated the PR with the relevant changes.
> >>>
> >>> BTW, for some reason I didn't receive Sophie's email. I'll periodically
> >>> check the thread on the archive to ensure I don't miss any more of your
> >>> messages!
> >>>
> >>> Regards,
> >>>
> >>> Nick
> >>>
> >>> On Tue, 21 Dec 2021 at 12:34, Luke Chen <show...@gmail.com> wrote:
> >>>
> >>>> Thanks, Bruno.
> >>>>
> >>>> I agree my point 4 is more like PR discussion, not KIP discussion.
> >>>> @Nick , please update my point 4 in PR directly.
> >>>>
> >>>> Thank you.
> >>>> Luke
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <cado...@apache.org>
> >> wrote:
> >>>>
> >>>>> Hi Nick,
> >>>>>
> >>>>> Thank you for the KIP!
> >>>>>
> >>>>> I agree on points 1, 2, and 3. I am not sure about point 4. I agree
> >> that
> >>>>> we should update the docs for commit.interval.ms but I am not sure
> if
> >>>>> this needs to mentioned in the KIP. That seems to me more a PR
> >>>>> discussion. Also on point 2, I agree that we need to add a doc string
> >>>>> but the content should be exemplary not binding. What I want to say
> is
> >>>>> that, we do not need a KIP to change docs.
> >>>>>
> >>>>> Here my points:
> >>>>>
> >>>>> 5. Could you specify in the motivation that the KIP is about deleting
> >>>>> records from repartition topics? Maybe with a short description when
> >> why
> >>>>> and when records are deleted from the repartition topics. For us it
> >>>>> might be clear, but IMO we should try to write KIPs so that someone
> >> that
> >>>>> is relatively new to Kafka Streams can understand the KIP without
> >>>>> needing to know too much background.
> >>>>>
> >>>>> 6. Does the config need to be validated? For example, does
> >>>>> delete.interval.ms need to be greater than or equal to
> >>>> commit.interval.ms?
> >>>>>
> >>>>> 7. Should the default value for non-EOS be 30s or the same value as
> >>>>> commit.interval.ms? I am just thinking about the case where a user
> >>>>> explicitly changes commit.interval.ms but not delete.interval.ms (or
> >>>>> whatever name you come up for it). Once delete.interval.ms is set
> >>>>> explicitly it is decoupled from commit.interval.ms. Similar could be
> >>>>> done for the EOS case.
> >>>>> Alternatively, we could also define delete.interval.ms to take a
> >>>>> integral number without a unit that specifies after how many commit
> >>>>> intervals the records in repartition topics should be deleted. This
> >>>>> would make sense since delete.interval.ms is tightly bound to
> >>>>> commit.interval.ms. Additionally, it would make the semantics of the
> >>>>> config simpler. The name of the config should definitely change if we
> >> go
> >>>>> down this way.
> >>>>>
> >>>>> Best,
> >>>>> Bruno
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 21.12.21 11:14, Luke Chen wrote:
> >>>>>> Hi Nick,
> >>>>>>
> >>>>>> Thanks for the KIP!
> >>>>>>
> >>>>>> In addition to Sophie's comments, I have one more to this KIP:
> >>>>>> 3. I think you should mention the behavior change *explicitly* in
> >>>>>> "Compatibility" section. I know you already mentioned it in KIP, in
> >> the
> >>>>>> benefit way. But I think in this section, we should clearly point
> out
> >>>>> which
> >>>>>> behavior will be change after this KIP. That is, you should put it
> >>>> clear
> >>>>>> that the delete record interval will change from 100ms to 30s with
> EOS
> >>>>>> enabled. And it should also be mentioned in doc/upgrade.html doc.
> >>>>>> 4. Since this new config has some relationship with
> >> commit.interval.ms
> >>>> ,
> >>>>> I
> >>>>>> think we should also update the doc description for `
> >>>> commit.interval.ms
> >>>>> `,
> >>>>>> to let user know there's another config to control delete interval
> and
> >>>>>> should be greater than commit.interval.ms. Something like that.
> WDYT?
> >>>>> (You
> >>>>>> should put this change in the KIP as Sophie mentioned)
> >>>>>>
> >>>>>> Thank you.
> >>>>>> Luke
> >>>>>>
> >>>>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
> >>>>>> <sop...@confluent.io.invalid> wrote:
> >>>>>>
> >>>>>>> Hey Nick,
> >>>>>>>
> >>>>>>> I think you forgot to link to the KIP document, but I take it this
> is
> >>>>>>> it: KIP-811:
> >>>>>>> Add separate delete.interval.ms to Kafka Streams
> >>>>>>> <https://cwiki.apache.org/confluence/x/JY-kCw>
> >>>>>>>
> >>>>>>> The overall proposal sounds good to me, just a few minor things:
> >>>>>>>
> >>>>>>>       1. Please specify everything needed to define this config
> >>>>> explicitly, ie
> >>>>>>>       all the arguments that will be passed in to the
> >>>>>>>       StreamsConfig's ConfigDef: in addition to the default value,
> we
> >>>>> need the
> >>>>>>>       config type (presumably a Long), the doc
> >>>>>>>       string, and the importance (probably "low", similar to
> >>>>>>> commit.interval.ms
> >>>>>>>       )
> >>>>>>>       2. Might be worth considering a slightly more descriptive
> name
> >> for
> >>>>> this
> >>>>>>>       config. Most users probably don't think about,
> >>>>>>>       or may not even be aware of, the deletion of consumed
> records by
> >>>>> Kafka
> >>>>>>>       Streams, so calling it something along
> >>>>>>>       the lines of "repartition.records.delete.interval.ms" or "
> >>>>>>>       consumed.records.deletion.interval.ms" or so on will help
> >>>>>>>       make it clear what the config refers to and whether or not
> they
> >>>>> need to
> >>>>>>>       care. Maybe you can come up with better
> >>>>>>>       and/or shorter names, just wanted to suggest some example
> names
> >>>>> that I
> >>>>>>>       think sufficiently get the point across
> >>>>>>>
> >>>>>>> Other than that I'm +1 -- thanks for the KIP!
> >>>>>>>
> >>>>>>> Sophie
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <
> nick.telf...@gmail.com
> >>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> This is a KIP for a proposed solution to KAFKA-13549
> >>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The solution
> >> is
> >>>>>>> very
> >>>>>>>> simple, so the KIP is pretty short.
> >>>>>>>>
> >>>>>>>> The suggested changes are implemented by this PR
> >>>>>>>> <https://github.com/apache/kafka/pull/11610>.
> >>>>>>>> --
> >>>>>>>> Nick Telford
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

Reply via email to