>
> You missed one "delete.interval.ms" in the last paragraph in section
> "Proposed Changes".
>

I've fixed these now, including the title that I somehow missed!

I am afraid, I again need to comment on point 7. IMO, it does not make

sense to be able to tune repartition.purge.interval.ms and

commit.interval.ms separately when the purge can only happen during a

commit. For example, if I set commit.interval.ms to 30000 ms and

repartition.purge.interval.ms to 35000 ms, the records will be purged at

every second commit, i.e., every 60000 ms. What benefit do users have to

set repartition.purge.interval.ms separately from commit.interval.ms?

The rate of purging will never be 1 / 35000, the rate will be 1 /

2*commit.interval.ms..


Could we address this by chaning the name of the configuration to something
like "repartition.purge.min.interval.ms", to indicate that the repartition
purge interval will be *at least* this value?

If that's still not suitable, are there any other existing configurations
that behave in a similar way, i.e. dictate a multiple of another interval,
that we could use as a basis for a new name for this configuration?

Additionally, I have a new point.

8. If user code has access to the processor context (e.g. in the

processor API), a commit can also be requested on demand by user code.

The KIP should clarify if purges might also happen during requested

commits or if purges only happen during automatic commits.


Good point. I'm holding off on amending this for now until we agree on an
outcome for point 7 above, because I suspect it may at least influence the
wording of this section.

Thanks for the feedback, and sorry for the delay. Now that the holidays are
over I'm able to focus on this again.

Regards,

Nick Telford

On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <cado...@apache.org> wrote:

> Hi Nick,
>
> Thanks for the updates!
>
> The motivation section and the description of the new config is much
> clearer and informative now!
>
> You missed one "delete.interval.ms" in the last paragraph in section
> "Proposed Changes".
>
> I am afraid, I again need to comment on point 7. IMO, it does not make
> sense to be able to tune repartition.purge.interval.ms and
> commit.interval.ms separately when the purge can only happen during a
> commit. For example, if I set commit.interval.ms to 30000 ms and
> repartition.purge.interval.ms to 35000 ms, the records will be purged at
> every second commit, i.e., every 60000 ms. What benefit do users have to
> set repartition.purge.interval.ms separately from commit.interval.ms?
> The rate of purging will never be 1 / 35000, the rate will be 1 /
> 2*commit.interval.ms..
>
> Additionally, I have a new point.
> 8. If user code has access to the processor context (e.g. in the
> processor API), a commit can also be requested on demand by user code.
> The KIP should clarify if purges might also happen during requested
> commits or if purges only happen during automatic commits.
>
> Best,
> Bruno
>
> On 21.12.21 20:40, Nick Telford wrote:
> > Hi everyone,
> >
> > Thanks for your feedback. I've made the suggested changes to the KIP (1,
> 2,
> > 3 and 5).
> >
> > For the new name, I've chosen repartition.purge.interval.ms, as I felt
> it
> > struck a good balance between being self-descriptive and concise. Please
> > let me know if you'd prefer something else.
> >
> > On point 6: My PR has basic validation to ensure the value is positive,
> but
> > I don't think it's necessary to have dynamic validation, to ensure it's
> not
> > less than commit.interval.ms. The reason is that it will be implicitly
> > limited to that value anyway, and won't break anything. But I can add it
> if
> > you'd prefer it.
> >
> > On point 7: I worry that defaulting it to follow the value of
> > commit.interval.ms may confuse users, who will likely expect the
> default to
> > not be affected by changes to other configuration options. I can see the
> > appeal of retaining the existing behaviour (following the commit
> interval)
> > by default, but I believe that the majority of users who customize
> > commit.interval.ms do not desire a different frequency of repartition
> > record purging as well.
> >
> > As for multiples of commit interval: I think the argument against that is
> > that an interval is more intuitive when given as a time, rather than as a
> > multiple of some other operation. Users configuring this should not need
> to
> > break out a calculator to work out how frequently the records are going
> to
> > be purged!
> >
> > I've also updated the PR with the relevant changes.
> >
> > BTW, for some reason I didn't receive Sophie's email. I'll periodically
> > check the thread on the archive to ensure I don't miss any more of your
> > messages!
> >
> > Regards,
> >
> > Nick
> >
> > On Tue, 21 Dec 2021 at 12:34, Luke Chen <show...@gmail.com> wrote:
> >
> >> Thanks, Bruno.
> >>
> >> I agree my point 4 is more like PR discussion, not KIP discussion.
> >> @Nick , please update my point 4 in PR directly.
> >>
> >> Thank you.
> >> Luke
> >>
> >>
> >>
> >>
> >> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <cado...@apache.org>
> wrote:
> >>
> >>> Hi Nick,
> >>>
> >>> Thank you for the KIP!
> >>>
> >>> I agree on points 1, 2, and 3. I am not sure about point 4. I agree
> that
> >>> we should update the docs for commit.interval.ms but I am not sure if
> >>> this needs to mentioned in the KIP. That seems to me more a PR
> >>> discussion. Also on point 2, I agree that we need to add a doc string
> >>> but the content should be exemplary not binding. What I want to say is
> >>> that, we do not need a KIP to change docs.
> >>>
> >>> Here my points:
> >>>
> >>> 5. Could you specify in the motivation that the KIP is about deleting
> >>> records from repartition topics? Maybe with a short description when
> why
> >>> and when records are deleted from the repartition topics. For us it
> >>> might be clear, but IMO we should try to write KIPs so that someone
> that
> >>> is relatively new to Kafka Streams can understand the KIP without
> >>> needing to know too much background.
> >>>
> >>> 6. Does the config need to be validated? For example, does
> >>> delete.interval.ms need to be greater than or equal to
> >> commit.interval.ms?
> >>>
> >>> 7. Should the default value for non-EOS be 30s or the same value as
> >>> commit.interval.ms? I am just thinking about the case where a user
> >>> explicitly changes commit.interval.ms but not delete.interval.ms (or
> >>> whatever name you come up for it). Once delete.interval.ms is set
> >>> explicitly it is decoupled from commit.interval.ms. Similar could be
> >>> done for the EOS case.
> >>> Alternatively, we could also define delete.interval.ms to take a
> >>> integral number without a unit that specifies after how many commit
> >>> intervals the records in repartition topics should be deleted. This
> >>> would make sense since delete.interval.ms is tightly bound to
> >>> commit.interval.ms. Additionally, it would make the semantics of the
> >>> config simpler. The name of the config should definitely change if we
> go
> >>> down this way.
> >>>
> >>> Best,
> >>> Bruno
> >>>
> >>>
> >>>
> >>> On 21.12.21 11:14, Luke Chen wrote:
> >>>> Hi Nick,
> >>>>
> >>>> Thanks for the KIP!
> >>>>
> >>>> In addition to Sophie's comments, I have one more to this KIP:
> >>>> 3. I think you should mention the behavior change *explicitly* in
> >>>> "Compatibility" section. I know you already mentioned it in KIP, in
> the
> >>>> benefit way. But I think in this section, we should clearly point out
> >>> which
> >>>> behavior will be change after this KIP. That is, you should put it
> >> clear
> >>>> that the delete record interval will change from 100ms to 30s with EOS
> >>>> enabled. And it should also be mentioned in doc/upgrade.html doc.
> >>>> 4. Since this new config has some relationship with
> commit.interval.ms
> >> ,
> >>> I
> >>>> think we should also update the doc description for `
> >> commit.interval.ms
> >>> `,
> >>>> to let user know there's another config to control delete interval and
> >>>> should be greater than commit.interval.ms. Something like that. WDYT?
> >>> (You
> >>>> should put this change in the KIP as Sophie mentioned)
> >>>>
> >>>> Thank you.
> >>>> Luke
> >>>>
> >>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
> >>>> <sop...@confluent.io.invalid> wrote:
> >>>>
> >>>>> Hey Nick,
> >>>>>
> >>>>> I think you forgot to link to the KIP document, but I take it this is
> >>>>> it: KIP-811:
> >>>>> Add separate delete.interval.ms to Kafka Streams
> >>>>> <https://cwiki.apache.org/confluence/x/JY-kCw>
> >>>>>
> >>>>> The overall proposal sounds good to me, just a few minor things:
> >>>>>
> >>>>>      1. Please specify everything needed to define this config
> >>> explicitly, ie
> >>>>>      all the arguments that will be passed in to the
> >>>>>      StreamsConfig's ConfigDef: in addition to the default value, we
> >>> need the
> >>>>>      config type (presumably a Long), the doc
> >>>>>      string, and the importance (probably "low", similar to
> >>>>> commit.interval.ms
> >>>>>      )
> >>>>>      2. Might be worth considering a slightly more descriptive name
> for
> >>> this
> >>>>>      config. Most users probably don't think about,
> >>>>>      or may not even be aware of, the deletion of consumed records by
> >>> Kafka
> >>>>>      Streams, so calling it something along
> >>>>>      the lines of "repartition.records.delete.interval.ms" or "
> >>>>>      consumed.records.deletion.interval.ms" or so on will help
> >>>>>      make it clear what the config refers to and whether or not they
> >>> need to
> >>>>>      care. Maybe you can come up with better
> >>>>>      and/or shorter names, just wanted to suggest some example names
> >>> that I
> >>>>>      think sufficiently get the point across
> >>>>>
> >>>>> Other than that I'm +1 -- thanks for the KIP!
> >>>>>
> >>>>> Sophie
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <nick.telf...@gmail.com
> >
> >>>>> wrote:
> >>>>>
> >>>>>> This is a KIP for a proposed solution to KAFKA-13549
> >>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The solution
> is
> >>>>> very
> >>>>>> simple, so the KIP is pretty short.
> >>>>>>
> >>>>>> The suggested changes are implemented by this PR
> >>>>>> <https://github.com/apache/kafka/pull/11610>.
> >>>>>> --
> >>>>>> Nick Telford
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

Reply via email to