> The unit of frequency is 1/s. What we are actually setting with this > config is the period. So frequency is used a bit wrongly here. Since I > discovered that we use the word "frequency" also for other similar > configs, I leave it up to you if you want to change it in this case or > not. I am fine either way.
I'm embarrassed to say this is not the first time I've made this mistake and been corrected on it! I've fixed it as I can't imagine anyone objecting, and I've also dropped the "min." prefix. For the VOTE thread, since no votes have been made yet, I'm going to assume that they're made on the latest version of the KIP. Regards, Nick On Thu, 13 Jan 2022 at 10:29, Bruno Cadonna <cado...@apache.org> wrote: > Hi Nick, > > Dropping the `min.` prefix is fine with me. > > A really minor nit-picking: > > "The minimum frequency in milliseconds ..." > > The unit of frequency is 1/s. What we are actually setting with this > config is the period. So frequency is used a bit wrongly here. Since I > discovered that we use the word "frequency" also for other similar > configs, I leave it up to you if you want to change it in this case or > not. I am fine either way. > > Best, > Bruno > > On 13.01.22 10:00, Nick Telford wrote: > > Hi Matthias, > > > > You raise a good point: commit.interval.ms only specifies a minimum > > interval, because commits are powered by record flow through the > Topology, > > if there are no input records, there will be no commits. The parallel > here > > is that repartition record purges are powered by commits, and if there > are > > no commits, there will be no record purges. > > > > Bruno, does this satisfy your misgivings with the name > > repartition.purge.interval.ms? > > > >> For the config description, I would remove > > > >>> The default value is the same as the default for commit.interval.ms > > (30000). > > > >> It seems unnecessary to me. > > > > My intent was to communicate to users that the default has been set to > > match the commit interval, for compatibility. But since the default value > > is provided elsewhere, I can see why it's redundant. I'll remove that. > > > > Thanks, > > > > Nick > > > > On Thu, 13 Jan 2022 at 03:14, Matthias J. Sax <mj...@apache.org> wrote: > > > >> Thanks for the KIP! Very good discussion. > >> > >> I want to raise one point: I am not a fan of the `min.` prefix of the > >> config, because all configs like this are _minimums_. We also use > >> `commit.interval.ms` and not `min.commit.interval.ms` for example. I > >> would suggest to strip the `min.` prefix. > >> > >> > >> For the config description, I would remove > >> > >>> The default value is the same as the default for commit.interval.ms > >> (30000). > >> > >> It seems unnecessary to me. > >> > >> > >> -Matthias > >> > >> > >> > >> On 1/12/22 7:21 AM, Nick Telford wrote: > >>> Thanks Bruno, I've made those changes. > >>> > >>> I'll call a vote on the KIP later today. > >>> > >>> Regards, > >>> > >>> Nick Telford > >>> > >>> On Wed, 12 Jan 2022 at 12:13, Bruno Cadonna <cado...@apache.org> > wrote: > >>> > >>>> Hi Nick, > >>>> > >>>> Great! > >>>> > >>>> I think the KIP is ready for voting. I have just a couple of minor > >>>> comments. > >>>> > >>>> a. > >>>> In the config description, I would replace > >>>> > >>>> "Purging will occur after at least this value since the last purge, > but > >>>> may be delayed until later in order to meet the processing guarantee." > >>>> > >>>> with > >>>> > >>>> "Purging will occur after at least this value since the last purge, > but > >>>> may be delayed until later." > >>>> > >>>> I do not really understand what you mean with "meet the processing > >>>> guarantee" and I think it suffices to say that the purge might be > >> delayed. > >>>> > >>>> > >>>> b. > >>>> I would change the title to > >>>> > >>>> "Add config min.repartition.purge.interval.ms to Kafka Streams" > >>>> > >>>> > >>>> c. > >>>> The current rejected alternative leaks implementation details since it > >>>> refers to the coupling of purge to commit and we agreed to leave that > >>>> out of the KIP. > >>>> > >>>> > >>>> d. > >>>> Could you add my proposal to specify the config as a multiple of the > >>>> commit interval to the rejected alternatives with the reason why we > >>>> discarded it? > >>>> > >>>> For the rest, I am +1. > >>>> > >>>> Best, > >>>> Bruno > >>>> > >>>> > >>>> > >>>> On 11.01.22 16:47, Nick Telford wrote: > >>>>> Hi Bruno, > >>>>> > >>>>> Thanks for your recommendations. > >>>>> > >>>>> I've made those changes, and also updated the description for the new > >>>>> config to read: "The minimum frequency in milliseconds with which to > >>>> delete > >>>>> fully consumed records from repartition topics. *Purging will occur > >> after > >>>>> at least this value since the last purge, but may be delayed until > >> later > >>>> in > >>>>> order to meet the processing guarantee.* The default value is the > same > >> as > >>>>> the default for commit.interval.ms (30000). (Note, unlike > >>>>> commit.interval.ms, the default for this value remains unchanged > when > >>>>> processing.guarantee is set to exactly_once_v2)." > >>>>> > >>>>> This should make it clear that this is just a minimum interval, > without > >>>>> leaking too much detail in to the specification. > >>>>> > >>>>> If there are no other issues, I'll put this to a vote. > >>>>> > >>>>> Regards, > >>>>> > >>>>> Nick Telford > >>>>> > >>>>> On Tue, 11 Jan 2022 at 15:34, Bruno Cadonna <cado...@apache.org> > >> wrote: > >>>>> > >>>>>> Hi Nick, > >>>>>> > >>>>>> Sorry for the delay! > >>>>>> > >>>>>> Regarding point 7, I am fine with keeping the config as an interval > >> and > >>>>>> keeping it independently of the commit interval. However, I would > then > >>>>>> remove the following paragraph from the KIP: > >>>>>> > >>>>>> "We will still wait for a commit before explicitly deleting > >> repartition > >>>>>> records, but we will only do so if the time since the last record > >>>>>> deletion is at least repartition.purge.interval.ms. This means the > >>>>>> lower-bound for repartition.purge.interval.ms is effectively > capped > >> by > >>>>>> the value of commit.interval.ms." > >>>>>> > >>>>>> The reason is that in the previous paragraph you say that the > configs > >>>>>> can be modified separately and then in this paragraph you bind the > >> purge > >>>>>> interval to the commit interval. This seems a contradiction and > >>>>>> indicates that you are leaking too much implementation details into > >> the > >>>>>> KIP. I think, just saying that the purge interval is a minimum and > >> name > >>>>>> it accordingly without talking about the actual implementation makes > >> the > >>>>>> KIP more robust against future implementation changes. > >>>>>> > >>>>>> My proposal for the config name is > min.repartition.purge.interval.ms > >> or > >>>>>> even min.purge.interval.ms with a preference for the former. > >>>>>> > >>>>>> Best, > >>>>>> Bruno > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 04.01.22 17:21, Nick Telford wrote: > >>>>>>>> > >>>>>>>> You missed one "delete.interval.ms" in the last paragraph in > >> section > >>>>>>>> "Proposed Changes". > >>>>>>>> > >>>>>>> > >>>>>>> I've fixed these now, including the title that I somehow missed! > >>>>>>> > >>>>>>> I am afraid, I again need to comment on point 7. IMO, it does not > >> make > >>>>>>> > >>>>>>> sense to be able to tune repartition.purge.interval.ms and > >>>>>>> > >>>>>>> commit.interval.ms separately when the purge can only happen > during > >> a > >>>>>>> > >>>>>>> commit. For example, if I set commit.interval.ms to 30000 ms and > >>>>>>> > >>>>>>> repartition.purge.interval.ms to 35000 ms, the records will be > >> purged > >>>> at > >>>>>>> > >>>>>>> every second commit, i.e., every 60000 ms. What benefit do users > have > >>>> to > >>>>>>> > >>>>>>> set repartition.purge.interval.ms separately from > commit.interval.ms > >> ? > >>>>>>> > >>>>>>> The rate of purging will never be 1 / 35000, the rate will be 1 / > >>>>>>> > >>>>>>> 2*commit.interval.ms.. > >>>>>>> > >>>>>>> > >>>>>>> Could we address this by chaning the name of the configuration to > >>>>>> something > >>>>>>> like "repartition.purge.min.interval.ms", to indicate that the > >>>>>> repartition > >>>>>>> purge interval will be *at least* this value? > >>>>>>> > >>>>>>> If that's still not suitable, are there any other existing > >>>> configurations > >>>>>>> that behave in a similar way, i.e. dictate a multiple of another > >>>>>> interval, > >>>>>>> that we could use as a basis for a new name for this configuration? > >>>>>>> > >>>>>>> Additionally, I have a new point. > >>>>>>> > >>>>>>> 8. If user code has access to the processor context (e.g. in the > >>>>>>> > >>>>>>> processor API), a commit can also be requested on demand by user > >> code. > >>>>>>> > >>>>>>> The KIP should clarify if purges might also happen during requested > >>>>>>> > >>>>>>> commits or if purges only happen during automatic commits. > >>>>>>> > >>>>>>> > >>>>>>> Good point. I'm holding off on amending this for now until we agree > >> on > >>>> an > >>>>>>> outcome for point 7 above, because I suspect it may at least > >> influence > >>>>>> the > >>>>>>> wording of this section. > >>>>>>> > >>>>>>> Thanks for the feedback, and sorry for the delay. Now that the > >> holidays > >>>>>> are > >>>>>>> over I'm able to focus on this again. > >>>>>>> > >>>>>>> Regards, > >>>>>>> > >>>>>>> Nick Telford > >>>>>>> > >>>>>>> On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <cado...@apache.org> > >>>> wrote: > >>>>>>> > >>>>>>>> Hi Nick, > >>>>>>>> > >>>>>>>> Thanks for the updates! > >>>>>>>> > >>>>>>>> The motivation section and the description of the new config is > much > >>>>>>>> clearer and informative now! > >>>>>>>> > >>>>>>>> You missed one "delete.interval.ms" in the last paragraph in > >> section > >>>>>>>> "Proposed Changes". > >>>>>>>> > >>>>>>>> I am afraid, I again need to comment on point 7. IMO, it does not > >> make > >>>>>>>> sense to be able to tune repartition.purge.interval.ms and > >>>>>>>> commit.interval.ms separately when the purge can only happen > >> during a > >>>>>>>> commit. For example, if I set commit.interval.ms to 30000 ms and > >>>>>>>> repartition.purge.interval.ms to 35000 ms, the records will be > >> purged > >>>>>> at > >>>>>>>> every second commit, i.e., every 60000 ms. What benefit do users > >> have > >>>> to > >>>>>>>> set repartition.purge.interval.ms separately from > >> commit.interval.ms? > >>>>>>>> The rate of purging will never be 1 / 35000, the rate will be 1 / > >>>>>>>> 2*commit.interval.ms.. > >>>>>>>> > >>>>>>>> Additionally, I have a new point. > >>>>>>>> 8. If user code has access to the processor context (e.g. in the > >>>>>>>> processor API), a commit can also be requested on demand by user > >> code. > >>>>>>>> The KIP should clarify if purges might also happen during > requested > >>>>>>>> commits or if purges only happen during automatic commits. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Bruno > >>>>>>>> > >>>>>>>> On 21.12.21 20:40, Nick Telford wrote: > >>>>>>>>> Hi everyone, > >>>>>>>>> > >>>>>>>>> Thanks for your feedback. I've made the suggested changes to the > >> KIP > >>>>>> (1, > >>>>>>>> 2, > >>>>>>>>> 3 and 5). > >>>>>>>>> > >>>>>>>>> For the new name, I've chosen repartition.purge.interval.ms, as > I > >>>> felt > >>>>>>>> it > >>>>>>>>> struck a good balance between being self-descriptive and concise. > >>>>>> Please > >>>>>>>>> let me know if you'd prefer something else. > >>>>>>>>> > >>>>>>>>> On point 6: My PR has basic validation to ensure the value is > >>>> positive, > >>>>>>>> but > >>>>>>>>> I don't think it's necessary to have dynamic validation, to > ensure > >>>> it's > >>>>>>>> not > >>>>>>>>> less than commit.interval.ms. The reason is that it will be > >>>> implicitly > >>>>>>>>> limited to that value anyway, and won't break anything. But I can > >> add > >>>>>> it > >>>>>>>> if > >>>>>>>>> you'd prefer it. > >>>>>>>>> > >>>>>>>>> On point 7: I worry that defaulting it to follow the value of > >>>>>>>>> commit.interval.ms may confuse users, who will likely expect the > >>>>>>>> default to > >>>>>>>>> not be affected by changes to other configuration options. I can > >> see > >>>>>> the > >>>>>>>>> appeal of retaining the existing behaviour (following the commit > >>>>>>>> interval) > >>>>>>>>> by default, but I believe that the majority of users who > customize > >>>>>>>>> commit.interval.ms do not desire a different frequency of > >>>> repartition > >>>>>>>>> record purging as well. > >>>>>>>>> > >>>>>>>>> As for multiples of commit interval: I think the argument against > >>>> that > >>>>>> is > >>>>>>>>> that an interval is more intuitive when given as a time, rather > >> than > >>>>>> as a > >>>>>>>>> multiple of some other operation. Users configuring this should > not > >>>>>> need > >>>>>>>> to > >>>>>>>>> break out a calculator to work out how frequently the records are > >>>> going > >>>>>>>> to > >>>>>>>>> be purged! > >>>>>>>>> > >>>>>>>>> I've also updated the PR with the relevant changes. > >>>>>>>>> > >>>>>>>>> BTW, for some reason I didn't receive Sophie's email. I'll > >>>> periodically > >>>>>>>>> check the thread on the archive to ensure I don't miss any more > of > >>>> your > >>>>>>>>> messages! > >>>>>>>>> > >>>>>>>>> Regards, > >>>>>>>>> > >>>>>>>>> Nick > >>>>>>>>> > >>>>>>>>> On Tue, 21 Dec 2021 at 12:34, Luke Chen <show...@gmail.com> > wrote: > >>>>>>>>> > >>>>>>>>>> Thanks, Bruno. > >>>>>>>>>> > >>>>>>>>>> I agree my point 4 is more like PR discussion, not KIP > discussion. > >>>>>>>>>> @Nick , please update my point 4 in PR directly. > >>>>>>>>>> > >>>>>>>>>> Thank you. > >>>>>>>>>> Luke > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna < > cado...@apache.org > >>> > >>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Hi Nick, > >>>>>>>>>>> > >>>>>>>>>>> Thank you for the KIP! > >>>>>>>>>>> > >>>>>>>>>>> I agree on points 1, 2, and 3. I am not sure about point 4. I > >> agree > >>>>>>>> that > >>>>>>>>>>> we should update the docs for commit.interval.ms but I am not > >> sure > >>>>>> if > >>>>>>>>>>> this needs to mentioned in the KIP. That seems to me more a PR > >>>>>>>>>>> discussion. Also on point 2, I agree that we need to add a doc > >>>> string > >>>>>>>>>>> but the content should be exemplary not binding. What I want to > >> say > >>>>>> is > >>>>>>>>>>> that, we do not need a KIP to change docs. > >>>>>>>>>>> > >>>>>>>>>>> Here my points: > >>>>>>>>>>> > >>>>>>>>>>> 5. Could you specify in the motivation that the KIP is about > >>>> deleting > >>>>>>>>>>> records from repartition topics? Maybe with a short description > >>>> when > >>>>>>>> why > >>>>>>>>>>> and when records are deleted from the repartition topics. For > us > >> it > >>>>>>>>>>> might be clear, but IMO we should try to write KIPs so that > >> someone > >>>>>>>> that > >>>>>>>>>>> is relatively new to Kafka Streams can understand the KIP > without > >>>>>>>>>>> needing to know too much background. > >>>>>>>>>>> > >>>>>>>>>>> 6. Does the config need to be validated? For example, does > >>>>>>>>>>> delete.interval.ms need to be greater than or equal to > >>>>>>>>>> commit.interval.ms? > >>>>>>>>>>> > >>>>>>>>>>> 7. Should the default value for non-EOS be 30s or the same > value > >> as > >>>>>>>>>>> commit.interval.ms? I am just thinking about the case where a > >> user > >>>>>>>>>>> explicitly changes commit.interval.ms but not > delete.interval.ms > >>>> (or > >>>>>>>>>>> whatever name you come up for it). Once delete.interval.ms is > >> set > >>>>>>>>>>> explicitly it is decoupled from commit.interval.ms. Similar > >> could > >>>> be > >>>>>>>>>>> done for the EOS case. > >>>>>>>>>>> Alternatively, we could also define delete.interval.ms to > take a > >>>>>>>>>>> integral number without a unit that specifies after how many > >> commit > >>>>>>>>>>> intervals the records in repartition topics should be deleted. > >> This > >>>>>>>>>>> would make sense since delete.interval.ms is tightly bound to > >>>>>>>>>>> commit.interval.ms. Additionally, it would make the semantics > of > >>>> the > >>>>>>>>>>> config simpler. The name of the config should definitely change > >> if > >>>> we > >>>>>>>> go > >>>>>>>>>>> down this way. > >>>>>>>>>>> > >>>>>>>>>>> Best, > >>>>>>>>>>> Bruno > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 21.12.21 11:14, Luke Chen wrote: > >>>>>>>>>>>> Hi Nick, > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks for the KIP! > >>>>>>>>>>>> > >>>>>>>>>>>> In addition to Sophie's comments, I have one more to this KIP: > >>>>>>>>>>>> 3. I think you should mention the behavior change *explicitly* > >> in > >>>>>>>>>>>> "Compatibility" section. I know you already mentioned it in > KIP, > >>>> in > >>>>>>>> the > >>>>>>>>>>>> benefit way. But I think in this section, we should clearly > >> point > >>>>>> out > >>>>>>>>>>> which > >>>>>>>>>>>> behavior will be change after this KIP. That is, you should > put > >> it > >>>>>>>>>> clear > >>>>>>>>>>>> that the delete record interval will change from 100ms to 30s > >> with > >>>>>> EOS > >>>>>>>>>>>> enabled. And it should also be mentioned in doc/upgrade.html > >> doc. > >>>>>>>>>>>> 4. Since this new config has some relationship with > >>>>>>>> commit.interval.ms > >>>>>>>>>> , > >>>>>>>>>>> I > >>>>>>>>>>>> think we should also update the doc description for ` > >>>>>>>>>> commit.interval.ms > >>>>>>>>>>> `, > >>>>>>>>>>>> to let user know there's another config to control delete > >> interval > >>>>>> and > >>>>>>>>>>>> should be greater than commit.interval.ms. Something like > that. > >>>>>> WDYT? > >>>>>>>>>>> (You > >>>>>>>>>>>> should put this change in the KIP as Sophie mentioned) > >>>>>>>>>>>> > >>>>>>>>>>>> Thank you. > >>>>>>>>>>>> Luke > >>>>>>>>>>>> > >>>>>>>>>>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman > >>>>>>>>>>>> <sop...@confluent.io.invalid> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hey Nick, > >>>>>>>>>>>>> > >>>>>>>>>>>>> I think you forgot to link to the KIP document, but I take it > >>>> this > >>>>>> is > >>>>>>>>>>>>> it: KIP-811: > >>>>>>>>>>>>> Add separate delete.interval.ms to Kafka Streams > >>>>>>>>>>>>> <https://cwiki.apache.org/confluence/x/JY-kCw> > >>>>>>>>>>>>> > >>>>>>>>>>>>> The overall proposal sounds good to me, just a few minor > >> things: > >>>>>>>>>>>>> > >>>>>>>>>>>>> 1. Please specify everything needed to define this > >> config > >>>>>>>>>>> explicitly, ie > >>>>>>>>>>>>> all the arguments that will be passed in to the > >>>>>>>>>>>>> StreamsConfig's ConfigDef: in addition to the > default > >>>> value, > >>>>>> we > >>>>>>>>>>> need the > >>>>>>>>>>>>> config type (presumably a Long), the doc > >>>>>>>>>>>>> string, and the importance (probably "low", similar > to > >>>>>>>>>>>>> commit.interval.ms > >>>>>>>>>>>>> ) > >>>>>>>>>>>>> 2. Might be worth considering a slightly more > >> descriptive > >>>>>> name > >>>>>>>> for > >>>>>>>>>>> this > >>>>>>>>>>>>> config. Most users probably don't think about, > >>>>>>>>>>>>> or may not even be aware of, the deletion of > consumed > >>>>>> records by > >>>>>>>>>>> Kafka > >>>>>>>>>>>>> Streams, so calling it something along > >>>>>>>>>>>>> the lines of " > repartition.records.delete.interval.ms" > >> or > >>>> " > >>>>>>>>>>>>> consumed.records.deletion.interval.ms" or so on > will > >> help > >>>>>>>>>>>>> make it clear what the config refers to and whether > or > >> not > >>>>>> they > >>>>>>>>>>> need to > >>>>>>>>>>>>> care. Maybe you can come up with better > >>>>>>>>>>>>> and/or shorter names, just wanted to suggest some > >> example > >>>>>> names > >>>>>>>>>>> that I > >>>>>>>>>>>>> think sufficiently get the point across > >>>>>>>>>>>>> > >>>>>>>>>>>>> Other than that I'm +1 -- thanks for the KIP! > >>>>>>>>>>>>> > >>>>>>>>>>>>> Sophie > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford < > >>>>>> nick.telf...@gmail.com > >>>>>>>>> > >>>>>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> This is a KIP for a proposed solution to KAFKA-13549 > >>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The > >>>> solution > >>>>>>>> is > >>>>>>>>>>>>> very > >>>>>>>>>>>>>> simple, so the KIP is pretty short. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> The suggested changes are implemented by this PR > >>>>>>>>>>>>>> <https://github.com/apache/kafka/pull/11610>. > >>>>>>>>>>>>>> -- > >>>>>>>>>>>>>> Nick Telford > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > >