Hi Bruno, Thanks for your recommendations.
I've made those changes, and also updated the description for the new config to read: "The minimum frequency in milliseconds with which to delete fully consumed records from repartition topics. *Purging will occur after at least this value since the last purge, but may be delayed until later in order to meet the processing guarantee.* The default value is the same as the default for commit.interval.ms (30000). (Note, unlike commit.interval.ms, the default for this value remains unchanged when processing.guarantee is set to exactly_once_v2)." This should make it clear that this is just a minimum interval, without leaking too much detail in to the specification. If there are no other issues, I'll put this to a vote. Regards, Nick Telford On Tue, 11 Jan 2022 at 15:34, Bruno Cadonna <cado...@apache.org> wrote: > Hi Nick, > > Sorry for the delay! > > Regarding point 7, I am fine with keeping the config as an interval and > keeping it independently of the commit interval. However, I would then > remove the following paragraph from the KIP: > > "We will still wait for a commit before explicitly deleting repartition > records, but we will only do so if the time since the last record > deletion is at least repartition.purge.interval.ms. This means the > lower-bound for repartition.purge.interval.ms is effectively capped by > the value of commit.interval.ms." > > The reason is that in the previous paragraph you say that the configs > can be modified separately and then in this paragraph you bind the purge > interval to the commit interval. This seems a contradiction and > indicates that you are leaking too much implementation details into the > KIP. I think, just saying that the purge interval is a minimum and name > it accordingly without talking about the actual implementation makes the > KIP more robust against future implementation changes. > > My proposal for the config name is min.repartition.purge.interval.ms or > even min.purge.interval.ms with a preference for the former. > > Best, > Bruno > > > > On 04.01.22 17:21, Nick Telford wrote: > >> > >> You missed one "delete.interval.ms" in the last paragraph in section > >> "Proposed Changes". > >> > > > > I've fixed these now, including the title that I somehow missed! > > > > I am afraid, I again need to comment on point 7. IMO, it does not make > > > > sense to be able to tune repartition.purge.interval.ms and > > > > commit.interval.ms separately when the purge can only happen during a > > > > commit. For example, if I set commit.interval.ms to 30000 ms and > > > > repartition.purge.interval.ms to 35000 ms, the records will be purged at > > > > every second commit, i.e., every 60000 ms. What benefit do users have to > > > > set repartition.purge.interval.ms separately from commit.interval.ms? > > > > The rate of purging will never be 1 / 35000, the rate will be 1 / > > > > 2*commit.interval.ms.. > > > > > > Could we address this by chaning the name of the configuration to > something > > like "repartition.purge.min.interval.ms", to indicate that the > repartition > > purge interval will be *at least* this value? > > > > If that's still not suitable, are there any other existing configurations > > that behave in a similar way, i.e. dictate a multiple of another > interval, > > that we could use as a basis for a new name for this configuration? > > > > Additionally, I have a new point. > > > > 8. If user code has access to the processor context (e.g. in the > > > > processor API), a commit can also be requested on demand by user code. > > > > The KIP should clarify if purges might also happen during requested > > > > commits or if purges only happen during automatic commits. > > > > > > Good point. I'm holding off on amending this for now until we agree on an > > outcome for point 7 above, because I suspect it may at least influence > the > > wording of this section. > > > > Thanks for the feedback, and sorry for the delay. Now that the holidays > are > > over I'm able to focus on this again. > > > > Regards, > > > > Nick Telford > > > > On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <cado...@apache.org> wrote: > > > >> Hi Nick, > >> > >> Thanks for the updates! > >> > >> The motivation section and the description of the new config is much > >> clearer and informative now! > >> > >> You missed one "delete.interval.ms" in the last paragraph in section > >> "Proposed Changes". > >> > >> I am afraid, I again need to comment on point 7. IMO, it does not make > >> sense to be able to tune repartition.purge.interval.ms and > >> commit.interval.ms separately when the purge can only happen during a > >> commit. For example, if I set commit.interval.ms to 30000 ms and > >> repartition.purge.interval.ms to 35000 ms, the records will be purged > at > >> every second commit, i.e., every 60000 ms. What benefit do users have to > >> set repartition.purge.interval.ms separately from commit.interval.ms? > >> The rate of purging will never be 1 / 35000, the rate will be 1 / > >> 2*commit.interval.ms.. > >> > >> Additionally, I have a new point. > >> 8. If user code has access to the processor context (e.g. in the > >> processor API), a commit can also be requested on demand by user code. > >> The KIP should clarify if purges might also happen during requested > >> commits or if purges only happen during automatic commits. > >> > >> Best, > >> Bruno > >> > >> On 21.12.21 20:40, Nick Telford wrote: > >>> Hi everyone, > >>> > >>> Thanks for your feedback. I've made the suggested changes to the KIP > (1, > >> 2, > >>> 3 and 5). > >>> > >>> For the new name, I've chosen repartition.purge.interval.ms, as I felt > >> it > >>> struck a good balance between being self-descriptive and concise. > Please > >>> let me know if you'd prefer something else. > >>> > >>> On point 6: My PR has basic validation to ensure the value is positive, > >> but > >>> I don't think it's necessary to have dynamic validation, to ensure it's > >> not > >>> less than commit.interval.ms. The reason is that it will be implicitly > >>> limited to that value anyway, and won't break anything. But I can add > it > >> if > >>> you'd prefer it. > >>> > >>> On point 7: I worry that defaulting it to follow the value of > >>> commit.interval.ms may confuse users, who will likely expect the > >> default to > >>> not be affected by changes to other configuration options. I can see > the > >>> appeal of retaining the existing behaviour (following the commit > >> interval) > >>> by default, but I believe that the majority of users who customize > >>> commit.interval.ms do not desire a different frequency of repartition > >>> record purging as well. > >>> > >>> As for multiples of commit interval: I think the argument against that > is > >>> that an interval is more intuitive when given as a time, rather than > as a > >>> multiple of some other operation. Users configuring this should not > need > >> to > >>> break out a calculator to work out how frequently the records are going > >> to > >>> be purged! > >>> > >>> I've also updated the PR with the relevant changes. > >>> > >>> BTW, for some reason I didn't receive Sophie's email. I'll periodically > >>> check the thread on the archive to ensure I don't miss any more of your > >>> messages! > >>> > >>> Regards, > >>> > >>> Nick > >>> > >>> On Tue, 21 Dec 2021 at 12:34, Luke Chen <show...@gmail.com> wrote: > >>> > >>>> Thanks, Bruno. > >>>> > >>>> I agree my point 4 is more like PR discussion, not KIP discussion. > >>>> @Nick , please update my point 4 in PR directly. > >>>> > >>>> Thank you. > >>>> Luke > >>>> > >>>> > >>>> > >>>> > >>>> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <cado...@apache.org> > >> wrote: > >>>> > >>>>> Hi Nick, > >>>>> > >>>>> Thank you for the KIP! > >>>>> > >>>>> I agree on points 1, 2, and 3. I am not sure about point 4. I agree > >> that > >>>>> we should update the docs for commit.interval.ms but I am not sure > if > >>>>> this needs to mentioned in the KIP. That seems to me more a PR > >>>>> discussion. Also on point 2, I agree that we need to add a doc string > >>>>> but the content should be exemplary not binding. What I want to say > is > >>>>> that, we do not need a KIP to change docs. > >>>>> > >>>>> Here my points: > >>>>> > >>>>> 5. Could you specify in the motivation that the KIP is about deleting > >>>>> records from repartition topics? Maybe with a short description when > >> why > >>>>> and when records are deleted from the repartition topics. For us it > >>>>> might be clear, but IMO we should try to write KIPs so that someone > >> that > >>>>> is relatively new to Kafka Streams can understand the KIP without > >>>>> needing to know too much background. > >>>>> > >>>>> 6. Does the config need to be validated? For example, does > >>>>> delete.interval.ms need to be greater than or equal to > >>>> commit.interval.ms? > >>>>> > >>>>> 7. Should the default value for non-EOS be 30s or the same value as > >>>>> commit.interval.ms? I am just thinking about the case where a user > >>>>> explicitly changes commit.interval.ms but not delete.interval.ms (or > >>>>> whatever name you come up for it). Once delete.interval.ms is set > >>>>> explicitly it is decoupled from commit.interval.ms. Similar could be > >>>>> done for the EOS case. > >>>>> Alternatively, we could also define delete.interval.ms to take a > >>>>> integral number without a unit that specifies after how many commit > >>>>> intervals the records in repartition topics should be deleted. This > >>>>> would make sense since delete.interval.ms is tightly bound to > >>>>> commit.interval.ms. Additionally, it would make the semantics of the > >>>>> config simpler. The name of the config should definitely change if we > >> go > >>>>> down this way. > >>>>> > >>>>> Best, > >>>>> Bruno > >>>>> > >>>>> > >>>>> > >>>>> On 21.12.21 11:14, Luke Chen wrote: > >>>>>> Hi Nick, > >>>>>> > >>>>>> Thanks for the KIP! > >>>>>> > >>>>>> In addition to Sophie's comments, I have one more to this KIP: > >>>>>> 3. I think you should mention the behavior change *explicitly* in > >>>>>> "Compatibility" section. I know you already mentioned it in KIP, in > >> the > >>>>>> benefit way. But I think in this section, we should clearly point > out > >>>>> which > >>>>>> behavior will be change after this KIP. That is, you should put it > >>>> clear > >>>>>> that the delete record interval will change from 100ms to 30s with > EOS > >>>>>> enabled. And it should also be mentioned in doc/upgrade.html doc. > >>>>>> 4. Since this new config has some relationship with > >> commit.interval.ms > >>>> , > >>>>> I > >>>>>> think we should also update the doc description for ` > >>>> commit.interval.ms > >>>>> `, > >>>>>> to let user know there's another config to control delete interval > and > >>>>>> should be greater than commit.interval.ms. Something like that. > WDYT? > >>>>> (You > >>>>>> should put this change in the KIP as Sophie mentioned) > >>>>>> > >>>>>> Thank you. > >>>>>> Luke > >>>>>> > >>>>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman > >>>>>> <sop...@confluent.io.invalid> wrote: > >>>>>> > >>>>>>> Hey Nick, > >>>>>>> > >>>>>>> I think you forgot to link to the KIP document, but I take it this > is > >>>>>>> it: KIP-811: > >>>>>>> Add separate delete.interval.ms to Kafka Streams > >>>>>>> <https://cwiki.apache.org/confluence/x/JY-kCw> > >>>>>>> > >>>>>>> The overall proposal sounds good to me, just a few minor things: > >>>>>>> > >>>>>>> 1. Please specify everything needed to define this config > >>>>> explicitly, ie > >>>>>>> all the arguments that will be passed in to the > >>>>>>> StreamsConfig's ConfigDef: in addition to the default value, > we > >>>>> need the > >>>>>>> config type (presumably a Long), the doc > >>>>>>> string, and the importance (probably "low", similar to > >>>>>>> commit.interval.ms > >>>>>>> ) > >>>>>>> 2. Might be worth considering a slightly more descriptive > name > >> for > >>>>> this > >>>>>>> config. Most users probably don't think about, > >>>>>>> or may not even be aware of, the deletion of consumed > records by > >>>>> Kafka > >>>>>>> Streams, so calling it something along > >>>>>>> the lines of "repartition.records.delete.interval.ms" or " > >>>>>>> consumed.records.deletion.interval.ms" or so on will help > >>>>>>> make it clear what the config refers to and whether or not > they > >>>>> need to > >>>>>>> care. Maybe you can come up with better > >>>>>>> and/or shorter names, just wanted to suggest some example > names > >>>>> that I > >>>>>>> think sufficiently get the point across > >>>>>>> > >>>>>>> Other than that I'm +1 -- thanks for the KIP! > >>>>>>> > >>>>>>> Sophie > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford < > nick.telf...@gmail.com > >>> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> This is a KIP for a proposed solution to KAFKA-13549 > >>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The solution > >> is > >>>>>>> very > >>>>>>>> simple, so the KIP is pretty short. > >>>>>>>> > >>>>>>>> The suggested changes are implemented by this PR > >>>>>>>> <https://github.com/apache/kafka/pull/11610>. > >>>>>>>> -- > >>>>>>>> Nick Telford > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > >