Jun Do you have any final thoughts here? The vote thread has sufficient votes to accept but I wanted to ensure that we don’t have any pending items to discuss here.
— Divij Vaidya On Thu 28. Nov 2024 at 14:53, Divij Vaidya <divijvaidy...@gmail.com> wrote: > Agreed about the validation tool. I have already added it in the > "compatibility" section of the KIP. Will create a separate JIRA for it as > soon as KIP is approved. > > -- > Divij Vaidya > > > > On Thu, Nov 28, 2024 at 5:35 AM Satish Duggana <satish.dugg...@gmail.com> > wrote: > >> Hi Kamal, Thanks for updating the KIP with the offline discussion we >> had, especially on remotelog.manager.* thread pools with the new >> behaviour. >> >> Hi Divij, We should target having a validation tool in 4.0 so that >> users can run it before they run their upgrade and take necessary >> actions on it. I am fine with updating the KIP with those details >> later when PRs are raised. >> >> Thanks, >> Satish. >> >> On Thu, 28 Nov 2024 at 00:31, Divij Vaidya <divijvaidy...@gmail.com> >> wrote: >> > >> > Yes, Kamal, we will fix the tests that rely on low values of segment.ms >> as >> > part of the KIP implementation. >> > >> > -- >> > Divij Vaidya >> > >> > >> > >> > On Wed, Nov 27, 2024 at 7:40 PM Jun Rao <j...@confluent.io.invalid> >> wrote: >> > >> > > Hi, Kamal, >> > > >> > > Thanks for the explanation. This change sounds good to me then. >> > > >> > > Jun >> > > >> > > On Wed, Nov 27, 2024 at 10:31 AM Kamal Chandraprakash < >> > > kamal.chandraprak...@gmail.com> wrote: >> > > >> > > > Hi Jun, >> > > > >> > > > Thanks for the review! I've made the addendum to this KIP about the >> > > remote >> > > > storage configs: >> > > > >> > > > > 13. Why are the defaults for >> remote.log.manager.copier.thread.pool.size >> > > > and >> > > > remote.log.manager.thread.pool.size different? >> > > > >> > > > We have 3 thread pools in remote log manager: >> > > > >> > > > a. remote.log.manager.thread.pool.size: handles the RLMFollowerTask >> > > > < >> > > > >> > > >> https://sourcegraph.com/github.com/apache/kafka/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L1454 >> > > > > >> > > > to read the highest-uploaded remote offset for follower partitions >> > > > b. remote.log.manager.copier.thread.pool.size: handles the >> RLMCopyTask >> > > > < >> > > > >> > > >> https://sourcegraph.com/github.com/apache/kafka/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L830 >> > > > > >> > > > to copy the segments to remote storage >> > > > c. remote.log.manager.expiration.thread.pool.size: handles the >> > > > RLMExpirationTask >> > > > < >> > > > >> > > >> https://sourcegraph.com/github.com/apache/kafka/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L1095 >> > > > > >> > > > to delete the expired remote segments. >> > > > >> > > > The plan was to deprecate the remote.log.manager.thread.pool.size >> > > > thread-pool after KIP-950 but it was not done >> > > > and used for the follower partition tasks. Compared to >> copier/expiration >> > > > tasks, the follower tasks are light-weight >> > > > so proposing to reduce that thread-pool size to 2. >> > > > >> > > > -- >> > > > Kamal >> > > > >> > > > On Wed, Nov 27, 2024 at 11:48 PM Kamal Chandraprakash < >> > > > kamal.chandraprak...@gmail.com> wrote: >> > > > >> > > > > Hi Divij, >> > > > > >> > > > > I also share the same concern as Greg pointed out for >> segment.bytes / >> > > > > segment.index.bytes. >> > > > > All the tiered storage tests >> > > > > < >> > > > >> > > >> https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/test/java/org/apache/kafka/tiered/storage/utils/TieredStorageTestUtils.java?L174 >> > > > > >> > > > > might start to fail with the new defaults. I'm fine with the >> proposal >> > > > > provided we plan to fix those tests as part of this KIP. >> > > > > >> > > > > -- >> > > > > Kamal >> > > > > >> > > > > >> > > > > On Wed, Nov 27, 2024 at 9:55 PM Divij Vaidya < >> divijvaidy...@gmail.com> >> > > > > wrote: >> > > > > >> > > > >> Hello everyone >> > > > >> >> > > > >> Since, I believe I have addressed all the concerns that were >> raised >> > > > here, >> > > > >> I >> > > > >> have started a vote thread for this KIP at >> > > > >> https://lists.apache.org/thread/39dmfkd7ktb0oo44yqrkndcn4kcqt5hc >> > > > >> >> > > > >> Please participate in the vote. >> > > > >> >> > > > >> -- >> > > > >> Divij Vaidya >> > > > >> >> > > > >> >> > > > >> >> > > > >> On Wed, Nov 27, 2024 at 12:33 PM Divij Vaidya < >> > > divijvaidy...@gmail.com> >> > > > >> wrote: >> > > > >> >> > > > >> > Thanks for the feedback folks. >> > > > >> > >> > > > >> > *Jun* >> > > > >> > >> > > > >> > 10. Fixed it. It now says 2 as the new default for recovery >> threads. >> > > > >> > >> > > > >> > 11. I have added a sentence that we will apply the defaults >> for both >> > > > >> > broker level and equivalent topic level configs. I have further >> > > added >> > > > >> both >> > > > >> > the broker level and the topic level config to the table. For >> > > example, >> > > > >> you >> > > > >> > may notice (message.timestamp.after.max.ms / >> > > > >> > log.message.timestamp.after.max.ms). Furthermore, the >> constraints >> > > > will >> > > > >> > apply (similar to constraints today) when validating >> dynamically >> > > > changed >> > > > >> > configuration and also when validating static configuration >> (such as >> > > > >> > server.properties). Please let me know if I have missed >> anything. >> > > > >> > >> > > > >> > 12. In the interest of time, I have removed the constraint >> proposal >> > > > for >> > > > >> rf >> > > > >> > >= min.insync.replicas. We will circle back on it in a >> separate KIP. >> > > > >> > >> > > > >> > *Luke* >> > > > >> > >> > > > >> > 10 and 12 above should align with what you suggested. >> > > > >> > >> > > > >> > -- >> > > > >> > Divij Vaidya >> > > > >> > >> > > > >> > >> > > > >> > >> > > > >> > On Wed, Nov 27, 2024 at 4:00 AM Luke Chen <show...@gmail.com> >> > > wrote: >> > > > >> > >> > > > >> >> Hi Divij and Jun, >> > > > >> >> >> > > > >> >> Thanks for your comments. >> > > > >> >> I'm good we put the default value of >> > > > num.recovery.threads.per.data.dir >> > > > >> to >> > > > >> >> 2 >> > > > >> >> since there are many factors that need to be considered. >> > > > >> >> >> > > > >> >> And James, good point of min.insync.replicas validation. If >> it's >> > > > >> >> complicated or will confuse users, I'd propose we leave it >> out of >> > > > >> v4.0.0. >> > > > >> >> >> > > > >> >> Thanks. >> > > > >> >> Luke >> > > > >> >> >> > > > >> >> On Tue, Nov 26, 2024 at 2:31 AM Jun Rao >> <j...@confluent.io.invalid> >> > > > >> wrote: >> > > > >> >> >> > > > >> >> > Hi, Divij, >> > > > >> >> > >> > > > >> >> > Thanks for the reply. A few more comments. >> > > > >> >> > >> > > > >> >> > 10. num.recovery.threads.per.data.dir still seems to depend >> on >> > > the >> > > > >> >> number >> > > > >> >> > of cores. >> > > > >> >> > >> > > > >> >> > 11. Some of the configs on the server side exist at >> different >> > > > levels >> > > > >> >> > (static, broker, topic, etc) with slightly different names. >> It >> > > > would >> > > > >> be >> > > > >> >> > useful to be clear at what level the new default and the >> > > constraint >> > > > >> >> apply. >> > > > >> >> > >> > > > >> >> > 12. James had a good point on min.insync.replicas. It would >> be >> > > > >> useful to >> > > > >> >> > define when the constraint applies (topic creation, config >> > > changes, >> > > > >> >> etc). >> > > > >> >> > For example, if the broker-level min.insync.replicas value >> is >> > > > changed >> > > > >> >> to 2, >> > > > >> >> > what happens to existing topics with replication factor 1? >> If a >> > > > topic >> > > > >> >> has >> > > > >> >> > min.insync.replicas of 2, what happens to an >> > > > >> AlterPartitionReassignments >> > > > >> >> > request that wants to reduce the replication factor to 1? >> > > > >> >> > >> > > > >> >> > Thanks, >> > > > >> >> > >> > > > >> >> > Jun >> > > > >> >> > >> > > > >> >> > On Sat, Nov 23, 2024 at 9:57 AM Divij Vaidya < >> > > > >> divijvaidy...@gmail.com> >> > > > >> >> > wrote: >> > > > >> >> > >> > > > >> >> > > Jun >> > > > >> >> > > >> > > > >> >> > > Thank you for the feedback. I have removed the >> configuration >> > > > >> changes >> > > > >> >> > where >> > > > >> >> > > we are relying on num cores. The only change I have kept >> is >> > > > >> increasing >> > > > >> >> > > recovery threads to 2 (from 1 as default). >> > > > >> >> > > >> > > > >> >> > > James >> > > > >> >> > > >> > > > >> >> > > Thank you for bringing the JIRA to my attention. I haven't >> > > looked >> > > > >> >> deeply >> > > > >> >> > > into the implementation but based on my understanding of >> the >> > > > Kafka >> > > > >> >> code >> > > > >> >> > > base, I do believe that there is a path to implement this >> > > > >> constraint. >> > > > >> >> We >> > > > >> >> > > will cross that bridge during the implementation phase >> and I >> > > will >> > > > >> >> ensure >> > > > >> >> > > that I look at the historical context you provided in the >> JIRA. >> > > > >> >> > > >> > > > >> >> > > -- >> > > > >> >> > > Divij Vaidya >> > > > >> >> > > >> > > > >> >> > > >> > > > >> >> > > >> > > > >> >> > > On Sat, Nov 23, 2024 at 6:48 AM James Cheng < >> > > > wushuja...@gmail.com> >> > > > >> >> > wrote: >> > > > >> >> > > >> > > > >> >> > > > About replication.factor >= min.insync.replicas change, >> you >> > > > >> should >> > > > >> >> look >> > > > >> >> > > at >> > > > >> >> > > > https://issues.apache.org/jira/browse/KAFKA-4680 . >> That JIRA >> > > > >> talks >> > > > >> >> > about >> > > > >> >> > > > some of the complexities of detecting it. For example, >> what >> > > if >> > > > a >> > > > >> >> topic >> > > > >> >> > > has >> > > > >> >> > > > replication factor 1, but someone changes the >> broker-level >> > > > >> >> > > > min.insync.replicas value to 2? How would that be >> detected? >> > > > >> >> > > > >> > > > >> >> > > > That JIRA has an associated PR. The PR has some >> comments that >> > > > >> link >> > > > >> >> to >> > > > >> >> > > > discussions on this mailing list. >> > > > >> >> > > > >> > > > >> >> > > > That PR, btw, was just closed due to being stale. >> > > > >> >> > > > >> > > > >> >> > > > -James >> > > > >> >> > > > >> > > > >> >> > > > Sent from my iPhone >> > > > >> >> > > > >> > > > >> >> > > > > On Nov 18, 2024, at 2:15 AM, Divij Vaidya < >> > > > >> >> divijvaidy...@gmail.com> >> > > > >> >> > > > wrote: >> > > > >> >> > > > > >> > > > >> >> > > > > Hey folks >> > > > >> >> > > > > >> > > > >> >> > > > > With 4.0, we have an opportunity to reset the default >> > > values >> > > > >> and >> > > > >> >> add >> > > > >> >> > > > > constraints in the configurations based on our >> learnings >> > > > since >> > > > >> >> 3.0. >> > > > >> >> > > > > >> > > > >> >> > > > > Here's a KIP which modifies defaults for some >> properties >> > > and >> > > > >> >> modifies >> > > > >> >> > > the >> > > > >> >> > > > > constraints for a few others. >> > > > >> >> > > > > >> > > > >> >> > > > > >> > > > >> >> > > > >> > > > >> >> > > >> > > > >> >> > >> > > > >> >> >> > > > >> >> > > > >> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1030%3A+Change+constraints+and+default+values+for+various+configurations >> > > > >> >> > > > > >> > > > >> >> > > > > >> > > > >> >> > > > > Looking forward for your feedback. >> > > > >> >> > > > > >> > > > >> >> > > > > (Previous discussion thread on this topic - >> > > > >> >> > > > > >> > > > >> https://lists.apache.org/thread/3dx9mdmsqf8pko9xdmhks80k96g650zp >> > > > >> >> ) >> > > > >> >> > > > >> > > > >> >> > > >> > > > >> >> > >> > > > >> >> >> > > > >> > >> > > > >> >> > > > > >> > > > >> > > >> >