Hi Jun,

Thanks for the review! I've added an addendum to the KIP about the remote
storage configs:

> 13. Why are the defaults for remote.log.manager.copier.thread.pool.size
> and remote.log.manager.thread.pool.size different?

We have 3 thread pools in the remote log manager:

a. remote.log.manager.thread.pool.size: handles the RLMFollowerTask
<https://sourcegraph.com/github.com/apache/kafka/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L1454>
to read the highest-uploaded remote offset for follower partitions
b. remote.log.manager.copier.thread.pool.size: handles the RLMCopyTask
<https://sourcegraph.com/github.com/apache/kafka/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L830>
to copy segments to remote storage
c. remote.log.manager.expiration.thread.pool.size: handles the
RLMExpirationTask
<https://sourcegraph.com/github.com/apache/kafka/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L1095>
to delete expired remote segments

The plan was to deprecate the remote.log.manager.thread.pool.size pool
after KIP-950, but that was never done, and the pool is still used for the
follower partition tasks. Since the follower tasks are lightweight compared
to the copy/expiration tasks, I propose reducing that pool's default size
to 2.
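For reference, a sketch of how the three pools would be sized in
server.properties under this proposal (only the follower pool changes; the
copier/expiration values below are placeholders, not part of the proposal):

  # follower tasks (RLMFollowerTask): proposed new default
  remote.log.manager.thread.pool.size=2
  # copy tasks (RLMCopyTask)
  remote.log.manager.copier.thread.pool.size=10
  # expiration tasks (RLMExpirationTask)
  remote.log.manager.expiration.thread.pool.size=10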
--
Kamal

On Wed, Nov 27, 2024 at 11:48 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi Divij,
>
> I share the same concern that Greg pointed out for segment.bytes /
> segment.index.bytes. All the tiered storage tests
> <https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/test/java/org/apache/kafka/tiered/storage/utils/TieredStorageTestUtils.java?L174>
> might start to fail with the new defaults. I'm fine with the proposal
> provided we plan to fix those tests as part of this KIP.
>
> --
> Kamal
>
> On Wed, Nov 27, 2024 at 9:55 PM Divij Vaidya <divijvaidy...@gmail.com>
> wrote:
>
>> Hello everyone,
>>
>> Since I believe I have addressed all the concerns that were raised
>> here, I have started a vote thread for this KIP at
>> https://lists.apache.org/thread/39dmfkd7ktb0oo44yqrkndcn4kcqt5hc
>>
>> Please participate in the vote.
>>
>> --
>> Divij Vaidya
>>
>> On Wed, Nov 27, 2024 at 12:33 PM Divij Vaidya <divijvaidy...@gmail.com>
>> wrote:
>>
>> > Thanks for the feedback, folks.
>> >
>> > *Jun*
>> >
>> > 10. Fixed it. It now says 2 as the new default for recovery threads.
>> >
>> > 11. I have added a sentence stating that we will apply the defaults
>> > to both the broker-level and the equivalent topic-level configs, and
>> > I have added both configs to the table. For example, you may notice
>> > the pair (message.timestamp.after.max.ms /
>> > log.message.timestamp.after.max.ms). Furthermore, the constraints
>> > will apply (similar to the constraints today) when validating both
>> > dynamically changed configuration and static configuration (such as
>> > server.properties). Please let me know if I have missed anything.
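>> > As a concrete sketch (the topic name, broker address, and value are
>> > illustrative, not the proposed defaults):
>> >
>> >   # broker-level config, e.g. in server.properties
>> >   log.message.timestamp.after.max.ms=3600000
>> >
>> >   # equivalent topic-level override
>> >   bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
>> >     --entity-type topics --entity-name t1 \
>> >     --add-config message.timestamp.after.max.ms=3600000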
>> > 12. In the interest of time, I have removed the constraint proposal
>> > for rf >= min.insync.replicas. We will circle back on it in a
>> > separate KIP.
>> >
>> > *Luke*
>> >
>> > 10 and 12 above should align with what you suggested.
>> >
>> > --
>> > Divij Vaidya
>> >
>> > On Wed, Nov 27, 2024 at 4:00 AM Luke Chen <show...@gmail.com> wrote:
>> >
>> >> Hi Divij and Jun,
>> >>
>> >> Thanks for your comments.
>> >> I'm fine with setting the default value of
>> >> num.recovery.threads.per.data.dir to 2, since there are many factors
>> >> that need to be considered.
>> >>
>> >> And James raised a good point about min.insync.replicas validation.
>> >> If it's complicated or will confuse users, I'd propose we leave it
>> >> out of v4.0.0.
>> >>
>> >> Thanks.
>> >> Luke
>> >>
>> >> On Tue, Nov 26, 2024 at 2:31 AM Jun Rao <j...@confluent.io.invalid>
>> >> wrote:
>> >>
>> >> > Hi, Divij,
>> >> >
>> >> > Thanks for the reply. A few more comments.
>> >> >
>> >> > 10. num.recovery.threads.per.data.dir still seems to depend on the
>> >> > number of cores.
>> >> >
>> >> > 11. Some of the configs on the server side exist at different
>> >> > levels (static, broker, topic, etc.) with slightly different
>> >> > names. It would be useful to be clear at which level the new
>> >> > default and the constraint apply.
>> >> >
>> >> > 12. James had a good point on min.insync.replicas. It would be
>> >> > useful to define when the constraint applies (topic creation,
>> >> > config changes, etc.). For example, if the broker-level
>> >> > min.insync.replicas value is changed to 2, what happens to
>> >> > existing topics with replication factor 1? If a topic has
>> >> > min.insync.replicas of 2, what happens to an
>> >> > AlterPartitionReassignments request that wants to reduce the
>> >> > replication factor to 1?
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Jun
>> >> >
>> >> > On Sat, Nov 23, 2024 at 9:57 AM Divij Vaidya <
>> >> > divijvaidy...@gmail.com> wrote:
>> >> >
>> >> > > Jun,
>> >> > >
>> >> > > Thank you for the feedback. I have removed the configuration
>> >> > > changes that rely on the number of cores. The only change I have
>> >> > > kept is increasing the recovery threads to 2 (from the default
>> >> > > of 1).
>> >> > >
>> >> > > James,
>> >> > >
>> >> > > Thank you for bringing the JIRA to my attention. I haven't
>> >> > > looked deeply into the implementation, but based on my
>> >> > > understanding of the Kafka code base, I do believe that there is
>> >> > > a path to implement this constraint. We will cross that bridge
>> >> > > during the implementation phase, and I will make sure to look at
>> >> > > the historical context you provided in the JIRA.
>> >> > >
>> >> > > --
>> >> > > Divij Vaidya
>> >> > >
>> >> > > On Sat, Nov 23, 2024 at 6:48 AM James Cheng <wushuja...@gmail.com>
>> >> > > wrote:
>> >> > >
>> >> > > > Regarding the replication.factor >= min.insync.replicas
>> >> > > > change, you should look at
>> >> > > > https://issues.apache.org/jira/browse/KAFKA-4680 . That JIRA
>> >> > > > talks about some of the complexities of detecting violations.
>> >> > > > For example, what if a topic has replication factor 1, but
>> >> > > > someone changes the broker-level min.insync.replicas value to
>> >> > > > 2? How would that be detected? (See the sketch below.)
>> >> > > >
>> >> > > > That JIRA has an associated PR. The PR has some comments that
>> >> > > > link to discussions on this mailing list.
>> >> > > >
>> >> > > > That PR, btw, was just closed due to being stale.
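>> >> > > > A minimal sequence that produces that state (topic name and
>> >> > > > broker address are illustrative):
>> >> > > >
>> >> > > >   # create a topic with replication factor 1
>> >> > > >   bin/kafka-topics.sh --bootstrap-server localhost:9092 \
>> >> > > >     --create --topic t1 --partitions 1 --replication-factor 1
>> >> > > >
>> >> > > >   # later, raise the cluster-wide default dynamically; t1 now
>> >> > > >   # violates rf >= min.insync.replicas with no topic-level change
>> >> > > >   bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
>> >> > > >     --entity-type brokers --entity-default \
>> >> > > >     --add-config min.insync.replicas=2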
>> >> > > > -James
>> >> > > >
>> >> > > > Sent from my iPhone
>> >> > > >
>> >> > > > > On Nov 18, 2024, at 2:15 AM, Divij Vaidya <
>> >> > > > > divijvaidy...@gmail.com> wrote:
>> >> > > > >
>> >> > > > > Hey folks,
>> >> > > > >
>> >> > > > > With 4.0, we have an opportunity to reset the default values
>> >> > > > > and add constraints in the configurations based on our
>> >> > > > > learnings since 3.0.
>> >> > > > >
>> >> > > > > Here's a KIP which modifies the defaults for some properties
>> >> > > > > and the constraints for a few others:
>> >> > > > >
>> >> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1030%3A+Change+constraints+and+default+values+for+various+configurations
>> >> > > > >
>> >> > > > > Looking forward to your feedback.
>> >> > > > >
>> >> > > > > (Previous discussion thread on this topic:
>> >> > > > > https://lists.apache.org/thread/3dx9mdmsqf8pko9xdmhks80k96g650zp)