Jun

1. Good catch. I was hasty in deriving the formula. Fixed it in the proposal below.
2. Isn't the proposed default better than the default of 1 today (for recovery threads)? For a quad-core machine with a single directory, our current proposal will have 4 threads recovering instead of 1. Ideally, as you observed, this should be more since recovery is IO bound. I have made a new proposal in the formula below.

3. Agreed. I think that is a risk which should be acceptable to us because the default is meant to help "non-power users" who want a good out-of-the-box experience without having to tweak configuration. This category of users may not care about how many threads are configured for a thread pool. (Of course, they can also calculate it based on documentation, or we can choose to add a metric/log if required in the future.)

4. Agreed. Perhaps I can remove the dependency on variable components such as dir/listener and instead focus on fixed components such as the number of cores. I have updated the proposal below.

Would the following formula address your concerns? (Also, I am open to any other suggestion that will address our initial problem of users not sizing the thread pools based on machine size.)

io threads = 2 x number of cores
network threads = number of cores
recovery threads = 2 x number of cores

If you have a strong opinion on this, I am also willing to remove these config changes from this KIP. These are not as important as the rest of them.

--
Divij Vaidya


On Thu, Nov 21, 2024 at 7:42 AM Kamal Chandraprakash <kamal.chandraprak...@gmail.com> wrote:

Hi Divij,

I've updated the KIP to change the default value for configs #11 and #12.

On Thu, Nov 21, 2024 at 6:25 AM Jun Rao <j...@confluent.io.invalid> wrote:

Hi, Divij,

Thanks for the updated KIP.

Regarding num.recovery.threads.per.data.dir/num.network.threads/num.io.threads, I have a few concerns.
1. It seems that we can't use floor since this can result in a value of 0, which is invalid.
2. I am not sure that the new default value is actually better. For example, if the recovery is I/O bound, we actually want to have more threads than the number of cores.
3. It's now hard for a user to know the actual default value.
4. It may also be a surprise that the default value changes when a user adds a listener or a dir.

Thanks,

Jun

On Wed, Nov 20, 2024 at 7:36 AM Kamal Chandraprakash <kamal.chandraprak...@gmail.com> wrote:

Hi Divij,

We would like to change the default value for the below remote log configurations:

1. remote.log.manager.copier.thread.pool.size -- default value from -1 to 10
2. remote.log.manager.expiration.thread.pool.size -- default value from -1 to 10

In KIP-950
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement#KIP950:TieredStorageDisablement-Disablement-ZookeeperBackedCluster>,
it was proposed to split the thread pool that was used by the leader and follower into copy and expiration thread pools. With KIP-956
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas>,
we made this change in v3.9. The default value for the copier and expiration thread pools was set to -1 to derive the value from the previous thread pool for backward compatibility. See PR#17499
<https://github.com/apache/kafka/pull/17499/#discussion_r1804119567>.
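
To make the effect of that -1 sentinel concrete, here is a minimal sketch of the derivation (the class and helper names are hypothetical; this is not the actual broker code path from PR#17499):

    public class RemotePoolSizeExample {
        // Hypothetical helper: -1 means "inherit from the legacy thread pool size".
        static int resolvePoolSize(int configured, int fallback) {
            return configured == -1 ? fallback : configured;
        }

        public static void main(String[] args) {
            int legacyPoolSize = 10; // remote.log.manager.thread.pool.size carried over from v3.8

            // With the v3.9 defaults of -1, both new pools inherit the legacy size.
            int copierPoolSize = resolvePoolSize(-1, legacyPoolSize);      // 10
            int expirationPoolSize = resolvePoolSize(-1, legacyPoolSize);  // 10

            // The legacy pool still runs for follower offset fetches, so the broker
            // now starts 10 + 10 + 10 = 30 threads instead of the original 10.
            System.out.println(legacyPoolSize + copierPoolSize + expirationPoolSize);
        }
    }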

Now, we have three thread pools:

1. remote.log.manager.thread.pool.size -- this is used by the replica to fetch the highest-uploaded remote offset for the follower partitions and won't be deprecated as proposed.
2. remote.log.manager.copier.thread.pool.size -- this is used by the leader to upload the log segments to remote.
3. remote.log.manager.expiration.thread.pool.size -- this is used by the leader to delete the expired log segments from remote.

A broker acts as the leader for some partitions and as the follower for other partitions, so deriving the value from `remote.log.manager.thread.pool.size` for the copier and expiration thread pools looks incorrect to me. The follower's task is lightweight (fetching only the highest-uploaded remote offset), whereas the leader's task is heavy (copying and deleting segments). Moreover, the change didn't ensure backward compatibility: if the user was running with 10 threads in v3.8, then in v3.9 the server will start 30 threads.

So, I propose to change the default value of the copier and expiration thread pools from -1 to 10. This will also ensure smooth validation while making those configs dynamic.

Thanks,
Kamal

On Wed, Nov 20, 2024 at 5:41 PM Divij Vaidya <divijvaidy...@gmail.com> wrote:

Thank you everyone for your feedback.

The new revision of the KIP is ready for review. Notable changes since the last revision:
1. Removed the change to log.message.timestamp.before.max.ms
2. Modified the default for num.recovery.threads.per.data.dir
3. Added new proposals for num.io.threads and num.network.threads
4. Improved the justification for changes and made the motivation more explicit


*Federico*

Good point. I have updated the compatibility section to include information on how the new values will be handled during the rolling upgrade to 4.0.

*Jun*

Thank you for pointing out the use case. It correlates with my experience that most of the problems arise from users accidentally setting a timestamp in ns instead of ms and hence ending up with a future time. I have reverted the changes to log.message.timestamp.after.max.ms.

*Tommi*

You have a valid point about the friction this could cause during an upgrade. This is a risk we are accepting with this KIP. The risk is acceptable because we expect due diligence from users while transitioning to a new major version, and also because there aren't legitimate use cases for setting these values to a threshold where they will cause system instability.

I have added the failure mode for invalid values in the compatibility section.

About your suggestion to apply the new changes but not fail the upgrade, I am a bit wary about silently modifying configuration that a user has explicitly set. The strategy also won't work for cases where the user uses a third-party tool to reset configurations. Failing fast on invalid configuration is the best option.
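
As a rough illustration of that fail-fast behaviour, here is a minimal sketch using ConfigDef's range validator (the exact constraint wiring proposed in the KIP may differ): a topic config below the new 1 MB minimum is rejected at parse time with a ConfigException rather than silently adjusted.

    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.common.config.ConfigException;

    import java.util.Map;

    public class SegmentBytesValidationExample {
        public static void main(String[] args) {
            // Sketch of a range constraint: segment.bytes must be at least 1 MiB.
            ConfigDef def = new ConfigDef()
                .define("segment.bytes", ConfigDef.Type.INT, 1024 * 1024 * 1024,
                        ConfigDef.Range.atLeast(1024 * 1024),
                        ConfigDef.Importance.HIGH, "The segment file size for the log.");

            try {
                // A pre-4.0 value below the new minimum fails fast instead of being silently bumped.
                def.parse(Map.of("segment.bytes", "512000"));
            } catch (ConfigException e) {
                System.out.println("Rejected: " + e.getMessage());
            }
        }
    }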

To improve the user experience, I have added a suggestion in the KIP to add an "upgrade validation tool" which a user can run to verify compatibility before the actual upgrade. It won't be part of this KIP, but it is something we might want to do to ease migration to 4.0. I will float the idea around in the community and see if we have volunteers for this.

*Greg*

Thanks for raising this concern. We indeed intentionally use small values to "force" a segment roll sometimes. However, I believe that there are alternate ways to accomplish that in a test environment. As an example, if I wanted to create small segments for a test scenario, I would either write an integration test where I can call roll explicitly, OR send a message with a create time in the future (roll is based on the timestamp of the last appended message), OR just write more data (1MB minimum today).

But to your larger point, I agree that there is value in providing two different guardrails for the users. First, a "user-friendly" guardrail: an opinionated guardrail meant to protect the cluster from inadvertent sub-optimal and incorrect configuration. Users who do not wish to spend time learning Kafka intricacies and the effect of myriad configurations can choose this option. Second, an "advanced user" guardrail which will allow users to push the cluster to its boundaries with full knowledge of the repercussions and fallout that may ensue.

As a project which is trusted by users across this broad spectrum of expertise, I do believe that we need to provide user-friendly mechanisms for different segments of users, but that is something I am unwilling to take up in this KIP. If you feel passionate about addressing this feature gap in Apache Kafka, please feel free to start a discussion on this topic and I promise to be an active participant in it. The outcome could be as simple as documentation where we provide ideal configurations for different workloads.

*Luke*

I agree with your suggestion to set the number of threads based on machine cores. In fact, I would promote a similar solution for network threads and request handler threads as well. (As an example, I have seen users setting large values for network threads compared to request handler threads, resulting in accepting more requests than the brokers can handle, leading to degradation. A constraint that network threads <= request handler threads could prevent such a scenario.)

Earlier, I didn't add these since they are not dead-obvious changes compared to the rest of the changes in the KIP, but I have now added them. It's worth discussing them.

--
Divij Vaidya


On Wed, Nov 20, 2024 at 11:01 AM Luke Chen <show...@gmail.com> wrote:

Hi Divij,

One comment:
For the default value of "num.recovery.threads.per.data.dir", we change from 1 to 2.
Could we change the value based on the CPU core number?
Ex: default value = CPU core num / num of log dirs.

WDYT?

Luke

On Wed, Nov 20, 2024 at 5:53 PM Tommi Vainikainen <tvain...@aiven.io.invalid> wrote:

Hi,

This proposal looks good to me with the new defaults and constraints.

About breaking backward compatibility, I have a question on how you see it best handled in case someone upgrades a Kafka cluster from 3.9 to 4.0 with, let's say, one topic having segment.bytes=500 KB (the new min is 1 MB). Does it mean that the Kafka process fails to start with the ConfigException, or when is the ConfigException thrown? Would it be better to simply start behaving as if it were at least the new minimum regardless of the topic setting, and fix the config on the fly instead? That would be much more convenient for most Kafka users in my opinion.

On Tue, Nov 19, 2024 at 12:55 PM Divij Vaidya <divijvaidy...@gmail.com> wrote:

KT1 - That is right. We will throw a ConfigException. That is why this change is considered backward incompatible. To be honest, given the nature of the suggested changes, I don't see any valid use case where a user may have a value which will be invalid after the new constraints.


--
Divij Vaidya


On Tue, Nov 19, 2024 at 2:21 AM Kirk True <k...@kirktrue.pro> wrote:

Hi Divij,

Thanks for the KIP!

My only question:

KT1. In the case where we change the constraints so that a user's previously valid configuration is now invalid, do we do anything other than throw a ConfigException?

Thanks,
Kirk

On Mon, Nov 18, 2024, at 2:13 AM, Divij Vaidya wrote:

Hey folks

With 4.0, we have an opportunity to reset the default values and add constraints in the configurations based on our learnings since 3.0.

Here's a KIP which modifies defaults for some properties and modifies the constraints for a few others.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1030%3A+Change+constraints+and+default+values+for+various+configurations

Looking forward to your feedback.

(Previous discussion thread on this topic - https://lists.apache.org/thread/3dx9mdmsqf8pko9xdmhks80k96g650zp)