Hi Divij,

Thanks for the updated KIP.

Regarding num.recovery.threads.per.data.dir / num.network.threads /
num.io.threads, I have a few concerns:

1. It seems that we can't use floor, since this can result in a value of 0,
which is invalid (see the sketch below).
2. I am not sure that the new default value is actually better. For example,
if recovery is I/O bound, we actually want more threads than the number of
cores.
3. It's now hard for a user to know the actual default value.
4. It may also be a surprise that the default value changes when a user adds
a listener or a dir.
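To make the first concern concrete, here is a minimal sketch (the numbers
and the clamping are hypothetical, not the KIP's actual derivation):

    public class DerivedDefaultSketch {
        public static void main(String[] args) {
            int cores = 4;    // a small broker host (assumed value)
            int logDirs = 8;  // a JBOD layout with more dirs than cores (assumed)

            int floored = cores / logDirs;               // integer division -> 0, invalid
            int clamped = Math.max(1, cores / logDirs);  // clamping keeps the pool size valid

            System.out.println("floored=" + floored + ", clamped=" + clamped);
        }
    }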
Thanks,
Jun

On Wed, Nov 20, 2024 at 7:36 AM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi Divij,
>
> We would like to change the default value for the below remote log
> configurations:
>
> 1. remote.log.manager.copier.thread.pool.size -- default value from -1
> to 10
> 2. remote.log.manager.expiration.thread.pool.size -- default value from
> -1 to 10
>
> In KIP-950
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement#KIP950:TieredStorageDisablement-Disablement-ZookeeperBackedCluster>,
> it was proposed to split the thread pool used by the leader and the
> follower into copy and expiration thread pools. With KIP-956
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas>,
> we made this change in v3.9. The default value for the copier and
> expiration thread pools was set to -1 to derive the value from the
> previous thread pool for backward compatibility. See PR#17499
> <https://github.com/apache/kafka/pull/17499/#discussion_r1804119567>.
>
> Now we have three thread pools:
>
> 1. remote.log.manager.thread.pool.size -- used by replicas to fetch the
> highest-uploaded remote offset for follower partitions; it won't be
> deprecated as proposed.
> 2. remote.log.manager.copier.thread.pool.size -- used by the leader to
> upload log segments to remote storage.
> 3. remote.log.manager.expiration.thread.pool.size -- used by the leader
> to delete expired log segments from remote storage.
>
> A broker acts as the leader for some partitions and as a follower for
> others, so deriving the value of the copier and expiration thread pools
> from `remote.log.manager.thread.pool.size` looks incorrect to me. The
> follower's task is lightweight (fetching only the highest-uploaded
> remote offset), whereas the leader's task is heavy (copying and deleting
> segments). Moreover, the change didn't ensure backward compatibility: if
> a user was running with 10 threads in v3.8, the server will start 30
> threads in v3.9.
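> To make that arithmetic concrete, a minimal sketch of the -1 "derive
> from the old pool" behavior described above (hypothetical code, not the
> actual RemoteLogManager implementation):
>
>     public class DerivedPoolSizeSketch {
>         public static void main(String[] args) {
>             int legacyPoolSize = 10;     // remote.log.manager.thread.pool.size in v3.8
>             int copierPoolSize = -1;     // v3.9 default: derive from the legacy pool
>             int expirationPoolSize = -1; // v3.9 default: derive from the legacy pool
>
>             int copier = copierPoolSize == -1 ? legacyPoolSize : copierPoolSize;
>             int expiration = expirationPoolSize == -1 ? legacyPoolSize : expirationPoolSize;
>
>             // v3.8 ran 10 threads; v3.9 starts 10 + 10 + 10 = 30.
>             System.out.println("total: " + (legacyPoolSize + copier + expiration));
>         }
>     }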
> So, I propose to change the default value of the copier and expiration
> thread pools from -1 to 10. This will also ensure smooth validation
> while making those configs dynamic.
>
> Thanks,
> Kamal
>
> On Wed, Nov 20, 2024 at 5:41 PM Divij Vaidya <divijvaidy...@gmail.com>
> wrote:
>
> > Thank you everyone for your feedback.
> >
> > The new revision of the KIP is ready for review. Notable changes since
> > the last revision:
> > 1. Removed the change to log.message.timestamp.before.max.ms
> > 2. Modified the default for num.recovery.threads.per.data.dir
> > 3. Added new proposals for num.io.threads and num.network.threads
> > 4. Improved the justification for the changes and made the motivation
> > more explicit
> >
> > *Federico*
> >
> > Good point. I have updated the compatibility section to include
> > information on how the new values will be handled during the rolling
> > upgrade to 4.0.
> >
> > *Jun*
> >
> > Thank you for pointing out the use case. It correlates with my
> > experience, where most of the problems arise from users accidentally
> > setting a timestamp in ns instead of ms, leading to a future time. I
> > have reverted the changes to log.message.timestamp.after.max.ms.
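> > For context, a minimal sketch of that accidental-future-timestamp bug
> > (hypothetical producer code; only the ProducerRecord constructor is
> > the real client API):
> >
> >     import org.apache.kafka.clients.producer.ProducerRecord;
> >
> >     public class FutureTimestampSketch {
> >         public static void main(String[] args) {
> >             // The mistake: epoch nanoseconds where milliseconds are expected.
> >             long nanosAsMillis = System.currentTimeMillis() * 1_000_000L;
> >
> >             // The record's CreateTime now lies tens of millions of years in
> >             // the future, which log.message.timestamp.after.max.ms can reject.
> >             ProducerRecord<String, String> record =
> >                 new ProducerRecord<>("my-topic", null, nanosAsMillis, "k", "v");
> >             System.out.println(record.timestamp());
> >         }
> >     }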
> > *Tommi*
> >
> > You have a valid point about the friction this could cause during an
> > upgrade. This is a risk we are accepting with this KIP. The risk is
> > acceptable because we expect due diligence from users while
> > transitioning to a new major version, and also because there aren't
> > legitimate use cases for setting these values to a threshold where
> > they will cause system instability.
> >
> > I have added the failure mode for invalid values to the compatibility
> > section.
> >
> > About your suggestion to apply the new changes but not fail the
> > upgrade: I am a bit wary of silently modifying configuration that a
> > user has explicitly set. That strategy also won't work for cases where
> > the user uses a third-party tool to reset configurations. Failing fast
> > on invalid configuration is the best option. To improve the user
> > experience, I have added a suggestion in the KIP to add an "upgrade
> > validation tool" which a user can run to verify compatibility before
> > the actual upgrade. It won't be part of this KIP, but it is something
> > we might want to do to ease migration to 4.0. I will float the idea
> > around in the community and see if we have volunteers for it.
> >
> > *Greg*
> >
> > Thanks for raising this concern. We do indeed intentionally use small
> > values to "force" a segment roll sometimes. However, I believe there
> > are alternate ways to accomplish that in a test environment. For
> > example, if I wanted to create small segments for a test scenario, I
> > would either write an integration test where I can call roll
> > explicitly, OR send a message with a create time in the future (roll
> > is based on the timestamp of the last appended message), OR just write
> > more data (1 MB minimum today).
> >
> > But to your larger point, I agree that there is value in providing two
> > different guardrails for users. First, a "user-friendly" guardrail: an
> > opinionated guardrail meant to protect the cluster from inadvertent,
> > sub-optimal, and incorrect configuration. Users who do not wish to
> > spend time learning Kafka's intricacies and the effects of its myriad
> > configurations can choose this option. Second, an "advanced user"
> > guardrail, which will allow users to push the cluster to its
> > boundaries with full knowledge of the repercussions and fallout that
> > may ensue.
> >
> > As a project trusted by users across this broad spectrum of expertise,
> > I do believe we need to provide user-friendly mechanisms for different
> > segments of users, but that is something I am unwilling to take up in
> > this KIP. If you feel passionate about addressing this feature gap in
> > Apache Kafka, please feel free to start a discussion on the topic, and
> > I promise to be an active participant in it. The outcome could be as
> > simple as documentation where we provide ideal configurations for
> > different workloads.
> >
> > *Luke*
> >
> > I agree with your suggestion to set the number of threads based on the
> > machine's cores. In fact, I would promote a similar solution for
> > network threads and request handler threads as well. (As an example, I
> > have seen users set large values for network threads compared to
> > request handler threads, resulting in accepting more requests than the
> > brokers can handle, leading to degradation. A constraint that network
> > threads <= request handler threads could prevent such a scenario; see
> > the sketch below.)
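> > A minimal sketch of such a startup check (hypothetical validation
> > code, though ConfigException is the real Kafka class for this):
> >
> >     import org.apache.kafka.common.config.ConfigException;
> >
> >     public class ThreadRatioCheckSketch {
> >         // Hypothetical guard: reject configs where the network threads
> >         // outnumber the request handler (io) threads.
> >         static void validate(int numNetworkThreads, int numIoThreads) {
> >             if (numNetworkThreads > numIoThreads) {
> >                 throw new ConfigException(
> >                     "num.network.threads (" + numNetworkThreads
> >                     + ") must be <= num.io.threads (" + numIoThreads + ")");
> >             }
> >         }
> >
> >         public static void main(String[] args) {
> >             validate(8, 16);      // passes
> >             try {
> >                 validate(32, 16); // violates the constraint
> >             } catch (ConfigException e) {
> >                 System.out.println("rejected: " + e.getMessage());
> >             }
> >         }
> >     }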
> > Earlier, I didn't add these since they are not dead-obvious changes
> > compared to the rest of the changes in the KIP, but I have now added
> > them. It's worth discussing them.
> >
> > --
> > Divij Vaidya
> >
> > On Wed, Nov 20, 2024 at 11:01 AM Luke Chen <show...@gmail.com> wrote:
> >
> > > Hi Divij,
> > >
> > > One comment:
> > > For the default value of "num.recovery.threads.per.data.dir", we
> > > change from 1 to 2. Could we change the value based on the number of
> > > CPU cores? E.g., default value = CPU core count / number of log
> > > dirs.
> > >
> > > WDYT?
> > >
> > > Luke
> > >
> > > On Wed, Nov 20, 2024 at 5:53 PM Tommi Vainikainen
> > > <tvain...@aiven.io.invalid> wrote:
> > >
> > > > Hi,
> > > >
> > > > This proposal looks good to me with the new defaults and
> > > > constraints.
> > > >
> > > > About breaking backward compatibility, I have a question on how
> > > > you see it best handled if someone upgrades a Kafka cluster from
> > > > 3.9 to 4.0 with, let's say, one topic having segment.bytes=500 KB
> > > > (the new minimum is 1 MB). Does it mean that the Kafka process
> > > > fails to start with the ConfigException, or when is the
> > > > ConfigException thrown? Would it be better to simply start
> > > > behaving as if the value were at least the new minimum regardless
> > > > of the topic setting, and fix the config on the fly instead? That
> > > > would be much more convenient for most Kafka users, in my opinion.
> > > >
> > > > On Tue, Nov 19, 2024 at 12:55 PM Divij Vaidya
> > > > <divijvaidy...@gmail.com> wrote:
> > > >
> > > > > KT1 - That is right. We will throw a ConfigException. That is
> > > > > why this change is considered backward incompatible. To be
> > > > > honest, given the nature of the suggested changes, I don't see
> > > > > any valid use case where a user may have a value that will be
> > > > > invalid after the new constraints.
> > > > >
> > > > > --
> > > > > Divij Vaidya
> > > > >
> > > > > On Tue, Nov 19, 2024 at 2:21 AM Kirk True <k...@kirktrue.pro>
> > > > > wrote:
> > > > >
> > > > > > Hi Divij,
> > > > > >
> > > > > > Thanks for the KIP!
> > > > > >
> > > > > > My only question:
> > > > > >
> > > > > > KT1. In the case where we change the constraints so that a
> > > > > > user's previously valid configuration is now invalid, do we do
> > > > > > anything other than throw a ConfigException?
> > > > > >
> > > > > > Thanks,
> > > > > > Kirk
> > > > > >
> > > > > > On Mon, Nov 18, 2024, at 2:13 AM, Divij Vaidya wrote:
> > > > > > > Hey folks,
> > > > > > >
> > > > > > > With 4.0, we have an opportunity to reset the default values
> > > > > > > and add constraints in the configurations based on our
> > > > > > > learnings since 3.0.
> > > > > > >
> > > > > > > Here's a KIP which modifies the defaults for some properties
> > > > > > > and modifies the constraints for a few others:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1030%3A+Change+constraints+and+default+values+for+various+configurations
> > > > > > >
> > > > > > > Looking forward to your feedback.
> > > > > > >
> > > > > > > (Previous discussion thread on this topic -
> > > > > > > https://lists.apache.org/thread/3dx9mdmsqf8pko9xdmhks80k96g650zp)
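For reference, here is a minimal sketch of how a tightened constraint such
as the segment.bytes minimum discussed above would surface. ConfigDef,
Range.atLeast, and ConfigException are real Kafka classes, but the specific
definition below is hypothetical, not the KIP's actual code:

    import java.util.Map;
    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.common.config.ConfigException;

    public class ConstraintSketch {
        public static void main(String[] args) {
            ConfigDef def = new ConfigDef()
                .define("segment.bytes", ConfigDef.Type.INT,
                        1024 * 1024,                          // hypothetical default
                        ConfigDef.Range.atLeast(1024 * 1024), // hypothetical 1 MB minimum
                        ConfigDef.Importance.HIGH,
                        "Segment file size");

            try {
                // A pre-4.0 value below the new minimum fails validation.
                def.parse(Map.of("segment.bytes", 500 * 1024));
            } catch (ConfigException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }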