Thank you everyone for your feedback. The new revision of KIP is ready for review. Notable changes since last revision: 1. Removed change to log.message.timestamp.before.max.ms 2. Modified default for num.recovery.threads.per.data.dir 3. Added new proposals for num.io.threads and num.network.threads 4. Improved justification for changes and made motivation more explicit
*Federico* Good point. I have updated the compatibility section to include information on how the new values will be handled during the rolling upgrade to 4.0. *Jun* Thank you for pointing out the use case. It correlated with my experience where most of the problems arise from users accidentally setting a timestamp in ns instead of ms, and hence, leading to a future time. I have reverted changes to log.message.timestamp.after.max.ms *Tommi* You have a valid point about the friction this could cause during upgrade. This is a risk we are accepting with this KIP. The risk is acceptable because we expect due diligence from the users while transitioning to a new major version and also because, there aren't legitimate use cases to set these values to a threshold where they will cause system instability. I have added the failure mode for invalid values in the compatibility section. About your suggestion to apply new changes but not fail upgrade, I am a bit wary about silently modifying configuration that user has explicitly set. The strategy also won't work for cases where the user uses a 3P tool to reset configurations. Failing fast on invalid configuration is the best option. To improve user experience I have added a suggestion in the KIP to add a "upgrade validation tool" which a user can run to verify compatibility before actual upgrade. It won't be part of this KIP but something we might want to do to ease migration to 4.0. I will float the idea around in the community and see if we have volunteers for this. *Greg* Thanks for raising this concern. We indeed intentionally use small values to "force" a segment roll sometimes. However, I believe that there are alternate ways to accomplish that in a test environment. As an example, if I wanted to create small segments for a test scenario, I would either write an integration test where I can call roll explicitly OR send a message with create time in future (roll is based on timestamp of last appended message) OR just write more data (1MB minimum today). But to your larger point, I agree that there is value in providing two different guardrails for the users. First, a "user-friendly" guardrails which will be an opinionated guardrail meant to protect the cluster from inadvertent sub-optimal and incorrect configuration. Users who do not wish to spend time in learning Kafka intricacies and the effect of myriad configurations can choose this option. Second, a "advanced user" guardrail which will allow users to push the cluster to it's boundaries with full knowledge of the repercussions and fallout that may ensue. As a project, which is trusted by users across this broad spectrum of expertise, I do believe that we need to provide user-friendly mechanisms for different segments for users, but that is something I am unwilling to take up in this KIP. If you feel passionate about addressing this feature gap in Apache Kafka, please feel free to start a discussion on this topic and I will promise to be an active participant in it. The outcome could be as simple as a documentation where we provide ideal configuration for different workloads. *Luke* I agree with your suggestion to set the number of threads based on machine cores. In fact, I would promote a similar solution for network threads and request handler threads as well (as an example, I have seen users setting large values for network threads compared to request handler threads, resulting in accepting more requests that brokers can handle, leading to degradation. A constraint that network threads <= request handler threads could prevent such scenario). Earlier, I didn't add these since these are not dead-obvious changes compared to the rest of the changes in the KIP, but I have now added them. It's worth discussing them. -- Divij Vaidya On Wed, Nov 20, 2024 at 11:01 AM Luke Chen <show...@gmail.com> wrote: > Hi Divij, > > One comment: > For the default value of "num.recovery.threads.per.data.dir", we change > from 1 to 2. > Could we change the value based on the CPU core number? > Ex: default value = CPU core num / num of log dirs. > > WDYT? > > Luke > > > On Wed, Nov 20, 2024 at 5:53 PM Tommi Vainikainen > <tvain...@aiven.io.invalid> > wrote: > > > Hi, > > > > This proposal looks good to me with new defaults and constraints. > > > > About breaking backward compatibility I've question on how you see it > best > > handled in case someone upgrades Kafka cluster from 3.9 to 4.0 with let's > > say one topic having segment.bytes=500 kb (new min is 1 MB). Does it mean > > that the Kafka process fails to start with the ConfigException, or when > is > > the ConfigException thrown? Would it be better to simply start behaving > > like it is at least a new minimum regardless of topic setting, and then > on > > the fly fix the config instead? That would be much more convenient for > most > > Kafka users in my opinion. > > > > On Tue, Nov 19, 2024 at 12:55 PM Divij Vaidya <divijvaidy...@gmail.com> > > wrote: > > > > > KT1 - That is right. We will throw a ConfigException. That is why this > > > change is considered backward incompatible. To be honest, given the > > nature > > > of suggested changes, I don't see any valid use case on why a user may > > have > > > a value which will be invalid after the new constraints. > > > > > > > > > -- > > > Divij Vaidya > > > > > > > > > > > > On Tue, Nov 19, 2024 at 2:21 AM Kirk True <k...@kirktrue.pro> wrote: > > > > > > > Hi Divij, > > > > > > > > Thanks for the KIP! > > > > > > > > My only question: > > > > > > > > KT1. In the case where we change the constraints so that a user's > > > > previously valid configuration is now invalid, do we do anything > other > > > than > > > > throw a ConfigException? > > > > > > > > Thanks, > > > > Kirk > > > > > > > > On Mon, Nov 18, 2024, at 2:13 AM, Divij Vaidya wrote: > > > > > Hey folks > > > > > > > > > > With 4.0, we have an opportunity to reset the default values and > add > > > > > constraints in the configurations based on our learnings since 3.0. > > > > > > > > > > Here's a KIP which modifies defaults for some properties and > modifies > > > the > > > > > constraints for a few others. > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1030%3A+Change+constraints+and+default+values+for+various+configurations > > > > > > > > > > > > > > > Looking forward for your feedback. > > > > > > > > > > (Previous discussion thread on this topic - > > > > > https://lists.apache.org/thread/3dx9mdmsqf8pko9xdmhks80k96g650zp ) > > > > > > > > > > > > > > >