One use case I see for setting `segment.bytes` to 1 is to delete all the records from a topic. We can mention in the docs that the `kafka-delete-records` API should be used for that instead.
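
For reference, a minimal sketch of what the delete-records route could look like. It only builds the JSON file that the `kafka-delete-records.sh` tool reads via `--offset-json-file`. The topic name `my-topic`, the three partitions, and the helper name `build_delete_records_spec` are assumptions made up for illustration. As far as I know, an offset of -1 is treated as "up to the high watermark", but please verify that against your Kafka version.

```python
import json


def build_delete_records_spec(topic, partitions, offset=-1):
    """Build the JSON document passed to kafka-delete-records.sh.

    Hypothetical helper, not part of Kafka. offset=-1 is assumed to
    mean "delete everything before the high watermark".
    """
    return {
        "partitions": [
            {"topic": topic, "partition": p, "offset": offset}
            for p in partitions
        ],
        "version": 1,
    }


# Assumed example: a topic called "my-topic" with partitions 0, 1 and 2.
spec = build_delete_records_spec("my-topic", range(3))
print(json.dumps(spec, indent=2))
```

The output can be saved to a file, say `delete.json`, and passed to the tool as `bin/kafka-delete-records.sh --bootstrap-server localhost:9092 --offset-json-file delete.json` (the broker address is a placeholder). This removes the records without touching the segment settings at all.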

On Wed, Mar 13, 2024 at 6:59 PM Divij Vaidya <divijvaidy...@gmail.com> wrote:

> + users@kafka
>
> Hi users of Apache Kafka
>
> With the upcoming 4.0 release, we have an opportunity to improve the
> constraints and default values for various Kafka configurations.
>
> We are soliciting your feedback and suggestions on configurations where the
> default values and/or constraints should be adjusted. Please reply in this
> thread directly.
>
> --
> Divij Vaidya
> Apache Kafka PMC
>
> On Wed, Mar 13, 2024 at 12:56 PM Divij Vaidya <divijvaidy...@gmail.com>
> wrote:
>
> > Thanks for the discussion folks. I have started a KIP
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1030%3A+Change+constraints+and+default+values+for+various+configurations
> > to keep track of the changes that we are discussion. Please consider this
> > as a collaborative work-in-progress KIP and once it is ready to be
> > published, we can start a discussion thread on it.
> >
> > I am also going to start a thread to solicit feedback from users@ mailing
> > list as well.
> >
> > --
> > Divij Vaidya
> >
> > On Wed, Mar 13, 2024 at 12:55 PM Christopher Shannon <
> > christopher.l.shan...@gmail.com> wrote:
> >
> >> I think it's a great idea to raise a KIP to look at adjusting defaults and
> >> minimum/maximum config values for version 4.0.
> >>
> >> As pointed out, the minimum values for segment.ms and segment.bytes don't
> >> make sense and would probably bring down a cluster pretty quickly if set
> >> that low, so version 4.0 is a good time to fix it and to also look at the
> >> other configs as well for adjustments.
> >>
> >> On Wed, Mar 13, 2024 at 4:39 AM Sergio Daniel Troiano
> >> <sergio.troi...@adevinta.com.invalid> wrote:
> >>
> >> > hey guys,
> >> >
> >> > Regarding to num.recovery.threads.per.data.dir: I agree, in our company we
> >> > use the number of vCPUs to do so as this is not competing with ready
> >> > cluster traffic.
> >> >
> >> > On Wed, 13 Mar 2024 at 09:29, Luke Chen <show...@gmail.com> wrote:
> >> >
> >> > > Hi Divij,
> >> > >
> >> > > Thanks for raising this.
> >> > > The valid minimum value 1 for `segment.ms` is completely unreasonable.
> >> > > Similarly for `segment.bytes`, `metadata.log.segment.ms`,
> >> > > `metadata.log.segment.bytes`.
> >> > >
> >> > > In addition to that, there are also some config default values we'd like to
> >> > > propose to change in v4.0.
> >> > > We can collect more comments from the community, and come out with a KIP
> >> > > for them.
> >> > >
> >> > > 1. num.recovery.threads.per.data.dir:
> >> > > The current default value is 1. But the log recovery is happening before
> >> > > brokers are in ready state, which means, we should use all the available
> >> > > resource to speed up the log recovery to bring the broker to ready state
> >> > > soon. Default value should be... maybe 4 (to be decided)?
> >> > >
> >> > > 2. Other configs might be able to consider to change the default, but open
> >> > > for comments:
> >> > >    2.1. num.replica.fetchers: default is 1, but that's not enough when
> >> > > there are multiple partitions in the cluster
> >> > >    2.2. `socket.send.buffer.bytes`/`socket.receive.buffer.bytes`:
> >> > > Currently, we set 100kb as default value, but that's not enough for
> >> > > high-speed network.
> >> > >
> >> > > Thank you.
> >> > > Luke
> >> > >
> >> > > On Tue, Mar 12, 2024 at 1:32 AM Divij Vaidya <divijvaidy...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hey folks
> >> > > >
> >> > > > Before I file a KIP to change this in 4.0, I wanted to understand the
> >> > > > historical context for the value of the following setting.
> >> > > >
> >> > > > Currently, segment.ms minimum threshold is set to 1ms [1].
> >> > > >
> >> > > > Segments are expensive.
> >> > > > Every segment uses multiple file descriptors and
> >> > > > it's easy to run out of OS limits when creating a large number of segments.
> >> > > > Large number of segments also delays log loading on startup because of
> >> > > > expensive operations such as iterating through all directories &
> >> > > > conditionally loading all producer state.
> >> > > >
> >> > > > I am currently not aware of a reason as to why someone might want to work
> >> > > > with a segment.ms of less than ~10s (number chosen arbitrary that looks
> >> > > > sane)
> >> > > >
> >> > > > What was the historical context of setting the minimum threshold to 1ms for
> >> > > > this setting?
> >> > > >
> >> > > > [1] https://kafka.apache.org/documentation.html#topicconfigs_segment.ms
> >> > > >
> >> > > > --
> >> > > > Divij Vaidya