Hi Jun,

Thanks for the review! I've added an addendum to the KIP about the
remote storage configs:

> 13. Why are the defaults for remote.log.manager.copier.thread.pool.size
> and remote.log.manager.thread.pool.size different?

We have three thread pools in the remote log manager:

a. remote.log.manager.thread.pool.size: handles the RLMFollowerTask
<https://sourcegraph.com/github.com/apache/kafka/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L1454>
to read the highest-uploaded remote offset for follower partitions.
b. remote.log.manager.copier.thread.pool.size: handles the RLMCopyTask
<https://sourcegraph.com/github.com/apache/kafka/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L830>
to copy segments to remote storage.
c. remote.log.manager.expiration.thread.pool.size: handles the
RLMExpirationTask
<https://sourcegraph.com/github.com/apache/kafka/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L1095>
to delete expired remote segments.

The plan was to deprecate the remote.log.manager.thread.pool.size
thread pool after KIP-950, but that was not done; the pool is instead
used for the follower partition tasks. Compared to the copier/expiration
tasks, the follower tasks are lightweight, so I'm proposing to reduce
that thread pool's size to 2.
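
For illustration, here is a sketch of how the remote-storage section of
server.properties would look with this change. The copier/expiration
values below are only my assumption of the current defaults; the KIP
only touches the follower pool:

# Follower tasks are lightweight, so a small pool suffices (proposed):
remote.log.manager.thread.pool.size=2
# Copier and expiration pools are unchanged (assumed defaults shown):
remote.log.manager.copier.thread.pool.size=10
remote.log.manager.expiration.thread.pool.size=10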

--
Kamal

On Wed, Nov 27, 2024 at 11:48 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi Divij,
>
> I share the same concern that Greg pointed out for segment.bytes /
> segment.index.bytes.
> All the tiered storage tests
> <https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/test/java/org/apache/kafka/tiered/storage/utils/TieredStorageTestUtils.java?L174>
> might start to fail with the new defaults. I'm fine with the proposal,
> provided we plan to fix those tests as part of this KIP.
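>
> For illustration, those tests pin tiny segments to force frequent
> rolls, along these lines (a sketch; the topic name and values are
> illustrative, not the actual test constants):
>
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
>   --topic test-topic --partitions 1 --replication-factor 1 \
>   --config segment.bytes=1024 --config segment.index.bytes=1024
>
> If the new defaults come with tighter lower bounds, such topic
> creations would be rejected, which is why the tests would need updating
> alongside the KIP.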
>
> --
> Kamal
>
>
> On Wed, Nov 27, 2024 at 9:55 PM Divij Vaidya <divijvaidy...@gmail.com>
> wrote:
>
>> Hello everyone
>>
>> Since I believe I have addressed all the concerns that were raised here,
>> I have started a vote thread for this KIP at
>> https://lists.apache.org/thread/39dmfkd7ktb0oo44yqrkndcn4kcqt5hc
>>
>> Please participate in the vote.
>>
>> --
>> Divij Vaidya
>>
>>
>>
>> On Wed, Nov 27, 2024 at 12:33 PM Divij Vaidya <divijvaidy...@gmail.com>
>> wrote:
>>
>> > Thanks for the feedback folks.
>> >
>> > *Jun*
>> >
>> > 10. Fixed it. It now says 2 as the new default for recovery threads.
>> >
>> > 11. I have added a sentence that we will apply the defaults to both the
>> > broker-level and the equivalent topic-level configs, and I have added
>> > both levels of each config to the table. For example, you may notice
>> > (message.timestamp.after.max.ms / log.message.timestamp.after.max.ms).
>> > Furthermore, the constraints will apply (similar to the constraints
>> > today) both when validating dynamically changed configuration and when
>> > validating static configuration (such as server.properties). Please let
>> > me know if I have missed anything.
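>> >
>> > For illustration, the broker/topic pairing looks like this (a sketch;
>> > the value is only a placeholder, not the KIP's proposed default, and
>> > "my-topic" is a hypothetical topic):
>> >
>> > # server.properties (broker level):
>> > log.message.timestamp.after.max.ms=3600000
>> >
>> > # Topic-level override of the equivalent config:
>> > bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
>> >   --entity-type topics --entity-name my-topic \
>> >   --add-config message.timestamp.after.max.ms=3600000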
>> >
>> > 12. In the interest of time, I have removed the constraint proposal for
>> > rf >= min.insync.replicas. We will circle back on it in a separate KIP.
>> >
>> > *Luke*
>> >
>> > 10 and 12 above should align with what you suggested.
>> >
>> > --
>> > Divij Vaidya
>> >
>> >
>> >
>> > On Wed, Nov 27, 2024 at 4:00 AM Luke Chen <show...@gmail.com> wrote:
>> >
>> >> Hi Divij and Jun,
>> >>
>> >> Thanks for your comments.
>> >> I'm fine with setting the default value of
>> >> num.recovery.threads.per.data.dir to 2, since there are many factors
>> >> that need to be considered.
>> >>
>> >> And James, good point about min.insync.replicas validation. If it's
>> >> complicated or will confuse users, I'd propose we leave it out of
>> >> v4.0.0.
>> >>
>> >> Thanks.
>> >> Luke
>> >>
>> >> On Tue, Nov 26, 2024 at 2:31 AM Jun Rao <j...@confluent.io.invalid>
>> >> wrote:
>> >>
>> >> > Hi, Divij,
>> >> >
>> >> > Thanks for the reply. A few more comments.
>> >> >
>> >> > 10. num.recovery.threads.per.data.dir still seems to depend on the
>> >> > number of cores.
>> >> >
>> >> > 11. Some of the configs on the server side exist at different levels
>> >> > (static, broker, topic, etc.) with slightly different names. It would
>> >> > be useful to be clear at what level the new default and the constraint
>> >> > apply.
>> >> >
>> >> > 12. James had a good point on min.insync.replicas. It would be useful
>> >> > to define when the constraint applies (topic creation, config changes,
>> >> > etc.). For example, if the broker-level min.insync.replicas value is
>> >> > changed to 2, what happens to existing topics with replication factor
>> >> > 1? If a topic has min.insync.replicas of 2, what happens to an
>> >> > AlterPartitionReassignments request that wants to reduce the
>> >> > replication factor to 1?
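>> >> >
>> >> > To make the first scenario concrete, a sketch (assuming a test cluster
>> >> > at localhost:9092 and a hypothetical topic "t1"):
>> >> >
>> >> > bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
>> >> >   --topic t1 --partitions 1 --replication-factor 1
>> >> > bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
>> >> >   --entity-type brokers --entity-default --add-config min.insync.replicas=2
>> >> >
>> >> > After the second command, an acks=all producer against t1 fails with
>> >> > NOT_ENOUGH_REPLICAS even though the topic itself was never touched,
>> >> > which is exactly why it matters to define when the constraint applies.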
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Jun
>> >> >
>> >> > On Sat, Nov 23, 2024 at 9:57 AM Divij Vaidya <divijvaidy...@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > Jun
>> >> > >
>> >> > > Thank you for the feedback. I have removed the configuration changes
>> >> > > where we are relying on the number of cores. The only change I have
>> >> > > kept is increasing the recovery threads to 2 (from the default of 1).
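>> >> > >
>> >> > > In server.properties terms, the one remaining change is simply:
>> >> > >
>> >> > > num.recovery.threads.per.data.dir=2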
>> >> > >
>> >> > > James
>> >> > >
>> >> > > Thank you for bringing the JIRA to my attention. I haven't looked
>> >> > > deeply into the implementation, but based on my understanding of the
>> >> > > Kafka code base, I do believe that there is a path to implement this
>> >> > > constraint. We will cross that bridge during the implementation
>> >> > > phase, and I will ensure that I look at the historical context you
>> >> > > provided in the JIRA.
>> >> > >
>> >> > > --
>> >> > > Divij Vaidya
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Sat, Nov 23, 2024 at 6:48 AM James Cheng <wushuja...@gmail.com>
>> >> > > wrote:
>> >> > >
>> >> > > > About the replication.factor >= min.insync.replicas change, you
>> >> > > > should look at
>> >> > > > https://issues.apache.org/jira/browse/KAFKA-4680 . That JIRA talks
>> >> > > > about some of the complexities of detecting it. For example, what
>> >> > > > if a topic has replication factor 1, but someone changes the
>> >> > > > broker-level min.insync.replicas value to 2? How would that be
>> >> > > > detected?
>> >> > > >
>> >> > > > That JIRA has an associated PR. The PR has some comments that link
>> >> > > > to discussions on this mailing list.
>> >> > > >
>> >> > > > That PR, btw, was just closed due to being stale.
>> >> > > >
>> >> > > > -James
>> >> > > >
>> >> > > > Sent from my iPhone
>> >> > > >
>> >> > > > > On Nov 18, 2024, at 2:15 AM, Divij Vaidya <divijvaidy...@gmail.com>
>> >> > > > > wrote:
>> >> > > > >
>> >> > > > > Hey folks
>> >> > > > >
>> >> > > > > With 4.0, we have an opportunity to reset the default values and
>> >> > > > > add constraints in the configurations based on our learnings
>> >> > > > > since 3.0.
>> >> > > > >
>> >> > > > > Here's a KIP which modifies defaults for some properties and
>> >> > > > > modifies the constraints for a few others.
>> >> > > > >
>> >> > > > >
>> >> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1030%3A+Change+constraints+and+default+values+for+various+configurations
>> >> > > > >
>> >> > > > > Looking forward to your feedback.
>> >> > > > >
>> >> > > > > (Previous discussion thread on this topic -
>> >> > > > > https://lists.apache.org/thread/3dx9mdmsqf8pko9xdmhks80k96g650zp )
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>>
>
