Thanks Alyssa, that's a good point. I must have observed this in a cluster without Pre-Vote enabled. I'll clarify the wording to note that this does not cause chaos when Pre-Vote is enabled.
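
For anyone skimming the thread, here is a rough sketch of how the two proposed properties could sit next to the existing fetch timeout on a controller. The property names are the ones discussed below; the byte values are only illustrative and are not the defaults proposed in the KIP:

    # Existing: maximum time without a successful fetch from the leader
    # before a voter starts a (pre-)election; defaults to 2000 ms.
    controller.quorum.fetch.timeout.ms=2000

    # Proposed in the KIP: cap the bytes returned by a single Fetch /
    # FetchSnapshot response so that each request can complete within the
    # fetch timeout even on a slow link. Placeholder values, not defaults.
    controller.quorum.fetch.max.bytes=1048576
    controller.quorum.fetch.snapshot.max.bytes=1048576

The idea is that bounding the per-response size bounds how long any single request can take, so a replica behind a slow link keeps making smaller rounds of progress instead of hitting the fetch timeout.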
Best,
Jonah

On Tue, Jan 6, 2026 at 3:54 PM Alyssa Huang via dev <[email protected]> wrote:
> Hey Jonah,
>
> Thanks for the KIP and PR! I agree with the motivation for the change and just wanted to clarify a bit of the wording:
>
> > The new controller, by becoming a candidate and increasing the epoch will start to trigger frequent leadership elections which it can never win causing further chaos in the cluster.
>
> It wasn't clear to me why we assume the new controller would become candidate if other members of the quorum would presumably not grant any pre-vote requests from this node (it is unable to complete fetch requests and therefore unable to catch up sufficiently on the log).
>
> Best,
> Alyssa
>
> On Wed, Oct 15, 2025 at 8:42 AM Jonah Hooper <[email protected]> wrote:
> > Hey Justine,
> >
> > Thanks for reading the KIP!
> >
> > > For *internal.max.size.bytes, *you mentioned removing this internal config. Can you clarify this config and how it is used? I couldn't find it in the code.
> >
> > It is used here -
> > https://github.com/apache/kafka/blob/616b0234c88c4f265c3545c282f38709e8e4f7f7/core/src/main/scala/kafka/raft/KafkaMetadataLog.scala#L76
> >
> > And defined here -
> > https://github.com/apache/kafka/blob/616b0234c88c4f265c3545c282f38709e8e4f7f7/raft/src/main/java/org/apache/kafka/raft/MetadataLogConfig.java#L120
> >
> > Best,
> > Jonah
> >
> > On Tue, Oct 14, 2025 at 2:26 PM Justine Olshan <[email protected]> wrote:
> > > Hi Jonah,
> > >
> > > Thanks for the KIP. For now I have one question.
> > >
> > > For *internal.max.size.bytes, *you mentioned removing this internal config. Can you clarify this config and how it is used? I couldn't find it in the code.
> > >
> > > Justine
> > >
> > > On Tue, Oct 7, 2025 at 11:07 AM Jun Rao <[email protected]> wrote:
> > > > Hi, Jonah,
> > > >
> > > > Thanks for the KIP. Just a couple of minor comments.
> > > >
> > > > controller.quorum.fetch.timeout.ms defaults to 2 secs. Is that a bit too low?
> > > >
> > > > Orthogonally, in the case when the network latency is long, one can typically tune socket.receive.buffer.bytes to improve the throughput.
> > > >
> > > > Jun
> > > >
> > > > On Fri, Sep 26, 2025 at 10:29 AM José Armando García Sancio <[email protected]> wrote:
> > > > > Thanks for the changes Jonah. Looking forward to the implementation of this feature.
> > > > >
> > > > > On Fri, Sep 26, 2025 at 12:33 PM Jonah Hooper <[email protected]> wrote:
> > > > > > Thanks for the feedback José! Did some work to apply the KIP to a correct format.
> > > > > >
> > > > > > > The motivation section starts with the solution without motivating the problem. I think you want to delete the first paragraph in the motivation section.
> > > > > >
> > > > > > Done.
> > > > > >
> > > > > > > In the motivation section you have "non-terminating loop where they cannot join the cluster." They are technically not joining the cluster. The issue is a liveness issue. Because the FETCH_SNAPSHOT RPC doesn't complete within the fetch timeout configuration the fetching replica cannot make progress. In the worst case, if it is a voter/controller, it further disrupts the leader by becoming a candidate and increasing the epoch.
> > > > > >
> > > > > > Updated the motivation to reflect this.
> > > > > >
> > > > > > > In the public interface section, please make it clear that the changes are the addition of two new configurations. Let's also add descriptions for these configurations.
> > > > > >
> > > > > > Added descriptions.
> > > > > >
> > > > > > > Please fill out "Compatibility, Deprecation, and Migration Plan". Why is this safe and backward compatible?
> > > > > >
> > > > > > Done.
> > > > > >
> > > > > > > In the test section you have "We can perform once off tests which evaluate the effect of degraded networking on snapshot. It's better to design system-tests which focus on correctness by using pathologically low values for controller.quorum.fetch.snapshot.max.bytes and controller.quorum.fetch.max.bytes. An example scenario would be a controller joining a quorum with a snapshot with a known size and then attempting to fetch the snapshot in small (say 64kb) or even pathologically small values like 1 byte." I don't fully understand this recommendation but if my understanding is correct this sounds very complicated and flakey to implement. Can't we just add protocol tests to KafkaRaftClient*Test that show these configurations being used when handling FETC and FETCH_SNAPSHOT requests?
> > > > > >
> > > > > > Good point; that section is too specific. I've left it a bit more vague, unit tests are sufficient provided we can "mock" a situation where we retrieve less data than is available.
> > > > > >
> > > > > > Best,
> > > > > > Jonah
> > > > > >
> > > > > > On Mon, Sep 22, 2025 at 12:31 PM José Armando García Sancio <[email protected]> wrote:
> > > > > > > Hi Jonah,
> > > > > > >
> > > > > > > Thanks for the changes. The proposed solution looks good to me at a high-level. I just have some minor comments.
> > > > > > >
> > > > > > > The motivation section starts with the solution without motivating the problem. I think you want to delete the first paragraph in the motivation section.
> > > > > > >
> > > > > > > In the motivation section you have "non-terminating loop where they cannot join the cluster." They are technically not joining the cluster. The issue is a liveness issue. Because the FETCH_SNAPSHOT RPC doesn't complete within the fetch timeout configuration the fetching replica cannot make progress. In the worst case, if it is a voter/controller, it further disrupts the leader by becoming a candidate and increasing the epoch.
> > > > > > >
> > > > > > > In the motivation section you have "The proposed default configurations tradeoff a reduction in the throughput of data transmitted by Fetch and FetchSnapshot requests with a liveness and correctness improvement." This is also the solution and not part of the motivation.
> > > > > > >
> > > > > > > In the public interface section, please make it clear that the changes are the addition of two new configurations. Let's also add descriptions for these configurations.
> > > > > > >
> > > > > > > Please fill out "Compatibility, Deprecation, and Migration Plan". Why is this safe and backward compatible?
> > > > > > >
> > > > > > > In the test section you have "We can perform once off tests which evaluate the effect of degraded networking on snapshot. It's better to design system-tests which focus on correctness by using pathologically low values for controller.quorum.fetch.snapshot.max.bytes and controller.quorum.fetch.max.bytes. An example scenario would be a controller joining a quorum with a snapshot with a known size and then attempting to fetch the snapshot in small (say 64kb) or even pathologically small values like 1 byte." I don't fully understand this recommendation but if my understanding is correct this sounds very complicated and flakey to implement. Can't we just add protocol tests to KafkaRaftClient*Test that show these configurations being used when handling FETC and FETCH_SNAPSHOT requests?
> > > > > > >
> > > > > > > I didn't comment on the proposed changes section. I'll comment after the comments above are addressed.
> > > > > > >
> > > > > > > Thanks
> > > > > > > --
> > > > > > > -José
> > > > >
> > > > > --
> > > > > -José
