Thanks Alyssa, that's a good point. I must have observed this in a cluster without Pre-Vote enabled. I'll clarify the wording to note that this does not cause chaos when Pre-Vote is enabled.
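
For anyone skimming the thread, here is a rough sketch of how the two proposed properties could sit next to the existing fetch timeout on a controller. The property names are the ones discussed below; the byte values are only illustrative and are not the defaults proposed in the KIP:

    # Existing: maximum time without a successful fetch from the leader
    # before a voter starts a (pre-)election; defaults to 2000 ms.
    controller.quorum.fetch.timeout.ms=2000

    # Proposed in the KIP: cap the bytes returned by a single Fetch /
    # FetchSnapshot response so that each request can complete within the
    # fetch timeout even on a slow link. Placeholder values, not defaults.
    controller.quorum.fetch.max.bytes=1048576
    controller.quorum.fetch.snapshot.max.bytes=1048576

The idea is that bounding the per-response size bounds how long any single request can take, so a replica behind a slow link keeps making smaller rounds of progress instead of hitting the fetch timeout.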
Best,
Jonah

On Tue, Jan 6, 2026 at 3:54 PM Alyssa Huang via dev <[email protected]> wrote:
> Hey Jonah,
>
> Thanks for the KIP and PR! I agree with the motivation for the change and just wanted to clarify a bit of the wording:
>
> > The new controller, by becoming a candidate and increasing the epoch will start to trigger frequent leadership elections which it can never win causing further chaos in the cluster.
>
> It wasn't clear to me why we assume the new controller would become candidate if other members of the quorum would presumably not grant any pre-vote requests from this node (it is unable to complete fetch requests and therefore unable to catch up sufficiently on the log).
>
> Best,
> Alyssa
>
> On Wed, Oct 15, 2025 at 8:42 AM Jonah Hooper <[email protected]> wrote:
> > Hey Justine,
> >
> > Thanks for reading the KIP!
> >
> > > For *internal.max.size.bytes, *you mentioned removing this internal config. Can you clarify this config and how it is used? I couldn't find it in the code.
> >
> > It is used here -
> > https://github.com/apache/kafka/blob/616b0234c88c4f265c3545c282f38709e8e4f7f7/core/src/main/scala/kafka/raft/KafkaMetadataLog.scala#L76
> >
> > And defined here -
> > https://github.com/apache/kafka/blob/616b0234c88c4f265c3545c282f38709e8e4f7f7/raft/src/main/java/org/apache/kafka/raft/MetadataLogConfig.java#L120
> >
> > Best,
> > Jonah
> >
> > On Tue, Oct 14, 2025 at 2:26 PM Justine Olshan <[email protected]> wrote:
> > > Hi Jonah,
> > >
> > > Thanks for the KIP. For now I have one question.
> > >
> > > For *internal.max.size.bytes, *you mentioned removing this internal config. Can you clarify this config and how it is used? I couldn't find it in the code.
> > >
> > > Justine
> > >
> > > On Tue, Oct 7, 2025 at 11:07 AM Jun Rao <[email protected]> wrote:
> > > > Hi, Jonah,
> > > >
> > > > Thanks for the KIP. Just a couple of minor comments.
> > > >
> > > > controller.quorum.fetch.timeout.ms defaults to 2 secs. Is that a bit too low?
> > > >
> > > > Orthogonally, in the case when the network latency is long, one can typically tune socket.receive.buffer.bytes to improve the throughput.
> > > >
> > > > Jun
> > > >
> > > > On Fri, Sep 26, 2025 at 10:29 AM José Armando García Sancio <[email protected]> wrote:
> > > > > Thanks for the changes Jonah. Looking forward to the implementation of this feature.
> > > > >
> > > > > On Fri, Sep 26, 2025 at 12:33 PM Jonah Hooper <[email protected]> wrote:
> > > > > > Thanks for the feedback José! Did some work to apply the KIP to a correct format.
> > > > > >
> > > > > > > The motivation section starts with the solution without motivating the problem. I think you want to delete the first paragraph in the motivation section.
> > > > > >
> > > > > > Done.
> > > > > >
> > > > > > > In the motivation section you have "non-terminating loop where they cannot join the cluster." They are technically not joining the cluster. The issue is a liveness issue. Because the FETCH_SNAPSHOT RPC doesn't complete within the fetch timeout configuration the fetching replica cannot make progress. In the worst case, if it is a voter/controller, it further disrupts the leader by becoming a candidate and increasing the epoch.
> > > > > >
> > > > > > Updated the motivation to reflect this.
> > > > > >
> > > > > > > In the public interface section, please make it clear that the changes are the addition of two new configurations. Let's also add descriptions for these configurations.
> > > > > >
> > > > > > Added descriptions.
> > > > > >
> > > > > > > Please fill out "Compatibility, Deprecation, and Migration Plan". Why is this safe and backward compatible?
> > > > > >
> > > > > > Done.
> > > > > >
> > > > > > > In the test section you have "We can perform once off tests which evaluate the effect of degraded networking on snapshot. It's better to design system-tests which focus on correctness by using pathologically low values for controller.quorum.fetch.snapshot.max.bytes and controller.quorum.fetch.max.bytes. An example scenario would be a controller joining a quorum with a snapshot with a known size and then attempting to fetch the snapshot in small (say 64kb) or even pathologically small values like 1 byte." I don't fully understand this recommendation but if my understanding is correct this sounds very complicated and flakey to implement. Can't we just add protocol tests to KafkaRaftClient*Test that show these configurations being used when handling FETC and FETCH_SNAPSHOT requests?
> > > > > >
> > > > > > Good point; that section is too specific. I've left it a bit more vague, unit tests are sufficient provided we can "mock" a situation where we retrieve less data than is available.
> > > > > >
> > > > > > Best,
> > > > > > Jonah
> > > > > >
> > > > > > On Mon, Sep 22, 2025 at 12:31 PM José Armando García Sancio <[email protected]> wrote:
> > > > > > > Hi Jonah,
> > > > > > >
> > > > > > > Thanks for the changes. The proposed solution looks good to me at a high-level. I just have some minor comments.
> > > > > > >
> > > > > > > The motivation section starts with the solution without motivating the problem. I think you want to delete the first paragraph in the motivation section.
> > > > > > >
> > > > > > > In the motivation section you have "non-terminating loop where they cannot join the cluster." They are technically not joining the cluster. The issue is a liveness issue. Because the FETCH_SNAPSHOT RPC doesn't complete within the fetch timeout configuration the fetching replica cannot make progress. In the worst case, if it is a voter/controller, it further disrupts the leader by becoming a candidate and increasing the epoch.
> > > > > > >
> > > > > > > In the motivation section you have "The proposed default configurations tradeoff a reduction in the throughput of data transmitted by Fetch and FetchSnapshot requests with a liveness and correctness improvement." This is also the solution and not part of the motivation.
> > > > > > >
> > > > > > > In the public interface section, please make it clear that the changes are the addition of two new configurations. Let's also add descriptions for these configurations.
> > > > > > >
> > > > > > > Please fill out "Compatibility, Deprecation, and Migration Plan". Why is this safe and backward compatible?
> > > > > > >
> > > > > > > In the test section you have "We can perform once off tests which evaluate the effect of degraded networking on snapshot. It's better to design system-tests which focus on correctness by using pathologically low values for controller.quorum.fetch.snapshot.max.bytes and controller.quorum.fetch.max.bytes. An example scenario would be a controller joining a quorum with a snapshot with a known size and then attempting to fetch the snapshot in small (say 64kb) or even pathologically small values like 1 byte." I don't fully understand this recommendation but if my understanding is correct this sounds very complicated and flakey to implement. Can't we just add protocol tests to KafkaRaftClient*Test that show these configurations being used when handling FETC and FETCH_SNAPSHOT requests?
> > > > > > >
> > > > > > > I didn't comment on the proposed changes section. I'll comment after the comments above are addressed.
> > > > > > >
> > > > > > > Thanks
> > > > > > > --
> > > > > > > -José
> > > > >
> > > > > --
> > > > > -José
