Comments below. Here are the changes to the KIP: https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=158864763&selectedPageVersions=34&selectedPageVersions=33

> 41. That's a good point. With compacted topic, the cleaning won't be done
> until the active segment rolls. With snapshots, I guess we don't have this
> restriction? So, it would be useful to guard against too frequent
> snapshotting. Does the new proposal address this completely? If the
> snapshot has only 1 record, and the new record keeps updating the same key,
> does that still cause the snapshot to be generated frequently?

That is true. In addition to metadata.snapshot.min.cleanable.ratio we can add the following configuration:

metadata.snapshot.min.records.size - This is the minimum number of bytes in the replicated log between the latest snapshot and the high-watermark needed before generating a new snapshot. The default is 20MB.

Both configurations need to be satisfied before generating a new snapshot. I have updated the KIP.
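To make the interaction of the two configurations concrete, here is a minimal sketch of the trigger check described above. The class and method names are hypothetical, not from the KIP; the point is only that both conditions must hold before a snapshot is generated:

```java
// Hypothetical sketch, not the KIP's implementation: a broker would
// only generate a new snapshot when BOTH configured thresholds are met.
final class SnapshotPolicy {
    private final double minCleanableRatio;  // metadata.snapshot.min.cleanable.ratio
    private final long minRecordsSizeBytes;  // metadata.snapshot.min.records.size (default 20MB)

    SnapshotPolicy(double minCleanableRatio, long minRecordsSizeBytes) {
        this.minCleanableRatio = minCleanableRatio;
        this.minRecordsSizeBytes = minRecordsSizeBytes;
    }

    /**
     * @param cleanableRatio fraction of the log that a new snapshot would clean
     * @param bytesSinceLastSnapshot bytes in the replicated log between the
     *        latest snapshot and the high-watermark
     */
    boolean shouldGenerateSnapshot(double cleanableRatio, long bytesSinceLastSnapshot) {
        // A single hot key that keeps the cleanable ratio high can no longer
        // force frequent snapshots: the log must also have grown by at least
        // minRecordsSizeBytes since the last snapshot.
        return cleanableRatio >= minCleanableRatio
            && bytesSinceLastSnapshot >= minRecordsSizeBytes;
    }
}
```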
> 42. One of the reasons that we added the per partition limit is to allow
> each partition to make relatively even progress during the catchup phase.
> This helps kstreams by potentially reducing the gap between stream time
> from different partitions. If we can achieve the same thing without the
> partition limit, it will be fine too. "3. For the remaining partitions send
> at most the average max bytes plus the average remaining bytes." Do we
> guarantee that request level max_bytes can be filled up when it can? Could
> we document how we distribute the request level max_bytes to partitions in
> the KIP?

I want to allow some flexibility in the implementation. How about the following update to the FetchSnapshot Request Handling section:

3. Send the bytes in the snapshot from Position. If there are multiple partitions in the FetchSnapshot request, then the leader will evenly distribute the number of bytes sent across all of the partitions. The leader will not send more bytes in the response than ResponseMaxBytes, the minimum of MaxBytes in the request and the value configured in replica.fetch.response.max.bytes.
  a. Each topic partition is guaranteed to receive at least the average of ResponseMaxBytes if that snapshot has enough bytes remaining.
  b. If there are topic partitions with snapshots that have fewer remaining bytes than the average, then those unused bytes may be used to send snapshot bytes for other topic partitions.

I should also point out that in practice for KIP-630 this request will only have one topic partition (__cluster_metadata-0). I should also point out that FetchSnapshot sends bytes, not records, so unlike Fetch there is no requirement that the response contain at least one record. A sketch of this distribution follows below.

> Also, should we change Fetch accordingly?

If we want to make this change I think we should do this in another KIP. What do you think?
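As a rough illustration of the distribution in item 3 above (the KIP deliberately leaves the implementation flexible, and all names here are hypothetical): every partition is first granted at most the average share of ResponseMaxBytes, and any share left unused by nearly-exhausted snapshots is then handed to partitions that still have bytes remaining.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of distributing ResponseMaxBytes across the
// partitions of a FetchSnapshot request; not the KIP's mandated algorithm.
final class SnapshotByteAllocator {
    /**
     * @param remainingBytes remaining snapshot bytes per partition, from Position
     * @param responseMaxBytes min(MaxBytes in the request, replica.fetch.response.max.bytes)
     * @return bytes to send for each partition in this response
     */
    static List<Long> allocate(List<Long> remainingBytes, long responseMaxBytes) {
        int n = remainingBytes.size();
        List<Long> allocation = new ArrayList<>(n);
        if (n == 0) return allocation;
        long average = responseMaxBytes / n;

        // First pass: each partition gets at most the average (rule 3a);
        // snapshots with fewer remaining bytes free up the difference.
        long unused = 0;
        for (long remaining : remainingBytes) {
            long granted = Math.min(remaining, average);
            allocation.add(granted);
            unused += average - granted;
        }

        // Second pass: hand the unused bytes to partitions that still have
        // snapshot bytes beyond their average share (rule 3b).
        for (int i = 0; i < n && unused > 0; i++) {
            long extra = Math.min(remainingBytes.get(i) - allocation.get(i), unused);
            allocation.set(i, allocation.get(i) + extra);
            unused -= extra;
        }
        return allocation;
    }
}
```

Since FetchSnapshot responses carry raw bytes rather than whole records, the allocation can be honored exactly; there is no need to round up to a record boundary as Fetch does.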
> 46. If we don't have IBP, how do we make sure that a broker doesn't
> issue FetchSnapshotRequest when the receiving broker hasn't been upgraded
> yet?

For a broker to send a FetchSnapshotRequest it means that it received a FetchResponse that contained a SnapshotId field. For the leader to send a SnapshotId in the FetchResponse it means that the leader is executing code that knows how to handle FetchSnapshotRequests.

The inverse is also true. For the follower to receive a SnapshotId in the FetchResponse it means that it sent the FetchRequest to the leader of the __cluster_metadata-0 topic partition. Only the KafkaRaftClient will send that fetch request.

After writing the above, I see what you are saying. The broker needs to know if it should enable the KafkaRaftClient and send FetchRequests to the __cluster_metadata-0 topic partition. I think there is also a question of how to perform a rolling migration of a cluster from ZK to KIP-500. I think we will write a future KIP that documents this process.

Thanks for your help here. For now, I'll mention that we will bump the IBP. The new wording for the "Compatibility, Deprecation, and Migration Plan" section:

This KIP is only implemented for the internal topic __cluster_metadata. The inter-broker protocol (IBP) will be increased to indicate that all of the brokers in the cluster support KIP-595 and KIP-630.

Thanks,
-Jose