Comments below. Here are the changes to the KIP: https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=158864763&selectedPageVersions=34&selectedPageVersions=33

> 41. That's a good point. With compacted topic, the cleaning won't be done
> until the active segment rolls. With snapshots, I guess we don't have this
> restriction? So, it would be useful to guard against too frequent
> snapshotting. Does the new proposal address this completely? If the
> snapshot has only 1 record, and the new record keeps updating the same key,
> does that still cause the snapshot to be generated frequently?

That is true. In addition to metadata.snapshot.min.cleanable.ratio we can add the following configuration:

metadata.snapshot.min.records.size - This is the minimum number of bytes in the replicated log between the latest snapshot and the high-watermark needed before generating a new snapshot. The default is 20MB.

Both configurations need to be satisfied before generating a new snapshot. I have updated the KIP.
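To make the interaction of the two configurations concrete, here is a minimal sketch of the trigger check described above. The class and method names are hypothetical, not from the KIP; the point is only that both conditions must hold before a snapshot is generated:

```java
// Hypothetical sketch, not the KIP's implementation: a broker would
// only generate a new snapshot when BOTH configured thresholds are met.
final class SnapshotPolicy {
    private final double minCleanableRatio;  // metadata.snapshot.min.cleanable.ratio
    private final long minRecordsSizeBytes;  // metadata.snapshot.min.records.size (default 20MB)

    SnapshotPolicy(double minCleanableRatio, long minRecordsSizeBytes) {
        this.minCleanableRatio = minCleanableRatio;
        this.minRecordsSizeBytes = minRecordsSizeBytes;
    }

    /**
     * @param cleanableRatio fraction of the log that a new snapshot would clean
     * @param bytesSinceLastSnapshot bytes in the replicated log between the
     *        latest snapshot and the high-watermark
     */
    boolean shouldGenerateSnapshot(double cleanableRatio, long bytesSinceLastSnapshot) {
        // A single hot key that keeps the cleanable ratio high can no longer
        // force frequent snapshots: the log must also have grown by at least
        // minRecordsSizeBytes since the last snapshot.
        return cleanableRatio >= minCleanableRatio
            && bytesSinceLastSnapshot >= minRecordsSizeBytes;
    }
}
```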
> 42. One of the reasons that we added the per partition limit is to allow
> each partition to make relatively even progress during the catchup phase.
> This helps kstreams by potentially reducing the gap between stream time
> from different partitions. If we can achieve the same thing without the
> partition limit, it will be fine too. "3. For the remaining partitions send
> at most the average max bytes plus the average remaining bytes." Do we
> guarantee that request level max_bytes can be filled up when it can? Could
> we document how we distribute the request level max_bytes to partitions in
> the KIP?

I want to allow some flexibility in the implementation. How about the following update to the FetchSnapshot Request Handling section:

3. Send the bytes in the snapshot from Position. If there are multiple partitions in the FetchSnapshot request, then the leader will evenly distribute the number of bytes sent across all of the partitions. The leader will not send more bytes in the response than ResponseMaxBytes, the minimum of MaxBytes in the request and the value configured in replica.fetch.response.max.bytes.
  a. Each topic partition is guaranteed to receive at least the average of ResponseMaxBytes if that snapshot has enough bytes remaining.
  b. If there are topic partitions with snapshots that have fewer remaining bytes than the average, then those unused bytes may be used to send snapshot bytes for other topic partitions.

I should also point out that in practice for KIP-630 this request will only have one topic partition (__cluster_metadata-0). I should also point out that FetchSnapshot sends bytes, not records, so unlike Fetch there is no requirement that the response contain at least one record. A sketch of this distribution follows below.

> Also, should we change Fetch accordingly?

If we want to make this change I think we should do this in another KIP. What do you think?
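As a rough illustration of the distribution in item 3 above (the KIP deliberately leaves the implementation flexible, and all names here are hypothetical): every partition is first granted at most the average share of ResponseMaxBytes, and any share left unused by nearly-exhausted snapshots is then handed to partitions that still have bytes remaining.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of distributing ResponseMaxBytes across the
// partitions of a FetchSnapshot request; not the KIP's mandated algorithm.
final class SnapshotByteAllocator {
    /**
     * @param remainingBytes remaining snapshot bytes per partition, from Position
     * @param responseMaxBytes min(MaxBytes in the request, replica.fetch.response.max.bytes)
     * @return bytes to send for each partition in this response
     */
    static List<Long> allocate(List<Long> remainingBytes, long responseMaxBytes) {
        int n = remainingBytes.size();
        List<Long> allocation = new ArrayList<>(n);
        if (n == 0) return allocation;
        long average = responseMaxBytes / n;

        // First pass: each partition gets at most the average (rule 3a);
        // snapshots with fewer remaining bytes free up the difference.
        long unused = 0;
        for (long remaining : remainingBytes) {
            long granted = Math.min(remaining, average);
            allocation.add(granted);
            unused += average - granted;
        }

        // Second pass: hand the unused bytes to partitions that still have
        // snapshot bytes beyond their average share (rule 3b).
        for (int i = 0; i < n && unused > 0; i++) {
            long extra = Math.min(remainingBytes.get(i) - allocation.get(i), unused);
            allocation.set(i, allocation.get(i) + extra);
            unused -= extra;
        }
        return allocation;
    }
}
```

Since FetchSnapshot responses carry raw bytes rather than whole records, the allocation can be honored exactly; there is no need to round up to a record boundary as Fetch does.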
> 46. If we don't have IBP, how do we make sure that a broker doesn't
> issue FetchSnapshotRequest when the receiving broker hasn't been upgraded
> yet?

For a broker to send a FetchSnapshotRequest it means that it received a FetchResponse that contained a SnapshotId field. For the leader to send a SnapshotId in the FetchResponse it means that the leader is executing code that knows how to handle FetchSnapshotRequests.

The inverse is also true. For the follower to receive a SnapshotId in the FetchResponse it means that it sent the FetchRequest to the leader of the __cluster_metadata-0 topic partition. Only the KafkaRaftClient will send that fetch request.

After writing the above, I see what you are saying. The broker needs to know if it should enable the KafkaRaftClient and send FetchRequests to the __cluster_metadata-0 topic partition. I think there is also a question of how to perform a rolling migration of a cluster from ZK to KIP-500. I think we will write a future KIP that documents this process.

Thanks for your help here. For now, I'll mention that we will bump the IBP. The new wording for the "Compatibility, Deprecation, and Migration Plan" section:

This KIP is only implemented for the internal topic __cluster_metadata. The inter-broker protocol (IBP) will be increased to indicate that all of the brokers in the cluster support KIP-595 and KIP-630.

Thanks,
-Jose