Re: [DISCUSS] KIP-630: Kafka Raft Snapshot

Jose Garcia Sancio Thu, 24 Sep 2020 13:20:09 -0700

Thanks for the feedback Jason.

I have made the following changes to the KIP:
1. Better explanation of how followers will manage snapshots and the
replicated log. This includes the necessary changes when granting or
requesting votes.
2. How the Snapshot's epoch will be used for the LastFetchEpoch in the
Fetch request.
3. New configuration options.
4. Changed the Fetch response to match the latest changes in KIP-595.
5. Changed the FetchSnapshot request to include total response max bytes.
6. Changed the FetchSnapshot response to return the snapshot size
instead of the "Continue" field.


Diff:
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=158864763&selectedPageVersions=24&selectedPageVersions=23

> 1. There is a comment in the proposal which suggests that we will maintain
> multiple snapshots:
>
> > Having multiple snapshots is useful for minimizing re-fetching of the
> snapshot when a new snapshot is generated.
>
> However, the document later says that snapshots get deleted as the LBO
> advances. Just wanted to clarify the intent. Will we generally only have
> one snapshot?

Generally the number of snapshots on disk will be one. I suspect that
users will want some control over this. We can add a configuration
option that doesn't delete, or advances the log begin offset past, the
N latest snapshots. We can set the default value for this
configuration to two. What do you think?

> 2. The proposal says the following:
>
> > During leader election, followers with incomplete or missing snapshot
> will send a vote request and response as if they had an empty log.
>
> Maybe you can help me understand the scenario we're talking about since I'm
> not sure I understand the point of this. If the intent is to not allow such
> a follower to become leader, why would it ever become a candidate? On the
> other hand, if the intent is to still allow it to become leader in some
> disaster scenario, then why would it not use its latest log state? For
> inbound Vote requests, I think it should definitely still consider its
> latest log state when deciding whether to grant a vote.

Conceptually followers will implement this algorithm:
1. Follower sends fetch request
2. Leader replies with snapshot epoch and offset
3. Follower pauses fetch
4. Follower fetches the snapshot
5. Follower resume fetch by
    A. Setting the LBO to the snapshot offset plus one
    B. Setting the LEO or fetch offset in the fetch request to the
snapshot offset plus one
    C. Uses the snapshot epoch as the last fetched epoch in the fetch request.

The problem I was trying to address is what is the state of the
follower between bullet 4 and 5? Let's assume that the snapshot fetch
in bullet 4 has an epoch of E and an offset of O. The follower can
have the following state on disk after bullet 4:

1. A snapshot with offset O and epoch E.
2. Many snapshots older/less than offset O and epoch E.
3. A replicated log with LEO older/less than offset O and epoch E.

In this case when the follower grants a vote or becomes a candidate it
should use the latest of all of this which is (1.) the fetched
snapshot with offset O and epoch E.

I updated the KIP to include this description.

> 3. Are we overloading `replica.fetch.max.bytes` for snapshot fetches as
> well? It looks like we are specifying this at the partition level, but it
> might be more useful to track the maximum bytes at the request level. On a
> related note, it might be useful to think through heuristics for balancing
> between the requests in a partition. Unlike fetches, it seems like we'd
> want to complete snapshot loading partition by partition. I wonder if it
> would be simpler for FetchSnapshot to handle just one partition.

We could use the same configuration we have for Fetch but to avoid
confusion let's add two more configurations for
"replica.fetch.snapshot.max.bytes" and
"replica.fetch.snapshot.response.max.bytes".

> 4. It would help if the document motivated the need to track the snapshot
> epoch. Since we are only snapshotting below the high watermark, are you
> thinking about recovering from data loss scenarios?

I added the following paragraph to the KIP:

The snapshot epoch will be used when ordering snapshots and more
importantly when setting the LastFetchedEpoch in the Fetch request. It
is possible for a follower to have a snapshot and an empty log. In
this case the follower will use the epoch of the snapshot when setting
the LastFetchEpoch in the Fetch request.

>
> 5. Might need to fix the following:
>
> > Otherwise, the leader will respond with the offset and epoch of the
> latest snapshot (y, c) and with the next fetch offset and epoch (y + 1, d)
>
> We ended up renaming the next fetch offset and epoch. I think we should
> just leave it empty in this case. The snapshot offset and epoch seem
> sufficient.

Done. I made some changes to the "Handling Fetch Response" section too.

Re: [DISCUSS] KIP-630: Kafka Raft Snapshot

Reply via email to