Hi all,

Here at Dropbox we're still (off and on) trying to get to the bottom of
the data loss that's been hitting our largest cluster during preferred
replica elections. Unfortunately it has recurred a few times, so now we
have a question.

To make sure we're understanding, message commit status (replication)
basically goes through three phases:

* Messages Uncommitted; the leader has received new production, and the
followers receive the messages in a Fetch request. Everybody's LEO is
incremented, but HWMs are unchanged. No messages are considered
committed.

* Leader Committed; in the subsequent Fetch, the follower says "my LEO
is now X" and the leader records that and then updates its HWM to be the
minimum of all followers' LEOs. After this stage, the messages are
committed -- only on the leader. The follower's HWM is still in the
past.

* Leader+Follower Committed (or HWM Replication Fetch); in this final
fetch, the follower is informed of the new HWM by the leader and
increments its own HWM accordingly.

These aren't actually distinct phases of course, they're just part of
individual fetch requests, but I think logically the HWM transitions can
be thought of in that way. I.e., "message uncommitted", "leader
committed", "leader+follower committed".
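
To make the three phases concrete, here is a toy sketch in Python. It
is not Kafka code: the Replica class and the fetch() function are made
up for illustration, there is only one follower, and I am assuming the
fetch response carries the leader's HWM as it was before the leader
processed that fetch (which is what would make the last phase a
separate fetch).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: a log plus a high watermark (HWM)."""
    log: list = field(default_factory=list)
    hwm: int = 0

    @property
    def leo(self) -> int:
        # Log end offset: the offset the next message would get.
        return len(self.log)


def fetch(leader: Replica, follower: Replica) -> None:
    """One simplified fetch round with a single follower."""
    # The leader builds the response first: new messages plus its
    # HWM as of *before* this fetch is processed (assumption).
    messages = leader.log[follower.leo:]
    response_hwm = leader.hwm
    # The fetch offset tells the leader the follower's LEO, so the
    # leader can now advance its own HWM.
    leader.hwm = min(leader.leo, follower.leo)
    # The follower appends and adopts the HWM from the response.
    follower.log.extend(messages)
    follower.hwm = min(response_hwm, follower.leo)


def run_phases() -> list:
    """Return (leader HWM, follower HWM) after each of three fetches."""
    leader = Replica(log=["m0", "m1"])  # new production on the leader
    follower = Replica()
    states = []
    for _ in range(3):
        fetch(leader, follower)
        states.append((leader.hwm, follower.hwm))
    return states


print(run_phases())  # [(0, 0), (2, 0), (2, 2)]
```

The three tuples line up with "message uncommitted", "leader
committed" and "leader+follower committed" respectively.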

Is this understanding accurate?

If so, is it just a fact that (right now) a preferred replica election
has a small chance of electing a follower and causing message loss of
any "leader committed" messages (i.e., messages that are not considered
committed on the follower that is now getting promoted)?
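
Put differently, the window I'm worried about is just the gap between
the two HWMs. A tiny Python helper to pin that down (the function name
is made up, and it only describes the gap; it says nothing about what
the broker actually does on election):

```python
def leader_only_committed(leader_hwm: int, follower_hwm: int) -> range:
    """Offsets committed on the leader but not yet on the follower."""
    # Clamp so an up-to-date follower yields an empty range.
    return range(follower_hwm, max(follower_hwm, leader_hwm))


# Leader HWM is 2 while the follower's HWM is still 0.
print(list(leader_only_committed(2, 0)))  # [0, 1]
```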

We can't find anything in the protocol that would guard against this.
I've also been reading KIP-101, and it looks like Scenario 1 is sort of
referring to this. However, that scenario involves a broker failure --
and my concern is that data loss is possible even in the normal
scenario with no broker failures.

Any thoughts?

-- 
Mark Smith
m...@qq.is
