[
https://issues.apache.org/jira/browse/KAFKA-9733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Richard Yu updated KAFKA-9733:
------------------------------
Description:
Kafka's current replication model (a single leader and several followers)
resembles the consensus algorithms currently used in databases, such as Raft,
the major difference being the existence of the ISR (in-sync replica set).
Consequently, Kafka suffers from the same fault-tolerance issue as distributed
systems that rely on Raft: the leader tends to be the chokepoint for failures,
i.e. if it goes down, there is a brief stop-the-world effect.
At the other extreme, giving every replica the power to read from and write to
other replicas is difficult to accomplish (as the complexity of the
Egalitarian Paxos algorithm shows): consistency is hard to maintain in such an
algorithm, and the gain is small compared to the overhead.
Therefore, I propose an intermediate step between these two approaches: the
leader replica quorum. In essence, there will be multiple leaders (each able
to serve both reads and writes), but the number of leaders will not be
excessive (e.g. three or four at most). How we achieve consistency is simple:
* Any leader has the power to propose a write update to other replicas. But
before a write update is passed to a follower, the other leaders must vote on
whether the operation is granted.
* In principle, a leader proposes a write update to the other leaders, and
once each of them has integrated that update into its version of the stored
data, it gives the green light.
* If, say, more than half of the other leaders have agreed that the current
change is good to go, then we can
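
The voting rule in the bullets above can be sketched roughly as follows. This
is only an illustration under stated assumptions, not Kafka code: the `Leader`
class, the `reachable` flag, and `propose_write` are hypothetical names, and
the sketch assumes that an unreachable leader simply never approves. It stops
at the vote result, because the description does not yet say what happens
after approval.

```python
from dataclasses import dataclass, field


@dataclass
class Leader:
    """Hypothetical leader replica holding its own copy of the stored data."""
    name: str
    log: list = field(default_factory=list)
    reachable: bool = True  # assumption: an unreachable leader never votes

    def integrate(self, record: str) -> bool:
        """Apply the proposed write locally, then give the green light."""
        if not self.reachable:
            return False
        self.log.append(record)
        return True


def propose_write(proposer: Leader, others: list, record: str) -> bool:
    """Return True if more than half of the *other* leaders approved."""
    # Assumption: the proposer also keeps the record in its own log.
    proposer.log.append(record)
    approvals = sum(1 for peer in others if peer.integrate(record))
    return approvals > len(others) / 2


if __name__ == "__main__":
    a, b, c, d = (Leader(n) for n in "abcd")
    print(propose_write(a, [b, c, d], "record-1"))  # True: 3 of 3 approve
    b.reachable = False
    c.reachable = False
    print(propose_write(a, [b, c, d], "record-2"))  # False: 1 of 3 approves
```

Counting only the other leaders, and not the proposer, follows the wording
"more than half the other leaders"; a design that counts the proposer's own
vote would change the threshold.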
was:
Note: Description still not finished. Still not sure if this is needed.
Kafka's current replication model (a single leader and several followers)
resembles the consensus algorithms currently used in databases, such as Raft,
the major difference being the existence of the ISR (in-sync replica set).
Consequently, Kafka suffers from the same fault-tolerance issue as distributed
systems that rely on Raft: the leader tends to be the chokepoint for failures,
i.e. if it goes down, there is a brief stop-the-world effect.
At the other extreme, giving every replica the power to read from and write to
other replicas is difficult to accomplish (as the complexity of the
Egalitarian Paxos algorithm shows): consistency is hard to maintain in such an
algorithm, and the gain is small compared to the overhead.
Therefore, I propose an intermediate step between these two approaches: the
leader partition quorum.
> Consider addition of leader quorum in replication model
> -------------------------------------------------------
>
> Key: KAFKA-9733
> URL: https://issues.apache.org/jira/browse/KAFKA-9733
> Project: Kafka
> Issue Type: New Feature
> Components: clients, core
> Reporter: Richard Yu
> Priority: Minor
>
> Kafka's current replication model (a single leader and several followers)
> resembles the consensus algorithms currently used in databases, such as
> Raft, the major difference being the existence of the ISR (in-sync replica
> set).
> Consequently, Kafka suffers from the same fault-tolerance issue as
> distributed systems that rely on Raft: the leader tends to be the chokepoint
> for failures, i.e. if it goes down, there is a brief stop-the-world effect.
> At the other extreme, giving every replica the power to read from and write
> to other replicas is difficult to accomplish (as the complexity of the
> Egalitarian Paxos algorithm shows): consistency is hard to maintain in such
> an algorithm, and the gain is small compared to the overhead.
> Therefore, I propose an intermediate step between these two approaches: the
> leader replica quorum. In essence, there will be multiple leaders (each able
> to serve both reads and writes), but the number of leaders will not be
> excessive (e.g. three or four at most).
> How we achieve consistency is simple:
> * Any leader has the power to propose a write update to other replicas. But
> before a write update is passed to a follower, the other leaders must vote
> on whether the operation is granted.
> * In principle, a leader proposes a write update to the other leaders, and
> once each of them has integrated that update into its version of the stored
> data, it gives the green light.
> * If, say, more than half of the other leaders have agreed that the current
> change is good to go, then we can
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)