[jira] [Updated] (KAFKA-9733) Consider addition of leader quorum in replication model

Richard Yu (Jira) Wed, 18 Mar 2020 09:24:26 -0700


     [ 
https://issues.apache.org/jira/browse/KAFKA-9733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Richard Yu updated KAFKA-9733:
------------------------------
    Description: 
Kafka's current replication model (with its single leader and several 
followers) is somewhat similar to the current consensus algorithms being used 
in databases (RAFT) with the major difference being the existence of the ISR. 
Consequently, Kafka suffers from the same fault tolerance issues as does other 
distributed systems which rely on RAFT: the leader tends to be the chokepoint 
for failures i.e. if it goes down, it will have a brief stop-the-world effect. 

In contrast, giving all replicas the power to write and read to other replicas 
is also difficult to accomplish (as emphasized by the complexity of the 
Egalitarian Paxos algorithm), since consistency is so hard to maintain in such 
an algorithm, plus very little gain compared to the overhead. 

Therefore, I propose that we have an intermediate step in between these two 
algorithms, and that is the leader replica quorum. In essence, there will be 
multiple leaders (which have the power for both read and writes), but the 
number of leaders will not be excessive (i.e. maybe three or four at max). How 
we achieve consistency is simple:
 * Any leader has the power to propose a write update to other replicas. But 
before passing a write update to a follower, the other leaders must elect if 
such an operation is granted.
 * In principle, a leader will propose a write update to the other leaders, and 
once the other leaders have integrated that write update into their version of 
the stored data, they will also give the green light. 
 * If say, more than half the other leaders have agreed that the current change 
is good to go, then we can 

 

 

 

  was:
Note: Description still not finished. Still not sure if this is needed.

Kafka's current replication model (with its single leader and several 
followers) is somewhat similar to the current consensus algorithms being used 
in databases (RAFT) with the major difference being the existence of the ISR. 
Consequently, Kafka suffers from the same fault tolerance issues as does other 
distributed systems which rely on RAFT: the leader tends to be the chokepoint 
for failures i.e. if it goes down, it will have a brief stop-the-world effect. 

In contrast, giving all replicas the power to write and read to other replicas 
is also difficult to accomplish (as emphasized by the complexity of the 
Egalitarian Paxos algorithm), since consistency is so hard to maintain in such 
an algorithm, plus very little gain compared to the overhead. 

Therefore, I propose that we have an intermediate step in between these two 
algorithms, and that is the leader partition quorum.

 


> Consider addition of leader quorum in replication model
> -------------------------------------------------------
>
>                 Key: KAFKA-9733
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9733
>             Project: Kafka
>          Issue Type: New Feature
>          Components: clients, core
>            Reporter: Richard Yu
>            Priority: Minor
>
> Kafka's current replication model (with its single leader and several 
> followers) is somewhat similar to the current consensus algorithms being used 
> in databases (RAFT) with the major difference being the existence of the ISR. 
> Consequently, Kafka suffers from the same fault tolerance issues as does 
> other distributed systems which rely on RAFT: the leader tends to be the 
> chokepoint for failures i.e. if it goes down, it will have a brief 
> stop-the-world effect. 
> In contrast, giving all replicas the power to write and read to other 
> replicas is also difficult to accomplish (as emphasized by the complexity of 
> the Egalitarian Paxos algorithm), since consistency is so hard to maintain in 
> such an algorithm, plus very little gain compared to the overhead. 
> Therefore, I propose that we have an intermediate step in between these two 
> algorithms, and that is the leader replica quorum. In essence, there will be 
> multiple leaders (which have the power for both read and writes), but the 
> number of leaders will not be excessive (i.e. maybe three or four at max). 
> How we achieve consistency is simple:
>  * Any leader has the power to propose a write update to other replicas. But 
> before passing a write update to a follower, the other leaders must elect if 
> such an operation is granted.
>  * In principle, a leader will propose a write update to the other leaders, 
> and once the other leaders have integrated that write update into their 
> version of the stored data, they will also give the green light. 
>  * If say, more than half the other leaders have agreed that the current 
> change is good to go, then we can 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (KAFKA-9733) Consider addition of leader quorum in replication model

Reply via email to