[ 
https://issues.apache.org/jira/browse/IGNITE-24772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-24772:
----------------------------------
    Description: 
*Scenario:*

Nodes: A,B,C. 

A is a leader.

A client writes some data. The data is replicated to A and B and committed on 
the leader (A), so the write succeeds from the client's point of view.

A fails, then returns to the cluster. No data survives on A because its 
storage is in-memory.

The cluster tries to re-include A as a clean node: it tries to exclude A from 
the configuration and include it again. The configuration change is not applied 
for some time, because there is no leader and there may be temporary network 
issues preventing new data from being written.

Then the user, believing that the majority will be preserved, restarts node B. 
B also loses the data.

Assume the data was never replicated to C.

As a result, the data is lost.
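
The steps above can be replayed in a small toy model. This is only a sketch of 
the scenario as described, not Ignite or Raft code: the names `NODES`, 
`has_majority` and `restart`, and the key `k1`, are made up for illustration, 
and in-memory storage is modelled as a set that is emptied on restart.

```python
# Toy model of the scenario: three in-memory replicas, no real Raft logic.
NODES = ["A", "B", "C"]


def has_majority(online):
    """True if more than half of the group members are online."""
    return len(online) > len(NODES) // 2


def restart(node, storage):
    """An in-memory node comes back empty after a restart."""
    storage[node] = set()


storage = {n: set() for n in NODES}
online = set(NODES)
majority_always_online = True

# The write reaches A and B only, is committed, and is acknowledged.
for n in ("A", "B"):
    storage[n].add("k1")

# A fails and returns clean, then B is restarted.
# Only one node is down at any moment.
for node in ("A", "B"):
    online.discard(node)
    majority_always_online &= has_majority(online)
    restart(node, storage)
    online.add(node)

# Nodes that still hold the acknowledged write.
survivors = [n for n in NODES if "k1" in storage[n]]
print(majority_always_online, survivors)
```

In this model a majority of nodes is online at every step, yet no node holds 
the acknowledged write at the end.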

 

*Ignite specifics:*

During a restart, the node is removed from the configuration before it starts 
and is then included again; it is actually started only when it is included 
back. So the scenario is slightly different:

When A is started, it is removed from the configuration.

Node B is stopped. Now the majority is lost and a full group restart is 
required.

The user needs a group restart even though {*}the majority of Ignite nodes 
stayed online{*}. This leads to data loss.
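
The difference between "majority of the group" and "majority of Ignite nodes" 
can be shown with a few lines of arithmetic. This is a sketch based on one 
reading of the steps above, not on Ignite internals: it assumes the quorum is 
counted over the current group configuration, and the names `quorum`, `config` 
and `online` are hypothetical.

```python
def quorum(config):
    """Smallest number of members that forms a majority of the configuration."""
    return len(config) // 2 + 1


ignite_nodes = {"A", "B", "C"}
config = {"A", "B", "C"}  # group configuration (assumed to define the quorum)
online = {"A", "B", "C"}  # Ignite nodes that are up

# A restarts: the node is up again, but it has been removed from the
# configuration and is not yet included back.
config.discard("A")

# Node B is stopped.
online.discard("B")

# Only C is both configured and online, but the configuration {B, C} needs 2.
group_has_quorum = len(config & online) >= quorum(config)

# A and C are up, which is 2 of 3 Ignite nodes.
ignite_majority_online = len(online) > len(ignite_nodes) // 2

print(group_has_quorum, ignite_majority_online)
```

Under these assumptions the group has no quorum while two of the three Ignite 
nodes are online.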

  was:
*Scenario:*

Nodes: A,B,C. 

A is a leader.

Client writes some data, data is replicated to A and B, committed on the leader 
(A) and the write operation succeeds from client's POV.

A fails, then returns to the cluster. No data saved on A because it is 
in-memory.

The cluster tries to include A as a clean node, it tries to exclude it from the 
configuration and include again, but configuration is not applied for some time 
because there is no leader and may be some temporary network issues preventing 
the write of new data.

Then the user (that thinks that the majority would be preserved) restarts node 
B. It also loses the data.

Let's say that data wasn't even replicated on C.

As a result, the data is lost.


> Data loss in in-memory group after several node restarts without losing 
> majority in any moment of time
> ------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-24772
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24772
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Denis Chudov
>            Priority: Major
>              Labels: ignite-3


