Alexander Lapin created IGNITE-26532:
----------------------------------------

             Summary: Design CMG/MG absence handling logic
                 Key: IGNITE-26532
                 URL: https://issues.apache.org/jira/browse/IGNITE-26532
             Project: Ignite
          Issue Type: Task
            Reporter: Alexander Lapin


h3. Motivation

In each of the following cases:
 # loss of majority in *MG* only

 # loss of majority in *CMG* only

 # loss of majority in both *CMG* and *MG*

user operations should behave adequately: within the specified timeouts they 
wait for the majority to be restored, and if it is not restored in time, they 
fail with a clear error. They must not flood the logs with exceptions on every 
internal retry.
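As a rough illustration of the intended behaviour for user operations, the sketch below waits for a group majority up to a deadline and then fails once with a single descriptive error, without logging on each retry. Everything here is hypothetical and for discussion only: {{MajorityWait}}, {{awaitMajority}}, {{GroupUnavailableException}} and the {{hasMajority}} probe are illustrative names, not existing Ignite classes or APIs.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper; none of these names exist in Ignite. */
final class MajorityWait {
    /** Hypothetical error type surfaced to the user once the timeout expires. */
    static class GroupUnavailableException extends RuntimeException {
        GroupUnavailableException(String message) {
            super(message);
        }
    }

    /**
     * Polls the (assumed) majority probe until it succeeds or the timeout expires.
     * Nothing is logged on individual retries; the only signal is the final exception.
     *
     * @return number of probe attempts made before the majority was observed.
     */
    static int awaitMajority(String group, BooleanSupplier hasMajority,
                             Duration timeout, Duration retryDelay) {
        long deadline = System.nanoTime() + timeout.toNanos();
        int attempts = 0;

        while (true) {
            attempts++;
            if (hasMajority.getAsBoolean()) {
                return attempts;
            }

            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw new GroupUnavailableException(group + " majority was not restored within "
                        + timeout.toMillis() + " ms (" + attempts + " attempts)");
            }

            // Sleep for the retry delay, but never past the deadline.
            long sleepMillis = Math.max(1, Math.min(retryDelay.toMillis(), remainingNanos / 1_000_000));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new GroupUnavailableException("Interrupted while waiting for " + group + " majority");
            }
        }
    }
}
```

A caller would invoke something like {{MajorityWait.awaitMajority("MG", probe, Duration.ofSeconds(30), Duration.ofMillis(200))}} before proceeding; the timeout and delay values are placeholders. A real implementation would more likely be asynchronous than a blocking loop.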

We are talking about operations such as:
 * Schema changes (e.g., creating a table).

 * Transactions of all types (with partially applied transactions being rolled 
back).

 * Adding nodes.

 * Various {{resetPartitions}} operations.

 * …

At the same time, user operations such as
 * stopping a node, and

 * read-only transactions (as in the past)

must complete successfully without exceptions being logged.

Internal _system_ operations must wait indefinitely for the restoration of 
majority in the corresponding system groups (whether via infinite retry or 
reactively), and under no circumstances should they trigger FG (which is what 
happens now).

A node should log the unavailability of a system group sparingly, rather than 
as excessively as it currently does.
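One possible way to keep such logging sparse is to rate-limit the message and report how many repeats were suppressed. The sketch below is only an assumption about how this could look: {{ThrottledUnavailabilityLog}} and {{onUnavailable}} are hypothetical names, and the clock is injected so the behaviour can be checked without real delays.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical rate limiter for "group unavailable" messages; not an Ignite class. */
final class ThrottledUnavailabilityLog {
    private final long minIntervalNanos;
    private final LongSupplier clock;
    private final AtomicLong lastLogged;
    private final AtomicLong suppressed = new AtomicLong();

    ThrottledUnavailabilityLog(long minIntervalNanos, LongSupplier clock) {
        this.minIntervalNanos = minIntervalNanos;
        this.clock = clock;
        // Start one interval in the past so that the very first report is emitted.
        this.lastLogged = new AtomicLong(clock.getAsLong() - minIntervalNanos);
    }

    /**
     * Returns the message that should be logged now, or null if the report
     * falls inside the throttling interval and must be suppressed.
     */
    String onUnavailable(String group) {
        long now = clock.getAsLong();
        long last = lastLogged.get();

        if (now - last >= minIntervalNanos && lastLogged.compareAndSet(last, now)) {
            long skipped = suppressed.getAndSet(0);
            return group + " is unavailable (majority lost); "
                    + skipped + " similar messages suppressed";
        }

        suppressed.incrementAndGet();
        return null;
    }
}
```

An internal retry loop would call {{onUnavailable}} on every failed attempt and pass only non-null results to the logger, so a long outage produces one line per interval instead of one stack trace per retry.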

Cancellation operations (rollback, abort, etc.) should, whenever possible, work 
even in the absence of CMG/MG. This needs to be verified separately, since it’s 
unclear whether we can guarantee it for every such operation.

When CMG/MG is restored, the cluster should return to normal operability.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
