Alexander Lapin created IGNITE-26532:
----------------------------------------
             Summary: Design CMG/MG absence handling logic
                 Key: IGNITE-26532
                 URL: https://issues.apache.org/jira/browse/IGNITE-26532
             Project: Ignite
          Issue Type: Task
            Reporter: Alexander Lapin


h3. Motivation

In case of
# loss of majority in *MG* only
# loss of majority in *CMG* only
# loss of majority in both *CMG* and *MG*

user operations should behave adequately: within the specified timeouts they wait for the majority to be restored, and if that does not happen, they fail with a clear error. At the same time, they must not flood the logs with exceptions on every internal retry. This applies to operations such as:
* Schema changes (e.g., creating a table).
* Transactions of all types (with partially applied transactions being rolled back).
* Adding nodes.
* Various {{resetPartitions}}.
* …

At the same time, user operations such as
* stopping a node, and
* read-only transactions (as in the past)

must complete successfully without exceptions being logged.

Internal _system_ operations must wait indefinitely for the majority to be restored in the corresponding system groups (whether via infinite retry or reactively), and under no circumstances should they trigger FG (which is what happens now). A node should log reasonably little about the unavailability of a system group, rather than as excessively as it currently does.

Cancellation operations (rollback, abort, etc.) should, whenever possible, work even in the absence of CMG/MG. This needs to be verified separately, since it’s unclear whether we can guarantee it for everything.

When CMG/MG is restored, the cluster should return to normal operability.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)