[ 
https://issues.apache.org/jira/browse/IGNITE-18171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634809#comment-17634809
 ] 

Andrey Mashenkov edited comment on IGNITE-18171 at 11/18/22 10:13 AM:
----------------------------------------------------------------------

The scenarios we would like to cover are the Cartesian product of:
 # Combinations of node roles in the grid, together with the way we got into that state (by starting or stopping a node).
 # User actions that we want to check at each step of a scenario:
 ** RO (read-only) transaction operation. This requires at least one follower.
 ** RW (read-write) transaction operation. This requires a quorum (a leader).
 ** DDL operation, e.g. creating a table in an available as well as in an unavailable distribution zone. This requires a MetaStorage quorum and, possibly, a distribution zone leader.
 ** Stop an existing node. Changing the logical topology requires a CMG quorum.
 ** Start a new (non-initialized) node. Does this require a CMG quorum, or both CMG and MetaStorage quorums? (Starting an initialized node is covered by the restart scenario.)
 ** Start an initialized node with a different cluster tag. Such a node must never be accepted to join the cluster.
 ** -Some distributed operation that requires no quorum, e.g. enabling/disabling metrics?-
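The per-action requirements listed above can be turned into an expected-outcome oracle for tests. Below is a minimal Python sketch, assuming Raft-style majority quorums, a single partition group, and reading "at least one follower" for RO transactions as "at least one running replica". The `Grid` type, the action names and the `join_needs_metastorage` flag (which stands in for the open CMG vs CMG+MetaStorage question) are hypothetical and not part of Ignite's API; the possible dependency of DDL on a distribution zone leader is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid:
    """Hypothetical grid layout: node names per group (a node may be in several)."""
    cmg: frozenset
    metastorage: frozenset
    partition: frozenset  # replicas of one table partition


def has_majority(members: frozenset, alive: frozenset) -> bool:
    # Raft-style quorum: strictly more than half of the group members are running.
    return 2 * len(members & alive) > len(members)


def expected_outcome(action: str, grid: Grid, alive: frozenset,
                     join_needs_metastorage: bool = False) -> str:
    """Return 'ok', 'unavailable' or 'rejected' for a user action in a given state."""
    if action == "ro_tx":
        ok = bool(grid.partition & alive)          # at least one replica to read from
    elif action == "rw_tx":
        ok = has_majority(grid.partition, alive)   # partition leader needs a quorum
    elif action == "ddl":
        ok = has_majority(grid.metastorage, alive)
    elif action == "stop_node":
        ok = has_majority(grid.cmg, alive)         # logical topology change
    elif action == "join_new_node":
        # Open question: CMG quorum only, or CMG + MetaStorage quorums?
        ok = has_majority(grid.cmg, alive) and (
            not join_needs_metastorage or has_majority(grid.metastorage, alive))
    elif action == "join_foreign_tag":
        return "rejected"                          # must never be accepted
    else:
        raise ValueError(f"unknown action: {action}")
    return "ok" if ok else "unavailable"
```

For example, with CMG = {A}, MetaStorage = {A, B}, partition replicas = {A, B, C} and only B and C running, RO and RW transactions are expected to succeed, while DDL and node stop are expected to be unavailable.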

NB: The same grid state can be reached from different previous states via different actions (starting/stopping a node), e.g. AB -> ABC by adding node C, and AC -> ABC by adding node B. This is fine: we want to perform all the checks on ABC after both transitions, to check the correctness of quorum recovery after C joins and after B joins.
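Enumerating every (previous state, action) pair that leads to each grid state makes it easy to check that a state such as ABC is tested after every way of reaching it. A minimal sketch, assuming a state is identified only by the set of running nodes and every transition starts or stops exactly one node; `transitions` and `key` are hypothetical helpers:

```python
from collections import defaultdict
from itertools import combinations


def key(state: frozenset) -> str:
    # Stable name for a state, e.g. {"B", "A"} -> "AB"; the empty grid is "".
    return "".join(sorted(state))


def transitions(nodes: str) -> dict:
    """Map each target state to the (previous state, action) pairs that reach it."""
    by_target = defaultdict(list)
    all_states = [frozenset(c)
                  for r in range(len(nodes) + 1)
                  for c in combinations(sorted(nodes), r)]
    for state in all_states:
        for node in sorted(nodes):
            if node not in state:
                by_target[key(state | {node})].append((key(state), f"start {node}"))
            else:
                by_target[key(state - {node})].append((key(state), f"stop {node}"))
    return dict(by_target)
```

`transitions("ABC")["ABC"]` yields the three single-node entries into ABC: from AB by starting C, from AC by starting B, and from BC by starting A. The empty state is included so that grid startup from a fully stopped cluster is covered as well.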

NB: Some combinations make no sense and may be excluded, e.g. a DDL operation at some steps of the "grid startup" scenarios, when the CMG is not available yet: there is no entry point (e.g. a node instance) from which to start the operation.
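Excluding senseless combinations can be expressed as a filter over the Cartesian product. A minimal sketch, assuming a hypothetical rule that a DDL step is skipped whenever the running nodes do not hold a CMG majority; the helper names and the rule itself are illustrative, not an agreed list of exclusions:

```python
from itertools import product


def cmg_available(running: str, cmg: str) -> bool:
    # A majority of CMG members must be among the running nodes.
    return 2 * len(set(running) & set(cmg)) > len(cmg)


def scenario_matrix(steps: list, actions: list, cmg: str) -> list:
    """Cartesian product of grid states and actions, minus senseless combinations."""
    return [(step, action)
            for step, action in product(steps, actions)
            if not (action == "ddl" and not cmg_available(step, cmg))]
```

With startup steps A, AB, ABC and CMG = {B, C}, only the ABC step keeps its DDL check, because the earlier steps never reach a CMG majority.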

NB: When the DNG is unavailable, the expected behaviour of transactional operations may differ between persistent and in-memory tables.


> Describe nodes start/stop scenarios
> -----------------------------------
>
>                 Key: IGNITE-18171
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18171
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>            Reporter: Andrey Mashenkov
>            Assignee: Andrey Mashenkov
>            Priority: Major
>              Labels: ignite-3
>
> h2. Definitions.
> We can distinguish the following cluster node groups. Each node may be part 
> of one or more groups.
>  * Cluster Management Group (CMG), which controls the process of new nodes joining the cluster.
>  * MetaStorage group (MSG), which hosts the meta storage.
>  * Data node group (DNG), which only hosts table partitions.
> The components (CMG, meta storage, table components) depend on each other, 
> but may reside on different (even disjoint) subsets of nodes. So some 
> components may become temporarily unavailable, and dependent components must 
> be aware of such issues and handle them in an expected way (wait, retry, 
> throw an exception, etc.), which also has to be documented.
> [See IEP for 
> details|https://cwiki.apache.org/confluence/display/IGNITE/IEP-77%3A+Node+Join+Protocol+and+Initialization+for+Ignite+3]
> h2. Motivation.
> As of now, the correct way to start the grid (after it was stopped) is to 
> start the CMG nodes first, then the MetaStorage nodes, then the data nodes; 
> the correct way to stop it is the reverse order. Other scenarios are not 
> tested and may lead to unexpected behaviour.
> Let's describe all possible scenarios and the expected behaviour for each of 
> them, and extend the test coverage accordingly.
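The start and stop order from the Motivation section can be sketched as a sort by group. This is a minimal sketch, assuming each node is mapped to the set of groups it belongs to (the labels `cmg`, `metastorage` and `data` are hypothetical) and that a node belonging to several groups is started at the earliest stage among its groups; the description does not say how such nodes should be ordered.

```python
# Stage order for a correct (re)start: CMG nodes, then MetaStorage nodes, then data nodes.
START_ORDER = ["cmg", "metastorage", "data"]


def start_sequence(roles: dict) -> list:
    """roles maps node name -> set of group labels; returns nodes in start order.

    Assumption: a node in several groups starts at the earliest of its stages.
    Ties within a stage are broken by node name to keep the order deterministic.
    """
    def stage(node: str) -> int:
        return min(START_ORDER.index(group) for group in roles[node])

    return sorted(roles, key=lambda node: (stage(node), node))


def stop_sequence(roles: dict) -> list:
    # A correct stop uses the reverse of the start order.
    return list(reversed(start_sequence(roles)))
```

Under that assumption, a node hosting both the CMG and table partitions is started first and stopped last; the other orderings are exactly the untested scenarios the issue asks to cover.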



--
This message was sent by Atlassian Jira
(v8.20.10#820010)