[ 
https://issues.apache.org/jira/browse/IGNITE-19238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-19238:
-------------------------------------
    Description: 
1. ItDataTypesTest is flaky because the tests in ItCreateTableDdlTest that run before it fail to stop replicas on node stop:

!Снимок экрана от 2023-04-06 10-39-32.png!

 
{code:java}
java.lang.AssertionError: There are replicas alive [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]]
    at org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341)
    at org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131)
    at org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code}
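All six replicas reported alive in the trace share the same table id prefix, which suggests that the resources of a single table were never cleaned up. The sketch below groups the ids by that prefix. It assumes the replica group id format "<tableId>_part_<partitionId>", inferred from the trace rather than from the Ignite sources; the class and method names are hypothetical.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: groups replica ids of the assumed form "<tableId>_part_<partitionId>" by table id. */
public class LeftoverReplicas {
    private static final String SEPARATOR = "_part_";

    static Map<String, TreeSet<Integer>> groupByTable(List<String> replicaIds) {
        Map<String, TreeSet<Integer>> byTable = new TreeMap<>();
        for (String id : replicaIds) {
            int sep = id.lastIndexOf(SEPARATOR);
            if (sep < 0) {
                throw new IllegalArgumentException("Unexpected replica id format: " + id);
            }
            String tableId = id.substring(0, sep);
            int partition = Integer.parseInt(id.substring(sep + SEPARATOR.length()));
            byTable.computeIfAbsent(tableId, k -> new TreeSet<>()).add(partition);
        }
        return byTable;
    }

    public static void main(String[] args) {
        // Replica ids copied from the assertion message above.
        List<String> alive = List.of(
                "b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21",
                "b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6",
                "b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13",
                "b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8",
                "b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9",
                "b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11");
        // Prints {b86c60a8-4ea3-4592-abef-6438cfc4cdb2=[6, 8, 9, 11, 13, 21]}
        System.out.println(groupByTable(alive));
    }
}
{code}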
 

2. Replicas fail to stop because of a race between the cleanup of tablesToStopInCaseOfError and the addition of tables to tablesByIdVv.

2.1 On TableManager stop, we stop and clean up all table resources, such as replicas and Raft nodes:
{code:java}
public void stop() {
  ...
  Map<UUID, TableImpl> tables = tablesByIdVv.latest();  // 1*
  cleanUpTablesResources(tables); 
  cleanUpTablesResources(tablesToStopInCaseOfError);
  ...
}{code}
where tablesToStopInCaseOfError is a kind of pending-tables list that is cleared on every configuration storage revision update.

*!* tablesByIdVv listens to the same storage revision update event in order to publish the tables related to the given revision, in other words, to make them accessible through tablesByIdVv.latest(), which is the call used to retrieve tables for cleanup when components stop (see // 1* above):
{code:java}
public TableManager(
  ... 
  tablesByIdVv = new IncrementalVersionedValue<>(registry, HashMap::new);

  registry.accept(token -> {
    tablesToStopInCaseOfError.clear();
    
    return completedFuture(null);
  });
  {code}
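If both listeners ran synchronously within the same revision update, every created table would always be in one of the two collections that stop() reads. A minimal model under that assumption: tables are plain strings, "published" stands in for tablesByIdVv.latest(), and "pending" stands in for tablesToStopInCaseOfError. The class and method names are hypothetical and the model is a simplification, not the actual TableManager code.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical model: both revision listeners run synchronously, in order. */
public class SyncRevisionModel {
    // Stands in for tablesByIdVv.latest(): tables published by completed revisions.
    final Map<UUID, String> published = new HashMap<>();
    // Stands in for tablesToStopInCaseOfError: created but not yet published tables.
    final Map<UUID, String> pending = new HashMap<>();

    void onCreateTable(UUID id, String name) {
        pending.put(id, name);
    }

    void onRevisionUpdated() {
        published.putAll(pending); // publish the revision's tables
        pending.clear();           // registry callback: drop them from the pending list
    }

    /** Like TableManager.stop(): everything reachable from either collection is cleaned up. */
    Set<UUID> stop() {
        Set<UUID> cleaned = new HashSet<>(published.keySet());
        cleaned.addAll(pending.keySet());
        return cleaned;
    }

    public static void main(String[] args) {
        SyncRevisionModel m = new SyncRevisionModel();
        UUID id = UUID.randomUUID();
        m.onCreateTable(id, "T1");
        System.out.println(m.stop().contains(id)); // true: the table is still pending
        m.onRevisionUpdated();
        System.out.println(m.stop().contains(id)); // true: the table is now published
    }
}
{code}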
However, IncrementalVersionedValue processes storage revision updates asynchronously.
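IncrementalVersionedValue internals are not reproduced here. The sketch assumes that the tables of a revision become visible through tablesByIdVv.latest() only after an asynchronous step, while the registry callback clears tablesToStopInCaseOfError immediately. Under that assumption there is a window in which a freshly created table is in neither collection, so stop() does not clean up its replicas, which would match the "There are replicas alive" assertion. A manual executor makes the interleaving deterministic; all names are hypothetical.

{code:java}
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical model: publication is asynchronous, the pending-list cleanup is not. */
public class AsyncRevisionRace {
    /** Runs submitted tasks only when drain() is called. */
    static final class ManualExecutor implements Executor {
        private final Queue<Runnable> tasks = new ArrayDeque<>();

        @Override
        public void execute(Runnable task) {
            tasks.add(task);
        }

        void drain() {
            while (!tasks.isEmpty()) {
                tasks.poll().run();
            }
        }
    }

    final Map<UUID, String> published = new HashMap<>(); // stands in for tablesByIdVv.latest()
    final Map<UUID, String> pending = new HashMap<>();   // stands in for tablesToStopInCaseOfError
    final ManualExecutor executor = new ManualExecutor();

    void onCreateTable(UUID id, String name) {
        pending.put(id, name);
    }

    CompletableFuture<Void> onRevisionUpdated() {
        Map<UUID, String> revisionTables = new HashMap<>(pending);
        // Publication is only scheduled here; it runs later on the executor.
        CompletableFuture<Void> publication =
                CompletableFuture.runAsync(() -> published.putAll(revisionTables), executor);
        // The registry callback clears the pending list right away.
        pending.clear();
        return publication;
    }

    /** Like TableManager.stop(): union of published and pending tables. */
    Set<UUID> stop() {
        Set<UUID> cleaned = new HashSet<>(published.keySet());
        cleaned.addAll(pending.keySet());
        return cleaned;
    }

    public static void main(String[] args) {
        AsyncRevisionRace m = new AsyncRevisionRace();
        UUID id = UUID.randomUUID();
        m.onCreateTable(id, "T1");
        m.onRevisionUpdated();
        // Node stops inside the window: the table is in neither collection.
        System.out.println(m.stop().contains(id)); // false: its replicas would be left alive
        m.executor.drain();
        System.out.println(m.stop().contains(id)); // true once publication has run
    }
}
{code}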

 

2.2 As a result, the flow that touches tablesToStopInCaseOfError and tablesByIdVv is as follows:

onCreateTable

  was:It


> ItDataTypesTest is flaky
> ------------------------
>
>                 Key: IGNITE-19238
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19238
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Assignee: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3
>         Attachments: Снимок экрана от 2023-04-06 10-39-32.png



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
