[ https://issues.apache.org/jira/browse/IGNITE-23252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916445#comment-17916445 ]
Mikhail Efremov edited comment on IGNITE-23252 at 1/23/25 6:16 PM:
-------------------------------------------------------------------

The issue exposed a critical race in the old colocation code base:
# A table replication group is still needed for some transaction-related requests (e.g. {{TxFinishReplicaRequestImpl}}, which triggers {{WriteIntentSwitchReplicaRequestImpl}}, which currently requires a table replica).
# So we have to keep the table replica related code for now.
# As a result, there is a race between the table replica's stable assignments switch and the storage destruction triggered by the zone replica's stable assignments switch:
* table replica stopping and partition storage destruction are triggered by the table assignments switch;
* on the zone stable assignments switch, the zone replica fires an event to destroy the corresponding table replica's storages, most likely before the table replica has been stopped;
* this race leads to a "no such file" exception because the table replica stopping process runs in the wrong order.

We want a correct solution in which the table replica is started on the {{AFTER_REPLICA_STARTED}} event and {{TableManager}}'s assignments events are blocked if the colocation flag is enabled. Finally, the corresponding table replication group is stopped together with its storages on the {{AFTER_REPLICA_STOPPED}} event. The first part requires forced assignments for the {{weakReplcaStart}} call.

> ItReplicaLifecycleTest is unstable
> ----------------------------------
>
>                 Key: IGNITE-23252
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23252
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Assignee: Mikhail Efremov
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Various exceptions occur, e.g., [TC failure.|https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/8490892?expandBuildDeploymentsSection=false&hideTestsFromDependencies=false&hideProblemsFromDependencies=false&expandBuildTestsSection=true&expandCode+Inspection=true]
> Stabilization required.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)