[jira] [Comment Edited] (IGNITE-23252) ItReplicaLifecycleTest is unstable

Mikhail Efremov (Jira) Thu, 23 Jan 2025 10:17:33 -0800


    [ 
https://issues.apache.org/jira/browse/IGNITE-23252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916445#comment-17916445
 ]


Mikhail Efremov edited comment on IGNITE-23252 at 1/23/25 6:16 PM:
-------------------------------------------------------------------

The issue revealed a critical race in the old colocation code base:
# Table replication groups are still needed for some transaction-related 
requests (e.g. {{TxFinishReplicaRequestImpl}} triggers 
{{WriteIntentSwitchReplicaRequestImpl}}, which currently requires a table replica).
# So we have to keep the table-replica-related code for now.
# As a result, there is a race between the table stable assignments switch and 
the storage destruction triggered by the zone stable assignments switch:
* table replica stopping and partition storage destruction are triggered by 
the table assignments switch;
* on the zone stable assignments switch, the zone replica fires an event to 
destroy the corresponding table replica's storages, most likely before the 
table replica has been stopped;
* this race leads to a "no such file" exception because the table replica 
stopping steps run in the wrong order.
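
A toy model of the ordering problem described above may help. It is only a sketch under loose assumptions: {{PartitionStorage}}, {{TableReplica}} and {{RaceOrderDemo}} are hypothetical classes, not Ignite code, and the "no such file" failure is simulated with an {{IllegalStateException}}. The two independent reactions (table switch stops the replica, zone switch destroys the storage) are replayed in both possible orders.

```java
import java.util.List;

// Hypothetical stand-in for a partition storage; not an Ignite class.
final class PartitionStorage {
    private boolean destroyed;

    void destroy() {
        destroyed = true;
    }

    // Stopping a replica still needs its files; after destroy() they are gone.
    void flush() {
        if (destroyed) {
            throw new IllegalStateException("no such file: storage already destroyed");
        }
    }
}

// Hypothetical table replica that touches its storage while stopping.
final class TableReplica {
    final PartitionStorage storage = new PartitionStorage();

    void stop() {
        storage.flush();
    }
}

public class RaceOrderDemo {
    // Runs the two independent reactions in the given order and reports the outcome.
    static String run(boolean zoneSwitchFirst) {
        TableReplica replica = new TableReplica();
        Runnable tableSwitch = replica::stop;           // table stable assignments switch
        Runnable zoneSwitch = replica.storage::destroy; // zone stable assignments switch
        List<Runnable> order = zoneSwitchFirst
                ? List.of(zoneSwitch, tableSwitch)
                : List.of(tableSwitch, zoneSwitch);
        try {
            order.forEach(Runnable::run);
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // ok
        System.out.println(run(true));  // no such file: storage already destroyed
    }
}
```

Because nothing forces the two handlers into one order, whichever fires first wins; only the "table switch first" order is safe.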

The correct solution we want is to start the table replica on the 
{{AFTER_REPLICA_STARTED}} event and to block {{TableManager}}'s assignments 
events when the colocation flag is enabled. The corresponding table replication 
group is then stopped, together with its storages, on the 
{{AFTER_REPLICA_STOPPED}} event. The first part requires passing forced 
assignments to the {{weakReplcaStart}} call.
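
The intended event-driven ordering can be sketched roughly as follows. This is a hypothetical illustration, not the actual patch: the {{ZoneReplicaEvent}} enum, the {{ZoneReplicaEvents}} bus and {{TableLifecycle}} are invented for the example (only the event names come from this comment), and actions are recorded as log strings instead of real replica or storage calls. The real start would go through {{weakReplcaStart}} with forced assignments, which is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical event names mirroring those in the comment; not the real Ignite enum.
enum ZoneReplicaEvent { AFTER_REPLICA_STARTED, AFTER_REPLICA_STOPPED }

// Minimal synchronous event bus for zone replica lifecycle events (hypothetical).
final class ZoneReplicaEvents {
    private final List<Consumer<ZoneReplicaEvent>> listeners = new ArrayList<>();

    void listen(Consumer<ZoneReplicaEvent> listener) {
        listeners.add(listener);
    }

    void fire(ZoneReplicaEvent event) {
        listeners.forEach(l -> l.accept(event));
    }
}

// Hypothetical table-side lifecycle; records actions in a log instead of doing real work.
final class TableLifecycle {
    final List<String> log = new ArrayList<>();
    private final boolean colocationEnabled;

    TableLifecycle(boolean colocationEnabled, ZoneReplicaEvents zoneEvents) {
        this.colocationEnabled = colocationEnabled;
        if (colocationEnabled) {
            zoneEvents.listen(this::onZoneEvent);
        }
    }

    // Table stable assignments handler: blocked when colocation is enabled.
    void onTableAssignmentsSwitch() {
        if (colocationEnabled) {
            log.add("table assignments event skipped");
            return;
        }
        log.add("stop table replica");
        log.add("destroy table storages");
    }

    // Zone events drive the table replica, so stop always precedes storage destruction.
    private void onZoneEvent(ZoneReplicaEvent event) {
        if (event == ZoneReplicaEvent.AFTER_REPLICA_STARTED) {
            log.add("start table replica with forced assignments");
        } else if (event == ZoneReplicaEvent.AFTER_REPLICA_STOPPED) {
            log.add("stop table replica");
            log.add("destroy table storages");
        }
    }
}

public class ColocationLifecycleDemo {
    public static void main(String[] args) {
        ZoneReplicaEvents events = new ZoneReplicaEvents();
        TableLifecycle table = new TableLifecycle(true, events);
        events.fire(ZoneReplicaEvent.AFTER_REPLICA_STARTED);
        table.onTableAssignmentsSwitch();
        events.fire(ZoneReplicaEvent.AFTER_REPLICA_STOPPED);
        table.log.forEach(System.out::println);
    }
}
```

The point of the design is that stop and storage destruction live in a single zone-event handler, so their order is fixed, while the table-level assignments handler no longer competes with it when colocation is on.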



> ItReplicaLifecycleTest is unstable
> ----------------------------------
>
>                 Key: IGNITE-23252
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23252
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Assignee: Mikhail Efremov
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Various different exceptions, e.g., [TC 
> failure.|https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/8490892?expandBuildDeploymentsSection=false&hideTestsFromDependencies=false&hideProblemsFromDependencies=false&expandBuildTestsSection=true&expandCode+Inspection=true]
>  Stabilization required.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
