[ 
https://issues.apache.org/jira/browse/IGNITE-25142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944758#comment-17944758
 ] 

Roman Puchkovskiy commented on IGNITE-25142:
--------------------------------------------

There is a deadlock between PartitionReplicaLifecycleManager and TableManager 
which manifests itself on node start.
 # PRLM starts a zone X; by doing this, it takes a write lock on the zone
 # TableManager attempts to start the tables of zone X, and to do so it has to 
take a read lock on zone X, so it is blocked by PRLM
 # But replica starts cannot be initiated until all tables are started, so 
TableManager in turn blocks PRLM, resulting in a deadlock
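
The cycle above can be reproduced in miniature. The following is only a sketch, not Ignite code: the class name, the zoneLock and tablesStarted fields and the timeouts are all assumptions made for illustration. A plain ReentrantReadWriteLock stands in for the zone lock and a CompletableFuture stands in for 'all tables are started'. The waits are bounded so that the sketch terminates; in the real scenario both sides would wait forever.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ZoneStartDeadlockSketch {
    /** Returns true if the table-start side managed to take the zone read lock. */
    static boolean tableStartAcquiresReadLock() throws Exception {
        // Hypothetical stand-ins for the zone lock and the "tables started" signal.
        ReentrantReadWriteLock zoneLock = new ReentrantReadWriteLock();
        CompletableFuture<Void> tablesStarted = new CompletableFuture<>();
        CountDownLatch writeLockTaken = new CountDownLatch(1);

        // Steps 1 and 3: the "PRLM" thread takes the write lock, then waits for tables.
        Thread prlm = new Thread(() -> {
            zoneLock.writeLock().lock();
            try {
                writeLockTaken.countDown();
                try {
                    // Bounded only so the sketch ends; a real deadlock waits forever.
                    tablesStarted.get(500, TimeUnit.MILLISECONDS);
                } catch (Exception ignored) {
                    // Timed out: tables never started while the write lock was held.
                }
            } finally {
                zoneLock.writeLock().unlock();
            }
        });
        prlm.start();
        writeLockTaken.await();

        // Step 2: the "TableManager" side needs the read lock to start tables.
        boolean acquired = zoneLock.readLock().tryLock(100, TimeUnit.MILLISECONDS);
        if (acquired) {
            zoneLock.readLock().unlock();
            tablesStarted.complete(null);
        }

        prlm.join();
        return acquired;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read lock acquired: " + tableStartAcquiresReadLock());
    }
}
```

With these timeouts the read lock attempt gives up while the write lock is still held, so the method returns false: neither side can make progress until the other does.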

The deadlock can be eliminated if PRLM does not acquire zone write locks on node 
start. These locks are not needed at node start because another mutual 
exclusion mechanism (the 'first start tables, only then start replicas' 
ordering) is already in place.
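
A minimal sketch of the proposed approach, under stated assumptions: this is not the actual patch, and the class, the method names (startZoneOnNodeStart, startTables) and the tablesStarted future are hypothetical. It only illustrates how ordering alone can provide the exclusion. The zone-start path takes no write lock on node start and instead chains replica start after table start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ZoneStartOrderingSketch {
    // Hypothetical stand-ins; not the real Ignite fields.
    private final ReentrantReadWriteLock zoneLock = new ReentrantReadWriteLock();
    private final CompletableFuture<Void> tablesStarted = new CompletableFuture<>();
    private final List<String> events = new ArrayList<>();

    /** PRLM side on node start: no write lock, replicas are chained after tables. */
    CompletableFuture<Void> startZoneOnNodeStart() {
        return tablesStarted.thenRun(() -> events.add("replicas started"));
    }

    /** TableManager side: the read lock is uncontended because nobody holds the write lock. */
    void startTables() {
        zoneLock.readLock().lock();
        try {
            events.add("tables started");
        } finally {
            zoneLock.readLock().unlock();
        }
        // Signals the ordering mechanism that replicas may now start.
        tablesStarted.complete(null);
    }

    /** Runs the node-start sequence and returns the observed order of events. */
    static List<String> run() {
        ZoneStartOrderingSketch sketch = new ZoneStartOrderingSketch();
        CompletableFuture<Void> zoneStart = sketch.startZoneOnNodeStart();
        sketch.startTables();
        zoneStart.join();
        return new ArrayList<>(sketch.events);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Here the table start always completes before the replica start, and no thread ever waits on a lock held by the other side.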

> ItZoneDataReplicationTest.testLocalRaftLogReapplication is flaky
> ----------------------------------------------------------------
>
>                 Key: IGNITE-25142
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25142
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
