[ https://issues.apache.org/jira/browse/IGNITE-9803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Semen Boikov updated IGNITE-9803:
---------------------------------
Description:
I debugged a failure of
DynamicIndexPartitionedTransactionalConcurrentSelfTest.testConcurrentRebalance
with GridDhtInvalidPartitionException. Here is the scenario in which this error
occurs:
* the test starts node1 and node2 and loads data
* node3 is started, one partition is assigned to [node2, node3], and node3
starts rebalancing it
* node4 is started, and the partition is reassigned to [node2, node4]
* at this time rebalancing on node3 is still in progress and is about to handle
a supply message; at this moment the exchange thread moves the partition to
RENTING state, and the partition cannot yet be moved to EVICTED because an
async partition cleanup is needed first
* the thread doing rebalancing on node3 sees the RENTING partition and gets
GridDhtInvalidPartitionException
The probability of this failure is very high if sleep(5000) is inserted into
the code doing async partition cleanup (PartitionEvictionTask.run).
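The interleaving above can be replayed in a simplified, single-threaded model. This is a hypothetical sketch, not Ignite code: `RentingRaceSketch`, `PartState`, `Partition`, `InvalidPartitionException` and their methods are illustrative stand-ins for GridDhtPartitionState, GridDhtInvalidPartitionException and the real partition classes, assuming (as the report describes) that the rebalancing thread rejects a partition that is no longer MOVING.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the race in this report; not Ignite's real classes.
public class RentingRaceSketch {

    // Simplified partition lifecycle (stand-in for GridDhtPartitionState).
    enum PartState { MOVING, OWNING, RENTING, EVICTED }

    // Stand-in for GridDhtInvalidPartitionException.
    static class InvalidPartitionException extends RuntimeException {
        InvalidPartitionException(int part) {
            super("Partition is no longer valid for rebalancing: " + part);
        }
    }

    static class Partition {
        final int id;
        final AtomicReference<PartState> state = new AtomicReference<>(PartState.MOVING);

        Partition(int id) { this.id = id; }

        // Exchange thread: the partition was reassigned away from this node.
        void rent() { state.compareAndSet(PartState.MOVING, PartState.RENTING); }

        // Async cleanup (PartitionEvictionTask.run in the report) finishes later;
        // until it runs, the partition stays RENTING.
        void evict() { state.compareAndSet(PartState.RENTING, PartState.EVICTED); }

        // Rebalancing thread: accept supplied entries only while MOVING.
        void applySupplied(int entries) {
            if (state.get() != PartState.MOVING)
                throw new InvalidPartitionException(id);
            // A real implementation would store the entries here.
        }
    }

    // Replays the interleaving from the report on one thread.
    // Returns true if the rebalancing step hit the exception.
    static boolean replayScenario() {
        Partition p = new Partition(0); // node3 is rebalancing partition 0 (MOVING)
        p.rent();                       // node4 joins: exchange moves it to RENTING
        // evict() has not run yet (cleanup is slow, e.g. with the injected sleep).
        try {
            p.applySupplied(10);        // supply message handled after the state change
            return false;
        } catch (InvalidPartitionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(replayScenario()); // prints true
    }
}
```

The injected sleep(5000) in the real cleanup simply keeps the model's RENTING window open long enough for the supply message to land inside it.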
I think the fix for this issue is simply to handle
GridDhtInvalidPartitionException in GridDhtPartitionDemander.
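One possible shape of the proposed fix, as a hypothetical sketch rather than the actual patch: the demander catches the invalid-partition exception per partition and skips that partition instead of failing the whole supply message. `DemanderFixSketch`, `Demander`, `preloadEntries` and `handleSupplyMessage` are illustrative names, not GridDhtPartitionDemander's real methods.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the proposed fix; names are illustrative, not Ignite's API.
public class DemanderFixSketch {

    enum PartState { MOVING, RENTING, EVICTED }

    // Stand-in for GridDhtInvalidPartitionException.
    static class InvalidPartitionException extends RuntimeException {
        InvalidPartitionException(int part) { super("Invalid partition: " + part); }
    }

    static class Demander {
        final Map<Integer, PartState> parts = new HashMap<>();
        final Map<Integer, Integer> applied = new HashMap<>();
        final List<Integer> skipped = new ArrayList<>();

        // Throws if the partition is no longer MOVING (e.g. it became RENTING).
        void preloadEntries(int part, int entries) {
            if (parts.get(part) != PartState.MOVING)
                throw new InvalidPartitionException(part);
            applied.merge(part, entries, Integer::sum);
        }

        // Handles one supply message carrying entries for several partitions.
        void handleSupplyMessage(Map<Integer, Integer> entriesByPart) {
            for (Map.Entry<Integer, Integer> e : entriesByPart.entrySet()) {
                try {
                    preloadEntries(e.getKey(), e.getValue());
                } catch (InvalidPartitionException ex) {
                    // Partition was reassigned mid-rebalance: skip it instead of
                    // failing the whole message; the pending eviction cleans it up.
                    skipped.add(e.getKey());
                }
            }
        }
    }

    public static void main(String[] args) {
        Demander d = new Demander();
        d.parts.put(0, PartState.MOVING);
        d.parts.put(1, PartState.RENTING); // reassigned while rebalancing

        Map<Integer, Integer> msg = new TreeMap<>();
        msg.put(0, 5);
        msg.put(1, 7);
        d.handleSupplyMessage(msg);

        System.out.println("applied=" + d.applied + " skipped=" + d.skipped);
        // prints applied={0=5} skipped=[1]
    }
}
```

Skipping is reasonable here because the partition is already RENTING and will be evicted by the pending cleanup. A real fix would likely also need to update rebalancing bookkeeping for the skipped partition, which this sketch only records in `skipped`.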
was:
Debugged failure of
DynamicIndexPartitionedTransactionalConcurrentSelfTest.testConcurrentRebalance
with GridDhtInvalidPartitionException, here is scenario where this error occurs:
* test starts node1, node2, loads data
* node3 is started, one partition is assigned to [node2, node3] and node3
starts rebalancing
* node4 is started, partition is re-assigned to [node2, node4]
* at this time rebalancing on node3 is in progress, is is going to handles
supply message and at this moment exchange thread moves partition to RENTING
state, at this moment it can not be moved to EVICTED since async partition
cleanup is needed
* at node3 thread doing rebalancing sees RENTING partition and gets
GridDhtInvalidPartitionException
Probability of such failure is very high if insert sleep(5000) in the code
doing async partition cleanup (PartitionEvictionTask.run).
I think fix for this issue is just handle GridDhtInvalidPartitionException in
GridDhtPartitionDemander.
> GridDhtInvalidPartitionException in GridDhtPartitionDemander
> ------------------------------------------------------------
>
> Key: IGNITE-9803
> URL: https://issues.apache.org/jira/browse/IGNITE-9803
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Reporter: Semen Boikov
> Assignee: Semen Boikov
> Priority: Major
> Fix For: 2.8
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)