[ 
https://issues.apache.org/jira/browse/IGNITE-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin updated IGNITE-19115:
-----------------------------------------
    Ignite Flags:   (was: Docs Required,Release Notes Required)

> Possible deadlock in handling pending cache messages when the cache is 
> recreated
> --------------------------------------------------------------------------------
>
>                 Key: IGNITE-19115
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19115
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.14
>            Reporter: Vyacheslav Koptilin
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>
> Let's consider the following scenario:
>   Precondition:
>     there is a cluster of two server nodes (node A - coordinator, and node B) 
> and an atomic cache that resides on those nodes.
>     current topology version is (x, y)
> 1. Node B initiates putting a new key-value pair into the atomic cache. Let's 
> assume the primary partition for the key resides on node A.
> 2. The previous step requires acquiring a gateway lock for the corresponding 
> cache (GridCacheGateway read lock) and registering 
> GridNearAtomicSingleUpdateFuture in the MVCC manager. It is important to 
> note that the cache future does not acquire the topology lock and so should 
> not block PME.
> 3. Concurrently, node A initiates destroying the cache. The corresponding PME 
> completes successfully on the coordinator node but is blocked on node B 
> because the gateway lock is already held:
> {code:java}
> "sys-#274%SAGE server%" #338 prio=5 os_prio=0 tid=0x00007faf4b2a1000 
> nid=0x114e5 sleeping[0x00007fa8d9ac5000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>       at java.lang.Thread.sleep(Native Method)
>       at 
> org.apache.ignite.internal.util.IgniteUtils.sleep(IgniteUtils.java:8448)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheGateway.onStopped(GridCacheGateway.java:323)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.stopGateway(GridCacheProcessor.java:2617)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$processCacheStopRequestOnExchangeDone$a1367cb0$1(GridCacheProcessor.java:2818)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor$$Lambda$1880/434639048.apply(Unknown Source)
>       at 
> org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11913)
>       at 
> org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11815)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCacheStopRequestOnExchangeDone(GridCacheProcessor.java:2793)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.onExchangeDone(GridCacheProcessor.java:2942)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:2530)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processFullMessage(GridDhtPartitionsExchangeFuture.java:4829)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$1500(GridDhtPartitionsExchangeFuture.java:160)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4498)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4486)
>       at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:470)
>       at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:363)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveFullMessage(GridDhtPartitionsExchangeFuture.java:4486)
>       at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processFullPartitionUpdate(GridCachePartitionExchangeManager.java:2029)
>       at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:474)
>       at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:461)
>       at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3819)
>       at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3798)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1150)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1727)
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1334)
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4800(GridIoManager.java:158)
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1218)
>       at 
> org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:54)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>       at java.lang.Thread.run(Unknown Source)
> {code}
> 4. Node A initiates creating a new cache with the same name as the 
> previously destroyed one.
> 5. Node A receives the cache update message but cannot process it, because a 
> new cache (a cache with the same cacheId) is starting, so processing of the 
> message has to be postponed until PME is completed. (In this case a 
> GridDhtForceKeysFuture is created, and the message will not be processed 
> until PME is completed. So the near node will not receive a response and 
> will not be able to complete the previous exchange future; see IGNITE-10251.)
> 6. The new PME on node B cannot proceed because of step 3: the previous PME 
> is still blocked on node B.
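
The circular wait in the scenario above can be sketched with plain JDK primitives. This is a toy model under stated assumptions, not Ignite code: the class name, method name, and futures are hypothetical stand-ins, with a ReentrantReadWriteLock playing the role of the cache gateway on node B and CompletableFuture instances playing the role of the two PMEs and the update response.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of the reported circular wait. None of these names are Ignite APIs. */
public class GatewayDeadlockSketch {
    /**
     * @param timeoutMs how long the simulated cache stop waits for the write lock.
     * @return true if neither the cache stop nor the update could make progress.
     */
    public static boolean reproducesCircularWait(long timeoutMs) {
        // Stand-in for the gateway on node B: cache operations take the read
        // lock, stopping the cache needs the write lock.
        ReentrantReadWriteLock gateway = new ReentrantReadWriteLock();

        // Completes when node B finishes the PME that destroys the old cache.
        CompletableFuture<Void> destroyPmeOnB = new CompletableFuture<>();
        // The PME that recreates the cache cannot finish before the previous one.
        CompletableFuture<Void> recreatePme = destroyPmeOnB.thenRun(() -> { });
        // Node A postpones the update until the recreate PME is done.
        CompletableFuture<Void> updateResponse = recreatePme.thenRun(() -> { });

        CountDownLatch readLockTaken = new CountDownLatch(1);

        // Steps 1-2: the put on node B holds the gateway read lock until the
        // response from node A arrives.
        Thread put = new Thread(() -> {
            gateway.readLock().lock();
            try {
                readLockTaken.countDown();
                updateResponse.get();
            } catch (Exception e) {
                // Cancelled below so that the demo terminates.
            } finally {
                gateway.readLock().unlock();
            }
        });

        try {
            put.start();
            readLockTaken.await();

            // Step 3: the exchange thread on node B tries to stop the gateway.
            boolean stopped = gateway.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (stopped) {
                gateway.writeLock().unlock();
                destroyPmeOnB.complete(null);
            }
            boolean responseArrived = updateResponse.isDone();

            // Break the cycle artificially; the real scenario has no such exit.
            updateResponse.cancel(true);
            put.join();

            return !stopped && !responseArrived;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("circular wait reproduced: " + reproducesCircularWait(200));
    }
}
```

In this model the write lock attempt gives up after a timeout only so that the example terminates. The stack trace above shows the exchange thread sleeping inside GridCacheGateway.onStopped instead, which is what makes the dependency cycle permanent in the reported scenario.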



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
