[ https://issues.apache.org/jira/browse/IGNITE-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vyacheslav Koptilin updated IGNITE-19115:
-----------------------------------------
    Ignite Flags:   (was: Docs Required,Release Notes Required)

> Possible deadlock in handling pending cache messages when the cache is recreated
> --------------------------------------------------------------------------------
>
>                 Key: IGNITE-19115
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19115
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.14
>            Reporter: Vyacheslav Koptilin
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>
> Let's consider the following scenario:
> Precondition:
> There is a cluster of two server nodes (node A, the coordinator, and node B)
> and an atomic cache that resides on those nodes.
> The current topology version is (x, y).
> Node B initiates putting a new key-value pair into the atomic cache. Let's
> assume the primary partition that the key belongs to resides on node A.
> The previous step requires acquiring a gateway lock for the corresponding
> cache (GridCacheGateway read lock) and registering
> GridNearAtomicSingleUpdateFuture in the MVCC manager. It is important to
> note that the cache future does not acquire the topology lock and so should
> not block PME.
> Concurrently, node A initiates destroying the cache.
> The corresponding PME completes successfully on the coordinator node but is
> blocked on node B because the gateway is already acquired:
> {code:java}
> "sys-#274%SAGE server%" #338 prio=5 os_prio=0 tid=0x00007faf4b2a1000 nid=0x114e5 sleeping[0x00007fa8d9ac5000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.ignite.internal.util.IgniteUtils.sleep(IgniteUtils.java:8448)
>         at org.apache.ignite.internal.processors.cache.GridCacheGateway.onStopped(GridCacheGateway.java:323)
>         at org.apache.ignite.internal.processors.cache.GridCacheProcessor.stopGateway(GridCacheProcessor.java:2617)
>         at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$processCacheStopRequestOnExchangeDone$a1367cb0$1(GridCacheProcessor.java:2818)
>         at org.apache.ignite.internal.processors.cache.GridCacheProcessor$$Lambda$1880/434639048.apply(Unknown Source)
>         at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11913)
>         at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11815)
>         at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCacheStopRequestOnExchangeDone(GridCacheProcessor.java:2793)
>         at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onExchangeDone(GridCacheProcessor.java:2942)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:2530)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processFullMessage(GridDhtPartitionsExchangeFuture.java:4829)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$1500(GridDhtPartitionsExchangeFuture.java:160)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4498)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4486)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:470)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:363)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveFullMessage(GridDhtPartitionsExchangeFuture.java:4486)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processFullPartitionUpdate(GridCachePartitionExchangeManager.java:2029)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:474)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:461)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3819)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3798)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1150)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1727)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1334)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.access$4800(GridIoManager.java:158)
>         at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1218)
>         at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:54)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
> {code}
> Node A initiates creating a new cache with the same name as the previously
> destroyed one.
> Node A receives the cache update message but cannot process it, because a
> new cache (a cache with the same cacheId) is starting, so processing of this
> message is postponed until PME is completed. (In this case a
> GridDhtForceKeysFuture is created, and the message will not be processed
> until PME is completed. As a result, the near node will not receive a
> response and will not be able to complete the previous exchange future.
> See IGNITE-10251.)
> The new PME on node B cannot proceed further because of 3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
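
The gateway interaction in the scenario above can be illustrated with a minimal, self-contained Java sketch. It is not Ignite code and makes several assumptions: a plain java.util.concurrent ReentrantReadWriteLock stands in for GridCacheGateway, a CountDownLatch stands in for the update response from the primary node, and the class and method names (GatewayDeadlockSketch, stopSucceedsWhileUpdatePending) are hypothetical. The stop side uses a timed tryLock so the sketch terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class GatewayDeadlockSketch {
    /**
     * Simulates a cache update that holds the gateway read lock until its
     * response arrives, and a cache stop that needs the write lock.
     *
     * @param timeoutMs       how long the stop side waits for the write lock
     * @param responseArrives whether the update response is delivered before the stop
     * @return true if the stop side obtained the write lock within the timeout
     */
    public static boolean stopSucceedsWhileUpdatePending(long timeoutMs, boolean responseArrives)
            throws InterruptedException {
        ReentrantReadWriteLock gateway = new ReentrantReadWriteLock(); // stand-in for the cache gateway
        CountDownLatch readLockHeld = new CountDownLatch(1);
        CountDownLatch response = new CountDownLatch(1);               // stand-in for the update response

        Thread update = new Thread(() -> {
            gateway.readLock().lock();          // the update enters the gateway
            try {
                readLockHeld.countDown();
                response.await();               // waits for the primary node's response
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                gateway.readLock().unlock();    // the gateway is left only after the response
            }
        });
        update.start();
        readLockHeld.await();

        if (responseArrives) {
            response.countDown();               // response delivered: the update finishes first
            update.join();
        }

        // The stop side needs exclusive access, so it cannot proceed while a reader is inside.
        boolean stopped = gateway.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        if (stopped)
            gateway.writeLock().unlock();

        response.countDown();                   // release the update so the sketch terminates
        update.join();
        return stopped;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("response postponed -> stop completed: "
            + stopSucceedsWhileUpdatePending(200, false));
        System.out.println("response delivered -> stop completed: "
            + stopSucceedsWhileUpdatePending(200, true));
    }
}
```

In the sketch the stop attempt times out whenever the response is withheld. In the reported scenario the response is withheld until the next exchange completes, and that exchange in turn waits on the blocked stop, which is what closes the cycle.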