[ https://issues.apache.org/jira/browse/IGNITE-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Daschinsky updated IGNITE-19115:
-------------------------------------
    Fix Version/s: 2.15

> Possible deadlock in handling pending cache messages when the cache is recreated
> --------------------------------------------------------------------------------
>
> Key: IGNITE-19115
> URL: https://issues.apache.org/jira/browse/IGNITE-19115
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.14
> Reporter: Vyacheslav Koptilin
> Assignee: Vyacheslav Koptilin
> Priority: Major
> Fix For: 2.15
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Let's consider the following scenario:
>
> Precondition:
> there is a cluster of two server nodes (node A - coordinator, and node B) and an atomic cache that resides on those nodes.
> the current topology version is (x, y)
>
> 1. Node B initiates putting a new key-value pair into the atomic cache. Let's assume the primary partition that owns the key resides on node A.
> 2. The previous step requires acquiring a gateway lock for the corresponding cache (GridCacheGateway read lock) and registering GridNearAtomicSingleUpdateFuture in the MVCC manager. It is important to note that the cache future does not acquire the topology lock and so should not block PME.
> 3. Concurrently, node A initiates destroying the cache. The corresponding PME will be successfully completed on the coordinator node and blocked on node B because the gateway is already acquired:
> {noformat}
> Thread [name="sys-#105%dht.IgniteCacheRecreateTest1%", id=123, state=TIMED_WAITING, blockCnt=0, waitCnt=350]
>     at java.lang.Thread.sleep(Native Method)
>     at o.a.i.i.util.IgniteUtils.sleep(IgniteUtils.java:8316)
>     at o.a.i.i.processors.cache.GridCacheGateway.onStopped(GridCacheGateway.java:324)
>     at o.a.i.i.processors.cache.GridCacheProcessor.stopGateway(GridCacheProcessor.java:2582)
>     at o.a.i.i.processors.cache.GridCacheProcessor.lambda$processCacheStopRequestOnExchangeDone$1c59e5cf$1(GridCacheProcessor.java:2776)
>     at o.a.i.i.processors.cache.GridCacheProcessor$$Lambda$714/770930142.apply(Unknown Source)
>     at o.a.i.i.util.IgniteUtils.doInParallel(IgniteUtils.java:11628)
>     at o.a.i.i.util.IgniteUtils.doInParallel(IgniteUtils.java:11530)
>     at o.a.i.i.processors.cache.GridCacheProcessor.processCacheStopRequestOnExchangeDone(GridCacheProcessor.java:2755)
>     at o.a.i.i.processors.cache.GridCacheProcessor.onExchangeDone(GridCacheProcessor.java:2945)
>     at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:2528)
>     at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processFullMessage(GridDhtPartitionsExchangeFuture.java:4785)
>     at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$1500(GridDhtPartitionsExchangeFuture.java:161)
>     at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4453)
>     at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4441)
>     at o.a.i.i.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:464)
>     at o.a.i.i.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:355)
>     at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveFullMessage(GridDhtPartitionsExchangeFuture.java:4441)
>     at o.a.i.i.processors.cache.GridCachePartitionExchangeManager.processFullPartitionUpdate(GridCachePartitionExchangeManager.java:1991)
>     at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:469)
>     at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:454)
>     at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3765)
>     at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3744)
>     at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1151)
>     at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:592)
>     at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
>     at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
>     at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:110)
>     at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:309)
>     at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)
>     at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
>     at o.a.i.i.managers.communication.GridIoManager.access$5300(GridIoManager.java:243)
>     at o.a.i.i.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
>     at o.a.i.i.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> {noformat}
> 4. Node A initiates creating a new cache with the same name as the previously destroyed one.
> 5. Node A receives the cache update message, but it cannot be processed because a new cache (a cache with the same cacheId) is starting, so processing of this message is postponed until PME is completed. (In this case a GridDhtForceKeysFuture is created, and the message will not be processed until PME is completed. So the near node will not receive a response and will not be able to complete the previous exchange future. See IGNITE-10251.)
> 6. The new PME on node B cannot proceed because of step 3.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
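
The cyclic wait described in the scenario can be mimicked with plain JDK primitives. The sketch below is only an illustration under stated assumptions, not Ignite code: a ReentrantReadWriteLock stands in for GridCacheGateway, a CompletableFuture stands in for the update response from node A, and the class and method names (GatewayDeadlockSketch, stopIsBlocked) are hypothetical. A timeout is used so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** JDK-only model of the cyclic wait; none of these names are Ignite APIs. */
final class GatewayDeadlockSketch {
    /** Returns true if the simulated cache stop could not enter the gateway in time. */
    static boolean stopIsBlocked(long timeoutMs) throws Exception {
        // Stand-in for the cache gateway: operations take the read lock, stop takes the write lock.
        ReentrantReadWriteLock gateway = new ReentrantReadWriteLock();
        // Stand-in for the update response that node A postpones until the new PME completes.
        CompletableFuture<Void> updateResponse = new CompletableFuture<>();
        CountDownLatch readLockHeld = new CountDownLatch(1);

        // "Node B, user thread": enters the gateway and waits for the response.
        Thread update = new Thread(() -> {
            gateway.readLock().lock();
            try {
                readLockHeld.countDown();
                updateResponse.get(); // blocks while the response is postponed
            } catch (Exception ignored) {
                // Not expected in this sketch.
            } finally {
                gateway.readLock().unlock();
            }
        });
        update.start();
        readLockHeld.await();

        // "Node B, exchange thread": the cache stop needs exclusive access to the gateway.
        boolean stopped = gateway.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        if (stopped)
            gateway.writeLock().unlock();

        // In the reported scenario nothing completes the response, because the next PME
        // cannot start until the stop above finishes. Break the cycle by hand here.
        updateResponse.complete(null);
        update.join();

        return !stopped;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stop blocked: " + stopIsBlocked(200));
    }
}
```

In this model the stop attempt times out because the reader never leaves the gateway on its own, which corresponds to the GridCacheGateway.onStopped frame sleeping in the thread dump.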