[ 
https://issues.apache.org/jira/browse/IGNITE-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin updated IGNITE-19115:
-----------------------------------------
    Description: 
Let's consider the following scenario:
  Precondition:
    There is a cluster of two server nodes (node A, the coordinator, and node B) 
and an atomic cache that resides on those nodes.
    The current topology version is (x, y).

Node B initiates putting a new key-value pair into the atomic cache. Let's 
assume the primary partition for the key resides on node A.

The previous step requires acquiring a gateway lock for the corresponding cache 
(the GridCacheGateway read lock) and registering a 
GridNearAtomicSingleUpdateFuture in the MVCC manager. Note that the cache 
future does not acquire the topology lock and so should not block PME.

Concurrently, node A initiates destroying the cache. The corresponding PME 
completes successfully on the coordinator node but is blocked on node B 
because the gateway lock is already held there:

{noformat}
"sys-#274%SAGE server%" #338 prio=5 os_prio=0 tid=0x00007faf4b2a1000 nid=0x114e5 sleeping[0x00007fa8d9ac5000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.ignite.internal.util.IgniteUtils.sleep(IgniteUtils.java:8448)
        at org.apache.ignite.internal.processors.cache.GridCacheGateway.onStopped(GridCacheGateway.java:323)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.stopGateway(GridCacheProcessor.java:2617)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$processCacheStopRequestOnExchangeDone$a1367cb0$1(GridCacheProcessor.java:2818)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor$$Lambda$1880/434639048.apply(Unknown Source)
        at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11913)
        at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11815)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCacheStopRequestOnExchangeDone(GridCacheProcessor.java:2793)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onExchangeDone(GridCacheProcessor.java:2942)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:2530)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processFullMessage(GridDhtPartitionsExchangeFuture.java:4829)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$1500(GridDhtPartitionsExchangeFuture.java:160)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4498)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4486)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:470)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:363)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveFullMessage(GridDhtPartitionsExchangeFuture.java:4486)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processFullPartitionUpdate(GridCachePartitionExchangeManager.java:2029)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:474)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:461)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3819)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3798)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1150)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
        at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1727)
        at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1334)
        at org.apache.ignite.internal.managers.communication.GridIoManager.access$4800(GridIoManager.java:158)
        at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1218)
        at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:54)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
{noformat}

Node A initiates creating a new cache with the same name as the previously 
destroyed one.

Node A receives the cache update message but cannot process it, because a new 
cache (with the same cacheId) is starting, so processing of the message is 
postponed until PME completes. In this case a GridDhtForceKeysFuture is 
created, and the message is not processed until PME completes. As a result, 
the near node does not receive a response and cannot complete the previous 
exchange future (see IGNITE-10251).

The new PME on node B cannot proceed because of step 3: the previous PME is 
still blocked on node B.
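
The wait on node B can be illustrated with a small toy model. This is only a 
sketch under simplifying assumptions, not Ignite code: the Gateway class 
below is a hypothetical stand-in that uses a plain ReentrantReadWriteLock, the 
"update" thread plays the role of the atomic put that holds the gateway until 
its response arrives, and tryStop() plays the role of the cache stop performed 
when the exchange is done. A timeout is used so that the example terminates 
instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of the wait on node B. All names here are hypothetical, not Ignite API. */
public class GatewayDeadlockSketch {
    /** Stand-in for the cache gateway: updates hold the read lock, stop needs the write lock. */
    static final class Gateway {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        void enter() { lock.readLock().lock(); }

        void leave() { lock.readLock().unlock(); }

        /** Tries to stop the gateway and gives up after the timeout. */
        boolean tryStop(long timeoutMs) throws InterruptedException {
            if (!lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS))
                return false;

            lock.writeLock().unlock();

            return true;
        }
    }

    /** Returns {stop succeeded while the update was in flight, stop succeeded after the response}. */
    static boolean[] run(long timeoutMs) throws InterruptedException {
        Gateway gateway = new Gateway();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch response = new CountDownLatch(1);

        // The update enters the gateway and leaves it only after the response arrives.
        Thread update = new Thread(() -> {
            gateway.enter();
            entered.countDown();

            try {
                response.await();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            finally {
                gateway.leave();
            }
        });

        update.start();
        entered.await();

        // Cache stop on exchange completion: cannot succeed while the update is in flight.
        boolean stoppedInFlight = gateway.tryStop(timeoutMs);

        // In the scenario above the response never arrives; here it is released
        // explicitly so that the sketch can finish.
        response.countDown();
        update.join();

        boolean stoppedAfter = gateway.tryStop(timeoutMs);

        return new boolean[] {stoppedInFlight, stoppedAfter};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] res = run(200);

        System.out.println("stop while update in flight: " + res[0]);
        System.out.println("stop after response arrived: " + res[1]);
    }
}
```

In the scenario described above, the response is itself postponed until the 
next PME completes, which is why the first attempt would never succeed without 
the artificial release used in this sketch.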

  was:
Let's consider the following scenario:
  Precondition:
    there is a cluster of two server nodes (node A - coordinator, and node B) 
and an atomic cache that resides on that nodes.
    current topology version is (x, y)

Node B initiates putting a new key-value pair into the atomic cache. Let's 
assume the primary partition, which belongs to the key, resides on node A.

The previous step requires acquiring a gateway lock for the corresponding cache 
(GridCacheGateway read lock) and registering GridNearAtomicSingleUpdateFuture 
into the MVCC managerIt is important to note, that cache future does not 
acquire topology lock and so should not block PME

Concurrently, node A initiates destroying the cache. Corresponding PME will be 
successfully completed on the coordinator node and blocked on node B just 
because the gateway is already acquired


{code:java}
"sys-#274%SAGE server%" #338 prio=5 os_prio=0 tid=0x00007faf4b2a1000 nid=0x114e5 sleeping[0x00007fa8d9ac5000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.ignite.internal.util.IgniteUtils.sleep(IgniteUtils.java:8448)
        at org.apache.ignite.internal.processors.cache.GridCacheGateway.onStopped(GridCacheGateway.java:323)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.stopGateway(GridCacheProcessor.java:2617)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$processCacheStopRequestOnExchangeDone$a1367cb0$1(GridCacheProcessor.java:2818)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor$$Lambda$1880/434639048.apply(Unknown Source)
        at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11913)
        at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11815)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCacheStopRequestOnExchangeDone(GridCacheProcessor.java:2793)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onExchangeDone(GridCacheProcessor.java:2942)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:2530)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processFullMessage(GridDhtPartitionsExchangeFuture.java:4829)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$1500(GridDhtPartitionsExchangeFuture.java:160)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4498)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4486)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:470)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:363)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveFullMessage(GridDhtPartitionsExchangeFuture.java:4486)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processFullPartitionUpdate(GridCachePartitionExchangeManager.java:2029)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:474)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:461)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3819)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3798)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1150)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
        at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1727)
        at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1334)
        at org.apache.ignite.internal.managers.communication.GridIoManager.access$4800(GridIoManager.java:158)
        at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1218)
        at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:54)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
{code}

Node A initiates creating a new cache with the same name as previously 
destroyed.

Node A received a cache update message but it cannot be processed, because a 
new cache (cache with the same cacheId) is starting, so, the processing of this 
message should be postponed until PME is completed (In this case the 
GridDhtForceKeysFuture is created, and the message will not be processed until 
PME is completed. So, the near node will not receive a response and it will not 
be able to complete the previous exchange future. see IGNITE-10251).

new PME on node B cannot proceed further just because of 3.


> Possible deadlock in handling pending cache messages when the cache is 
> recreated
> --------------------------------------------------------------------------------
>
>                 Key: IGNITE-19115
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19115
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.14
>            Reporter: Vyacheslav Koptilin
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>
> Let's consider the following scenario:
>   Precondition:
>     There is a cluster of two server nodes (node A, the coordinator, and node B) 
> and an atomic cache that resides on those nodes.
>     The current topology version is (x, y).
> Node B initiates putting a new key-value pair into the atomic cache. Let's 
> assume the primary partition for the key resides on node A.
> The previous step requires acquiring a gateway lock for the corresponding 
> cache (the GridCacheGateway read lock) and registering a 
> GridNearAtomicSingleUpdateFuture in the MVCC manager. Note that the cache 
> future does not acquire the topology lock and so should not block PME.
> Concurrently, node A initiates destroying the cache. The corresponding PME 
> completes successfully on the coordinator node but is blocked on node B 
> because the gateway lock is already held there:
> {noformat}
> "sys-#274%SAGE server%" #338 prio=5 os_prio=0 tid=0x00007faf4b2a1000 nid=0x114e5 sleeping[0x00007fa8d9ac5000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>       at java.lang.Thread.sleep(Native Method)
>       at org.apache.ignite.internal.util.IgniteUtils.sleep(IgniteUtils.java:8448)
>       at org.apache.ignite.internal.processors.cache.GridCacheGateway.onStopped(GridCacheGateway.java:323)
>       at org.apache.ignite.internal.processors.cache.GridCacheProcessor.stopGateway(GridCacheProcessor.java:2617)
>       at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$processCacheStopRequestOnExchangeDone$a1367cb0$1(GridCacheProcessor.java:2818)
>       at org.apache.ignite.internal.processors.cache.GridCacheProcessor$$Lambda$1880/434639048.apply(Unknown Source)
>       at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11913)
>       at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11815)
>       at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCacheStopRequestOnExchangeDone(GridCacheProcessor.java:2793)
>       at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onExchangeDone(GridCacheProcessor.java:2942)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:2530)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processFullMessage(GridDhtPartitionsExchangeFuture.java:4829)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$1500(GridDhtPartitionsExchangeFuture.java:160)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4498)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4486)
>       at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:470)
>       at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:363)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveFullMessage(GridDhtPartitionsExchangeFuture.java:4486)
>       at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processFullPartitionUpdate(GridCachePartitionExchangeManager.java:2029)
>       at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:474)
>       at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:461)
>       at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3819)
>       at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3798)
>       at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1150)
>       at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
>       at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
>       at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
>       at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>       at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
>       at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1727)
>       at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1334)
>       at org.apache.ignite.internal.managers.communication.GridIoManager.access$4800(GridIoManager.java:158)
>       at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1218)
>       at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:54)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>       at java.lang.Thread.run(Unknown Source)
> {noformat}
> Node A initiates creating a new cache with the same name as the previously 
> destroyed one.
> Node A receives the cache update message but cannot process it, because a new 
> cache (with the same cacheId) is starting, so processing of the message is 
> postponed until PME completes. In this case a GridDhtForceKeysFuture is 
> created, and the message is not processed until PME completes. As a result, 
> the near node does not receive a response and cannot complete the previous 
> exchange future (see IGNITE-10251).
> The new PME on node B cannot proceed because of step 3: the previous PME is 
> still blocked on node B.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
