[ 
https://issues.apache.org/jira/browse/IGNITE-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266650#comment-16266650
 ] 

Alexey Goncharuk commented on IGNITE-6113:
------------------------------------------

The issue with the code is that we wait for partition rent in the assign() 
method. This blocks exchange thread from progress until the partition is 
evicted.

In my understanding, we already have all the machinery to fix this issue:
1) Move partition to MOVING state, but mark it as required to be evicted
2) In demander, do not send first demand message until we iterated over the 
full partition and attempted to clean it. After the iteration is done, even if 
the partition is not empty, we can safely start rebalancing. Also, during the 
clear, we can skip versions that arrived after the rebalancing has started 
(need to figure out if this can be implemented)
3) After iteration is finished, start rebalancing for this particular partition.

Also, note the suspicious code in the GridDhtPreloader#assing() method:
{code}
                if (histSupplier != null) {
                    if (part.state() != MOVING) {
                        if (log.isDebugEnabled())
                            log.debug("Skipping partition assignment (state is 
not MOVING): " + part);

                        continue; // For.
                    }
{code}
Most likely, we need the same logic here, even if we have a history supplier.

> Partition eviction prevents exchange from completion
> ----------------------------------------------------
>
>                 Key: IGNITE-6113
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6113
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.1
>            Reporter: Vladislav Pyatkov
>
> I has waited for 3 hours for completion without any success.
> exchange-worker is blocked.
> {noformat}
> "exchange-worker-#92%DPL_GRID%grid554.ca.sbrf.ru%" #173 prio=5 os_prio=0 
> tid=0x00007f0835c2e000 nid=0xb907 runnable [0x00007e74ab1d0000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00007efee630a7c0> (a 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition$1)
>         at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:189)
>         at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
>         at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.assign(GridDhtPreloader.java:340)
>         at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1801)
>         at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - None
> {noformat}
> {noformat}
> "sys-#124%DPL_GRID%grid554.ca.sbrf.ru%" #278 prio=5 os_prio=0 
> tid=0x00007e731c02d000 nid=0xbf4d runnable [0x00007e734e7f7000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>         at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:51)
>         at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
>         - locked <0x00007f056161bf88> (a java.lang.Object)
>         at 
> org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.writeBuffer(FileWriteAheadLogManager.java:1829)
>         at 
> org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.flush(FileWriteAheadLogManager.java:1572)
>         at 
> org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.addRecord(FileWriteAheadLogManager.java:1421)
>         at 
> org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.access$800(FileWriteAheadLogManager.java:1331)
>         at 
> org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:339)
>         at 
> org.gridgain.grid.internal.processors.cache.database.pagemem.PageMemoryImpl.beforeReleaseWrite(PageMemoryImpl.java:1287)
>         at 
> org.gridgain.grid.internal.processors.cache.database.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1142)
>         at 
> org.gridgain.grid.internal.processors.cache.database.pagemem.PageImpl.releaseWrite(PageImpl.java:167)
>         at 
> org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writeUnlock(PageHandler.java:193)
>         at 
> org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writePage(PageHandler.java:242)
>         at 
> org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writePage(PageHandler.java:119)
>         at 
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.doRemoveFromLeaf(BPlusTree.java:2886)
>         at 
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.removeFromLeaf(BPlusTree.java:2865)
>         at 
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.access$6900(BPlusTree.java:2515)
>         at 
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1607)
>         at 
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at 
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at 
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at 
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at 
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.doRemove(BPlusTree.java:1481)
>         at 
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.remove(BPlusTree.java:1451)
>         at 
> org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.remove(H2TreeIndex.java:307)
>         at 
> org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.doUpdate(GridH2Table.java:637)
>         at 
> org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:517)
>         at 
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:664)
>         at 
> org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:1186)
>         at 
> org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:467)
>         at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1090)
>         at 
> org.gridgain.grid.cache.db.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:993)
>         at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:357)
>         at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3621)
>         at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:599)
>         - locked <0x00007f054d45bad8> (a 
> org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCacheEntry)
>         at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:956)
>         at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:793)
>         at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$9.call(GridDhtPreloader.java:856)
>         at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$9.call(GridDhtPreloader.java:843)
>         at 
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6660)
>         at 
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:925)
>         at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to