Hello!

How long is the period of time for which the threads block?
When a client disconnects, the topology changes, which triggers a wait for PME (Partition Map Exchange), and that delays all further operations until the PME has finished.

Avoid having short-lived clients.

Regards,
--
Ilya Kasnacheev

Tue, Apr 23, 2019 at 03:40, Matt Nohelty <[email protected]>:

> I already posted this question to Stack Overflow here
> https://stackoverflow.com/questions/55801760/what-happens-in-apache-ignite-when-a-client-gets-disconnected
> but this mailing list is probably more appropriate.
>
> We use Apache Ignite for caching and are seeing some unexpected behavior
> across all of the clients of the cluster when one of the clients fails. The
> Ignite cluster itself has three servers and there are approximately 12
> servers connecting to that cluster as clients. The cluster has persistence
> disabled and many of the caches have near caching enabled.
>
> What we are seeing is that when one of the clients fails (out of memory,
> high CPU, network connectivity, etc.), threads on all the other clients
> block for a period of time. During these times, the Ignite servers
> themselves seem fine but I see things like the following in the logs:
>
> Topology snapshot [ver=123, servers=3, clients=11, CPUs=XXX, offheap=XX.XGB, heap=XXX.GB]
> Topology snapshot [ver=124, servers=3, clients=10, CPUs=XXX, offheap=XX.XGB, heap=XXX.GB]
>
> The topology itself is clearly changing when a client connects/disconnects
> but is there anything happening internally inside the cluster that could
> cause blocking on other clients? I would expect re-balancing of data when a
> server disconnects but not a client.
>
> From a thread dump, I see many threads stuck in the following state:
>
> java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x000000078a86ff18> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at org.apache.ignite.internal.util.IgniteUtils.await(IgniteUtils.java:7452)
>     at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.awaitAllReplies(GridReduceQueryExecutor.java:1056)
>     at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:733)
>     at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1339)
>     at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
>     at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$9.iterator(IgniteH2Indexing.java:1403)
>     at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
>     at java.lang.Iterable.forEach(Iterable.java:74)
>     ...
>
> Any ideas, suggestions, or further avenues to investigate would be much
> appreciated.
>
