Hello! I can see that some data processing is happening in thread dumps, but also this:

[11:16:11,637][INFO][grid-nio-worker-tcp-comm-2-#26][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/172.16.1.7:47100, rmtAddr=/10.139.0.10:38624]
[11:16:12,686][SEVERE][grid-nio-worker-tcp-comm-2-#26][TcpCommunicationSpi] Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=2, bytesRcvd=430031923, bytesSent=2154539, bytesRcvd0=6974058, bytesSent0=1976, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-2, igniteInstanceName=null, finished=false, heartbeatTs=1581074171663, hashCode=1764437028, interrupted=false, runner=grid-nio-worker-tcp-comm-2-#26]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=384, resendCnt=0, rcvCnt=422, sentCnt=413, reserved=true, lastAck=416, nodeLeft=false, node=TcpDiscoveryNode [id=a66a573a-43dc-48d2-8ee5-232e727acbc9, addrs=[10.139.64.10, 127.0.0.1], sockAddrs=[/10.139.64.10:0, /127.0.0.1:0], discPort=0, order=19, intOrder=19, lastExchangeTime=1581073961809, loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=true], connected=true, connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=384, resendCnt=0, rcvCnt=422, sentCnt=413, reserved=true, lastAck=416, nodeLeft=false, node=TcpDiscoveryNode [id=a66a573a-43dc-48d2-8ee5-232e727acbc9, addrs=[10.139.64.10, 127.0.0.1], sockAddrs=[/10.139.64.10:0, /127.0.0.1:0], discPort=0, order=19, intOrder=19, lastExchangeTime=1581073961809, loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=true], connected=true, connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/172.16.1.7:47100, rmtAddr=/10.139.0.10:37846, createTime=1581073963095, closeTime=0, bytesSent=78611, bytesRcvd=104294928, bytesSent0=561, bytesRcvd0=916098, sndSchedTime=1581073963095, lastSndTime=1581074171592, lastRcvTime=1581074171612, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@672e22f0, directMode=true], GridConnectionBytesVerifyFilter], accepted=true, markedForClose=false]]]
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377)
    at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1282)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2386)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2153)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1794)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at java.lang.Thread.run(Thread.java:748)
[11:16:46,612][INFO][grid-nio-worker-tcp-comm-2-#26][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=c7e6fc55-d367-43d5-94e9-79ef1d984601, locNodeOrder=1, rmtNode=a66a573a-43dc-48d2-8ee5-232e727acbc9, rmtNodeOrder=19]
[11:16:46,928][INFO][grid-nio-worker-tcp-comm-3-#27][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/172.16.1.7:47100, rmtAddr=/10.139.0.10:38900]
[11:16:46,985][INFO][grid-nio-worker-tcp-comm-3-#27][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=c7e6fc55-d367-43d5-94e9-79ef1d984601, locNodeOrder=1, rmtNode=a66a573a-43dc-48d2-8ee5-232e727acbc9, rmtNodeOrder=19]
[11:16:47,301][INFO][grid-nio-worker-tcp-comm-0-#24][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/172.16.1.7:47100, rmtAddr=/10.139.0.10:38902]
[11:16:47,359][INFO][grid-nio-worker-tcp-comm-0-#24][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=c7e6fc55-d367-43d5-94e9-79ef1d984601, locNodeOrder=1, rmtNode=a66a573a-43dc-48d2-8ee5-232e727acbc9, rmtNodeOrder=19]
[11:16:47,675][INFO][grid-nio-worker-tcp-comm-1-#25][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/172.16.1.7:47100, rmtAddr=/10.139.0.10:38904]
[11:16:47,733][INFO][grid-nio-worker-tcp-comm-1-#25][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=c7e6fc55-d367-43d5-94e9-79ef1d984601, locNodeOrder=1, rmtNode=a66a573a-43dc-48d2-8ee5-232e727acbc9, rmtNodeOrder=19]
[11:16:48,049][INFO][grid-nio-worker-tcp-comm-2-#26][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/172.16.1.7:47100, rmtAddr=/10.139.0.10:38916]
[11:16:48,106][INFO][grid-nio-worker-tcp-comm-2-#26][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=c7e6fc55-d367-43d5-94e9-79ef1d984601, locNodeOrder=1, rmtNode=a66a573a-43dc-48d2-8ee5-232e727acbc9, rmtNodeOrder=19]
[11:16:48,423][INFO][grid-nio-worker-tcp-comm-3-#27][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/172.16.1.7:47100, rmtAddr=/10.139.0.10:38918]
[11:16:48,481][INFO][grid-nio-worker-tcp-comm-3-#27][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=c7e6fc55-d367-43d5-94e9-79ef1d984601, locNodeOrder=1, rmtNode=a66a573a-43dc-48d2-8ee5-232e727acbc9, rmtNodeOrder=19]

This is a bad sign: the client's connection gets reset, and its repeated attempts to reconnect are then rejected. I think you either have network problems or have maxed out your communication capacity.

I recommend the following configuration changes to TcpCommunicationSpi:

    socketWriteTimeout 5000
    usePairedConnections true
    connectionsPerNode 4

You may also want to set localAddress on each node to a known good (reachable) IP address of that node.

Regards,
--
Ilya Kasnacheev

Fri, 7 Feb 2020 at 14:34, pg31 <singhhoneyyo...@gmail.com>:
> Thanks Ilya.
>
> I have changed the client-side machine to prefer the IPv4 stack, and hence that
> error went away. But the data-streamer-stripes and tcp-comm-worker
> threads still keep getting stuck.
>
> I am attaching the logs again. (They contain the thread dumps themselves.)
> log.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t2770/log.zip>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
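
P.S. In case it helps, here is a rough sketch of how the TcpCommunicationSpi settings recommended above might look in a Spring XML node configuration. It assumes the node is configured through an IgniteConfiguration bean; the localAddress value is only a placeholder copied from the locAddr in the log excerpt, and whether that address is actually reachable from the other nodes is an assumption you would need to verify for each node.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="communicationSpi">
        <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
            <!-- Socket write timeout in milliseconds, as recommended above. -->
            <property name="socketWriteTimeout" value="5000"/>
            <!-- Use separate connections for incoming and outgoing traffic. -->
            <property name="usePairedConnections" value="true"/>
            <!-- Open up to 4 communication connections to each remote node. -->
            <property name="connectionsPerNode" value="4"/>
            <!-- Placeholder (assumption): replace with this node's own reachable IP. -->
            <property name="localAddress" value="172.16.1.7"/>
        </bean>
    </property>
</bean>
```

The same communicationSpi block would go into the configuration of every node, with localAddress changed per node. Treat the values as a starting point taken from this thread, not as tuned settings.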