Yakov Zhdanov created IGNITE-6071:
-------------------------------------

             Summary: Client may detect necessity for reconnect for too long
                 Key: IGNITE-6071
                 URL: https://issues.apache.org/jira/browse/IGNITE-6071
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 2.1
            Reporter: Yakov Zhdanov


There was a GC pause on client that caused servers to drop client due to 
inability to establish TCP communication connection. Then it took some time for 
client to detect that it has been dropped. During that time client many times 
attempted to connect to server which can be seen in the logs. After client 
detected its drop and reconnected servers fired node added event and no log 
flood can be found any more.

We need to find out why client was reconnecting via communication and did not 
detect the drop for such a long time.

I hope this can be reproduced in test:
* start 2 servers
* start client
* suspend all client threads with Thread.suspend() - just filter threads of 
current JVM by name and suspend ones belonging to the client.

{noformat}
[10:12:24,785][WARNING][disco-event-worker-#71%null%][GridDiscoveryManager] 
Node FAILED: TcpDiscoveryNode [id=dd71479c-41ba-443e-b25c-3803a2a94f4f, 
addrs=[10.44.3.14, 127.0.0.1], sockAddrs=[/127.0.0.1:0, XXX.com/10.44.3.14:0], 
discPort=0, order=2, intOrder=2, lastExchangeTime=1502269008673, loc=false, 
ver=2.1.1#20170618-sha1:09ce29e0, isClient=true]
[10:12:24,785][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] 
Topology snapshot [ver=5, servers=2, clients=1, CPUs=144, heap=76.0GB]
[10:12:24,794][INFO][exchange-worker-#72%null%][time] Started exchange init 
[topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false, evt=12, 
node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, 
addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, 
XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, 
lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, 
isClient=false], evtNode=TcpDiscoveryNode 
[id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], 
sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, 
order=3, intOrder=3, lastExchangeTime=1502269944782, loc=true, 
ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null]
[10:12:24,813][INFO][exchange-worker-#72%null%][time] Finished exchange init 
[topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false]
[10:12:24,819][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager]
 Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
[topVer=5, minorTopVer=0], evt=NODE_FAILED, 
node=dd71479c-41ba-443e-b25c-3803a2a94f4f]
[10:12:28,344][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:52474]
[10:12:28,348][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:52482]
[10:12:28,356][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:52506]
[10:12:28,362][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:52522]
[10:12:28,368][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:52538]
[10:12:28,374][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:52554]
[10:12:28,380][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:52570]
[10:12:28,386][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:52586]
[10:12:28,392][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:52602]
[10:12:28,397][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:52618]
[10:12:28,402][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:52634]
[10:12:28,407][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:52650]
[10:12:28,412][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:52666]

...

[10:18:32,684][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:43604]
[10:18:32,690][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:43620]
[10:18:32,695][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] 
Accepted incoming communication connection [locAddr=/10.44.3.11:47100, 
rmtAddr=/10.44.3.14:43636]
[10:18:42,831][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Added 
new node to topology: TcpDiscoveryNode 
[id=2e80b0f0-21db-451d-a264-34ba16e00ffa, addrs=[10.44.3.14, 127.0.0.1], 
sockAddrs=[/127.0.0.1:0, gbrdsr000002837.intranet.barcapint.com/10.44.3.14:0], 
discPort=0, order=6, intOrder=5, lastExchangeTime=1502270322805, loc=false, 
ver=2.1.1#20170618-sha1:09ce29e0, isClient=true]
[10:18:42,832][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] 
Topology snapshot [ver=6, servers=2, clients=2, CPUs=144, heap=90.0GB]
[10:18:42,833][INFO][exchange-worker-#72%null%][time] Started exchange init 
[topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false, evt=10, 
node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, 
addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, 
XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, 
lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, 
isClient=false], evtNode=TcpDiscoveryNode 
[id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], 
sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, 
order=3, intOrder=3, lastExchangeTime=1502270322815, loc=true, 
ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null]
[10:18:42,851][INFO][exchange-worker-#72%null%][time] Finished exchange init 
[topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false]
[10:18:42,855][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager]
 Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
[topVer=6, minorTopVer=0], evt=NODE_JOINED, 
node=2e80b0f0-21db-451d-a264-34ba16e00ffa]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to