[ 
https://issues.apache.org/jira/browse/IGNITE-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996565#comment-14996565
 ] 

Semen Boikov commented on IGNITE-1758:
--------------------------------------

One more issue found:
- node1 starts
- node2 joins, but gets an IO error while trying to send the NodeAdded message
back to node1 (node1 is still alive)
- node2 adds node1 to its failed list and tries to rejoin. It is able to send
the join request to node1, but node1 does not reply because node2 was already
added to the ring, so node2 hangs inside joinTopology
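
The sequence above can be sketched as a toy model. This is not Ignite code:
the Node class, onJoinRequest, simulate, the replyToKnownNode flag and the
maxAttempts bound are all hypothetical names introduced for illustration. The
flag exists only to contrast the reported behaviour with a node1 that does
answer a repeated join request; it is not meant to describe the actual fix.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of the join sequence described above; NOT Ignite code. */
public class JoinHangSketch {
    /** Hypothetical stand-in for a discovery node: ring view plus failed list. */
    static final class Node {
        final String id;
        final Set<String> ring = new LinkedHashSet<>();
        final Set<String> failed = new LinkedHashSet<>();

        Node(String id) {
            this.id = id;
            ring.add(id);
        }

        /**
         * Handles a join request and returns true if a reply is sent.
         * A request from a node that is already in the ring is answered
         * only when replyToKnownNode is true (assumption for contrast).
         */
        boolean onJoinRequest(String joiningId, boolean replyToKnownNode) {
            if (ring.contains(joiningId))
                return replyToKnownNode;

            ring.add(joiningId);
            return true;
        }
    }

    /**
     * Returns true if node2 completes the rejoin within maxAttempts,
     * false if it never gets a reply (the hang, bounded here so the
     * sketch terminates).
     */
    static boolean simulate(boolean replyToKnownNode, int maxAttempts) {
        Node node1 = new Node("node1");
        Node node2 = new Node("node2");

        // node2 joins: node1 adds it to the ring.
        node1.onJoinRequest(node2.id, replyToKnownNode);

        // node2 gets an IO error sending NodeAdded back and marks node1
        // as failed, although node1 is still alive.
        node2.failed.add(node1.id);

        // node2 rejoins: the request reaches node1, but node1 already
        // has node2 in its ring.
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (node1.onJoinRequest(node2.id, replyToKnownNode)) {
                node2.failed.remove(node1.id);
                node2.ring.add(node1.id);
                return true;
            }
        }

        return false;
    }

    public static void main(String[] args) {
        System.out.println("reported behaviour, joined = " + simulate(false, 5));
        System.out.println("node1 replies, joined = " + simulate(true, 5));
    }
}
```

With replyToKnownNode set to false, every retry is dropped and simulate
returns false, which corresponds to node2 waiting forever inside joinTopology.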

> Clients don't survive during massive servers shutdown
> -----------------------------------------------------
>
>                 Key: IGNITE-1758
>                 URL: https://issues.apache.org/jira/browse/IGNITE-1758
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: ignite-1.4
>            Reporter: Denis Magda
>            Assignee: Semen Boikov
>            Priority: Blocker
>             Fix For: 1.5
>
>         Attachments: ignite-1758-test.patch
>
>
> This is a real-world use case.
> Start a reasonable number of servers and clients.
> Perform cache operations under a transaction.
> Stop half of the servers. Clients must survive and keep executing their 
> transactions.
> I ran the following test:
> - Started 14 servers and 14 clients;
> - Clients execute transactional put operations;
> - Stopped 7 servers.
> The clients then fail with different assertion errors:
> {noformat}
> [15:47:33,401][ERROR][tcp-client-disco-msg-worker-#521%internal.IgniteClientReconnectCacheMultiThreadedTest18][TcpDiscoverySpi]
>  Runtime error caught during grid runnable execution: IgniteSpiThread 
> [name=tcp-client-disco-msg-worker-#521%internal.IgniteClientReconnectCacheMultiThreadedTest18]
> java.lang.AssertionError: lastVer=29, newVer=32, locNode=TcpDiscoveryNode 
> [id=80f14def-9d49-43a0-96bc-6b83aedb3008, addrs=[127.0.0.1], 
> sockAddrs=[/127.0.0.1:0], discPort=0, order=26, intOrder=0, 
> lastExchangeTime=1445428036418, loc=true, ver=1.4.1#19700101-sha1:00000000, 
> isClient=true], msg=TcpDiscoveryNodeFailedMessage 
> [failedNodeId=3020dc65-ed3e-426f-8784-5bb766961003, order=4, warning=null, 
> super=TcpDiscoveryAbstractMessage 
> [sndNodeId=10c5cfe9-df07-4dfe-a5c0-460087aa9001, 
> id=eed3e3a8051-008a978d-28cc-4f0c-8728-4a815f858000, 
> verifierNodeId=800cf998-828e-4f56-af6a-c2760c5ed008, topVer=32, pendingIdx=0, 
> isClient=false]]
>       at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl.updateTopologyHistory(ClientImpl.java:720)
>       at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl.access$2700(ClientImpl.java:118)
>       at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processNodeFailedMessage(ClientImpl.java:1812)
>       at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processDiscoveryMessage(ClientImpl.java:1543)
>       at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1467)
>       at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {noformat}
> {noformat}
> java.lang.AssertionError: Missed message future [rcvCnt=141, acked=0, 
> desc=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=0, 
> reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode 
> [id=6090f64b-e019-440b-9d0e-c3642bd3a006, addrs=[127.0.0.1], 
> sockAddrs=[/127.0.0.1:47503], discPort=47503, order=3, intOrder=3, 
> lastExchangeTime=1445428027468, loc=false, ver=1.4.1#19700101-sha1:00000000, 
> isClient=false], connected=false, connectCnt=1, queueLimit=5120]]
>       at 
> org.apache.ignite.internal.util.nio.GridNioRecoveryDescriptor.ackReceived(GridNioRecoveryDescriptor.java:181)
>       at 
> org.apache.ignite.internal.util.nio.GridNioRecoveryDescriptor.onHandshake(GridNioRecoveryDescriptor.java:251)
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2331)
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2084)
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:1978)
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1914)
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1880)
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1066)
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1214)
>       at 
> org.apache.ignite.internal.processors.clock.GridClockSyncProcessor.publish(GridClockSyncProcessor.java:305)
>       at 
> org.apache.ignite.internal.processors.clock.GridClockSyncProcessor.access$800(GridClockSyncProcessor.java:54)
>       at 
> org.apache.ignite.internal.processors.clock.GridClockSyncProcessor$TimeCoordinator.body(GridClockSyncProcessor.java:382)
>       at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}



