Check whether there are active transactions running on Server3 at the time Servers 1 and 2 are trying to join. A joining server waits for active transactions to complete before it joins the cluster, and if a transaction takes longer than the join timeout (10 seconds by default), the joining node is segmented out and will not rejoin the cluster until it is restarted. You can use the control script to check active transactions: https://ignite.apache.org/docs/latest/tools/control-script#transaction-management
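
As a rough sketch of that check, the small wrapper below calls `control.sh --tx`, restricted to transactions initiated on client nodes that have been open for at least a given number of seconds. The `/opt/ignite` default for `IGNITE_HOME` and the `list_long_tx` function name are assumptions made for illustration. The `--clients` and `--min-duration` options are the ones described on the transaction-management page linked above; please confirm them with `control.sh --help` on your version.

```shell
#!/bin/sh
# IGNITE_HOME is an assumed install location; point it at your Ignite distribution.
: "${IGNITE_HOME:=/opt/ignite}"

# list_long_tx SECONDS  (hypothetical helper name)
# Lists transactions initiated on client nodes that have been running
# for at least SECONDS seconds.
list_long_tx() {
    "$IGNITE_HOME/bin/control.sh" --tx --clients --min-duration "$1"
}

# Example: list_long_tx 10
```

Running `list_long_tx 10` while Server1 is stuck joining should show whether the client on Server3 is holding a transaction open for longer than the 10 seconds mentioned above.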

On Thu, Jul 1, 2021 at 18:20, Joan Pujol <joanpu...@gmail.com> wrote:

> Hi,
>
> I have three servers running Tomcat with Ignite 2.10 embedded. If I start
> all the nodes from a cold start (all servers stopped), it works well.
> But if I stop one of the server nodes and restart it, it gets stuck
> joining the cluster and retries indefinitely with
> WARN [main] o.a.i.s.d.t.TcpDiscoverySpi - Timed out waiting for message
> delivery receipt
> and other timeout messages.
>
> Details:
> Server1 (10.114.0.8): Two WARs with an Ignite server embedded
> Server2 (10.114.0.13): A WAR with an Ignite server embedded
> Server3 (10.114.0.9): A WAR with an Ignite client
>
> The problematic server is Server1, which gets stuck if it is started while
> Server3 is also running.
> If Server3 is stopped, then Server1 starts correctly without getting
> stuck.
>
> I attach the server log from Server1:
> https://www.dropbox.com/s/2xc4as3qqorq21q/server1.log?dl=0
> And from Server2: https://www.dropbox.com/s/vtxhhg690aorbvo/server2.log?dl=0
>
> And the stack trace of where Server1 gets stuck:
> wait:-1, Object (java.lang)
> joinTopology:1179, ServerImpl (org.apache.ignite.spi.discovery.tcp)
> spiStart:472, ServerImpl (org.apache.ignite.spi.discovery.tcp)
> spiStart:2154, TcpDiscoverySpi (org.apache.ignite.spi.discovery.tcp)
> startSpi:278, GridManagerAdapter (org.apache.ignite.internal.managers)
> start:981, GridDiscoveryManager (org.apache.ignite.internal.managers.discovery)
> startManager:1968, IgniteKernal (org.apache.ignite.internal)
> start:1324, IgniteKernal (org.apache.ignite.internal)
> start0:2112, IgnitionEx$IgniteNamedInstance (org.apache.ignite.internal)
> start:1758, IgnitionEx$IgniteNamedInstance (org.apache.ignite.internal)
> start0:1143, IgnitionEx (org.apache.ignite.internal)
> start:663, IgnitionEx (org.apache.ignite.internal)
> start:589, IgnitionEx (org.apache.ignite.internal)
> start:328, Ignition (org.apache.ignite)
>
> The configuration can be seen in the Server1 log, but basically the nodes
> are configured with multicast, with the IPs of the two server nodes also
> given.
>
> Any help or guidance in finding the problem will be greatly appreciated.
>
> Cheers,
>
> --
> Joan Jesús Pujol Espinar

--
Best regards,
Alexey