Yes. Nodes are always chatting to one another, even if there are no requests coming in.
Here’s the status message:
https://github.com/apache/ignite/blob/e9b3c4cebaecbeec9fa51bd6ec32a879fb89948a/modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/messages/TcpDiscoveryStatusCheckMessage.java

Regards,
Stephen

> On 8 Apr 2020, at 10:04, Anton Vinogradov <a...@apache.org> wrote:
>
> It seems you're talking about Failure Detection (timeouts).
> Will it detect node failure on an idle cluster?
>
> On Wed, Apr 8, 2020 at 11:52 AM Stephen Darlington <
> stephen.darling...@gridgain.com> wrote:
>
>> The configuration parameters that I’m aware of are here:
>>
>> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySpi.html
>>
>> Other people would be better placed to discuss the internals.
>>
>> Regards,
>> Stephen
>>
>>> On 8 Apr 2020, at 09:32, Anton Vinogradov <a...@apache.org> wrote:
>>>
>>> Stephen,
>>>
>>>> Nodes check on their neighbours and notify the remaining nodes if one
>>>> disappears.
>>> Could you explain how this works in detail?
>>> How can I set/change the check frequency?
>>>
>>> On Wed, Apr 8, 2020 at 11:13 AM Stephen Darlington <
>>> stephen.darling...@gridgain.com> wrote:
>>>
>>>> This is one of the functions of the DiscoverySPI. Nodes check on their
>>>> neighbours and notify the remaining nodes if one disappears. When the
>>>> topology changes, it triggers a rebalance, which relocates primary
>>>> partitions to live nodes. This is entirely transparent to clients.
>>>>
>>>> It gets more complex… there’s the partition loss policy, and
>>>> rebalancing doesn’t always happen (it’s configurable, depends on
>>>> persistence, etc.)… but broadly it does what you expect.
>>>>
>>>> Regards,
>>>> Stephen
>>>>
>>>>> On 8 Apr 2020, at 08:40, Anton Vinogradov <a...@apache.org> wrote:
>>>>>
>>>>> Igniters,
>>>>> Do we have a feature that checks node aliveness on a regular basis?
>>>>>
>>>>> Scenario:
>>>>>
>>>>> Precondition:
>>>>> The cluster has no load, but some node's JVM has crashed.
>>>>>
>>>>> Actual behaviour:
>>>>> The user performs an operation (e.g. a cache put) related to this node
>>>>> (via another node) and waits for some timeout before learning it's dead.
>>>>> The cluster then starts the switch to relocate primary partitions to
>>>>> live nodes.
>>>>> Only now is the user able to retry the operation.
>>>>>
>>>>> Desired behaviour:
>>>>> Some watchdog checks node aliveness on a regular basis.
>>>>> Once a failure is detected, the cluster starts the switch.
>>>>> Later, the user performs an operation on an already-recovered cluster
>>>>> and waits for nothing.
>>>>>
>>>>> It would be good news if the "Desired" case is already the actual one.
>>>>> Can somebody point me to the feature that performs this check?
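To make the idea concrete, here is a minimal, self-contained sketch of the heartbeat/status-check pattern discussed above. To be clear: this is NOT Ignite's implementation (Ignite's actual logic lives in TcpDiscoverySpi and TcpDiscoveryStatusCheckMessage linked above); the class and method names here are hypothetical, and time is passed in explicitly so the behaviour is deterministic. The core idea is the same, though: every node records when it last heard from each neighbour, and any node that stays silent longer than a failure-detection timeout is declared failed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of a heartbeat-based watchdog (not Ignite's code).
 * Tracks the last status-check message seen from each node and reports
 * nodes that have been silent for longer than the failure-detection timeout.
 */
public class StatusCheckWatchdog {
    /** How long (ms) a node may stay silent before it is considered failed. */
    private final long failureDetectionTimeout;

    /** Node id -> timestamp (ms) of its last heartbeat/status message. */
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    public StatusCheckWatchdog(long failureDetectionTimeout) {
        this.failureDetectionTimeout = failureDetectionTimeout;
    }

    /** Record a status-check message from {@code nodeId} arriving at {@code nowMs}. */
    public void heartbeat(String nodeId, long nowMs) {
        lastHeartbeat.put(nodeId, nowMs);
    }

    /** Nodes whose last heartbeat is older than the timeout, as of {@code nowMs}. */
    public Set<String> failedNodes(long nowMs) {
        Set<String> failed = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMs - e.getValue() > failureDetectionTimeout)
                failed.add(e.getKey());
        }
        return failed;
    }
}
```

In a real cluster, detecting a failure would be followed by broadcasting a node-failed event so the topology change (and rebalance) happens immediately, without waiting for a client operation to stumble onto the dead node. The corresponding real knob in Ignite is `IgniteConfiguration.setFailureDetectionTimeout(long)`, which bounds how long discovery waits before giving up on a silent node.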