Stephen,
Thanks for the hint.

Vladimir,
Great idea! Let me know if you need any help.

On Wed, Apr 8, 2020 at 2:19 PM Vladimir Steshin <vlads...@gmail.com> wrote:

> Hi everyone.
>
> I think we should check the behavior of failure detection with tests, or
> find such tests if they are already written. I’ll research this question
> and raise a ticket if a reproducer appears.
>
> 08.04.2020 12:19, Stephen Darlington wrote:
>
> Yes. Nodes are always chatting to each other even if there are no
> requests coming in.
>
> Here’s the status message:
> https://github.com/apache/ignite/blob/e9b3c4cebaecbeec9fa51bd6ec32a879fb89948a/modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/messages/TcpDiscoveryStatusCheckMessage.java
>
> Regards,
> Stephen
>
> >> On 8 Apr 2020, at 10:04, Anton Vinogradov <a...@apache.org> wrote:
> >>
> >> It seems you're talking about Failure Detection (Timeouts).
> >> Will it detect a node failure on an idle cluster?
> >>
> >> On Wed, Apr 8, 2020 at 11:52 AM Stephen Darlington <
> >> stephen.darling...@gridgain.com> wrote:
> >>
> >>> The configuration parameters that I’m aware of are here:
> >>>
> >>> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySpi.html
> >>>
> >>> Other people would be better placed to discuss the internals.
> >>>
> >>> Regards.
> >>> Stephen
> >>>
> >>>> On 8 Apr 2020, at 09:32, Anton Vinogradov <a...@apache.org> wrote:
> >>>>
> >>>> Stephen,
> >>>>
> >>>>> Nodes check on their neighbours and notify the remaining nodes if
> >>>>> one disappears.
> >>>> Could you explain how this works in detail?
> >>>> How can I set/change the check frequency?
> >>>>
> >>>> On Wed, Apr 8, 2020 at 11:13 AM Stephen Darlington <
> >>>> stephen.darling...@gridgain.com> wrote:
> >>>>
> >>>>> This is one of the functions of the DiscoverySPI. Nodes check on
> >>>>> their neighbours and notify the remaining nodes if one disappears.
> >>>>> When the topology changes, it triggers a rebalance, which relocates
> >>>>> primary partitions to live nodes. This is entirely transparent to
> >>>>> clients.
> >>>>>
> >>>>> It gets more complex… like there’s the partition loss policy and
> >>>>> rebalancing doesn’t always happen (configurable, persistence, etc)…
> >>>>> but broadly it does as you expect.
> >>>>>
> >>>>> Regards,
> >>>>> Stephen
> >>>>>
> >>>>>> On 8 Apr 2020, at 08:40, Anton Vinogradov <a...@apache.org> wrote:
> >>>>>>
> >>>>>> Igniters,
> >>>>>> Do we have a feature that checks node liveness on a regular basis?
> >>>>>>
> >>>>>> Scenario:
> >>>>>>
> >>>>>> Precondition
> >>>>>> The cluster has no load, but some node's JVM has crashed.
> >>>>>>
> >>>>>> Expected actual
> >>>>>> The user performs an operation (e.g. a cache put) related to this
> >>>>>> node (via another node) and waits for some timeout to learn that
> >>>>>> it's dead.
> >>>>>> The cluster starts the switch to relocate primary partitions to
> >>>>>> live nodes.
> >>>>>> Now the user is able to retry the operation.
> >>>>>>
> >>>>>> Desired
> >>>>>> Some WatchDog checks node liveness on a regular basis.
> >>>>>> Once a failure is detected, the cluster starts the switch.
> >>>>>> Later, the user performs an operation on an already fixed cluster
> >>>>>> and waits for nothing.
> >>>>>>
> >>>>>> It would be good news if the "Desired" case is already Actual.
> >>>>>> Can somebody point to the feature that performs this check?
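
For readers following the thread, the "Desired" behaviour (neighbours checking each other on a timer, with no user operation needed to notice a dead node) can be illustrated with a small toy model. This is only a sketch under stated assumptions and is not how Ignite's TcpDiscoverySpi is implemented: the class RingFailureDetectorSketch, its methods, and the 10-second timeout are all hypothetical names and values chosen for illustration, and time is passed in explicitly instead of using real network messages or clocks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT Ignite's discovery implementation. All names are hypothetical.
class RingFailureDetectorSketch {
    private final List<String> ring = new ArrayList<>();         // current topology, in ring order
    private final Map<String, Long> lastSeen = new HashMap<>();  // node -> time of last heartbeat (ms)
    private final long timeoutMs;                                 // assumed failure detection timeout

    RingFailureDetectorSketch(List<String> nodes, long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        for (String n : nodes) {
            ring.add(n);
            lastSeen.put(n, nowMs);
        }
    }

    // A live node reports in. In a real cluster this would be a network message.
    void heartbeat(String node, long nowMs) {
        if (lastSeen.containsKey(node)) {
            lastSeen.put(node, nowMs);
        }
    }

    private boolean isSilent(String node, long nowMs) {
        return nowMs - lastSeen.get(node) > timeoutMs;
    }

    // One periodic pass: every live node checks its next neighbour in the ring.
    // Silent neighbours are removed from the topology; the removed nodes are returned.
    List<String> check(long nowMs) {
        List<String> failed = new ArrayList<>();
        int size = ring.size();
        for (int i = 0; i < size; i++) {
            String monitor = ring.get(i);
            if (isSilent(monitor, nowMs)) {
                continue; // a dead node cannot check anyone
            }
            String neighbour = ring.get((i + 1) % size);
            if (!neighbour.equals(monitor) && isSilent(neighbour, nowMs)) {
                failed.add(neighbour);
            }
        }
        for (String n : failed) {
            ring.remove(n);
            lastSeen.remove(n);
        }
        return failed;
    }

    List<String> topology() {
        return new ArrayList<>(ring);
    }

    public static void main(String[] args) {
        RingFailureDetectorSketch d =
            new RingFailureDetectorSketch(List.of("A", "B", "C"), 10_000, 0);

        // 5 s in: A and C report in, B's JVM has crashed and stays silent.
        d.heartbeat("A", 5_000);
        d.heartbeat("C", 5_000);
        System.out.println(d.check(5_000));   // [] (B is still within the timeout)

        // 12 s in: B has now been silent for longer than the timeout.
        d.heartbeat("A", 12_000);
        d.heartbeat("C", 12_000);
        System.out.println(d.check(12_000));  // [B]
        System.out.println(d.topology());     // [A, C]
    }
}
```

In this model the topology is already repaired by the time a user operation arrives, which is the difference between the "Expected actual" and "Desired" cases described in the original question. Whether and how quickly a real cluster behaves this way depends on the discovery settings linked earlier in the thread.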