Re: Active nodes aliveness WatchDog

Anton Vinogradov Wed, 08 Apr 2020 04:32:46 -0700

Stephen,
Thanks for the hint.

Vladimir,
Great idea! Let me know if any help is needed.


On Wed, Apr 8, 2020 at 2:19 PM Vladimir Steshin <[email protected]> wrote:

> Hi everyone.
>
> I think we should check the behavior of failure detection with tests, or
> find such tests if they are already written. I’ll research this question
> and raise a ticket if a reproducer appears.
>
>
>
> On 08.04.2020 12:19, Stephen Darlington wrote:
> > Yes. Nodes are always chatting to one another even if there are no
> > requests coming in.
> >
> > Here’s the status message:
> > https://github.com/apache/ignite/blob/e9b3c4cebaecbeec9fa51bd6ec32a879fb89948a/modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/messages/TcpDiscoveryStatusCheckMessage.java
> >
> > Regards,
> > Stephen
> >
> >> On 8 Apr 2020, at 10:04, Anton Vinogradov <[email protected]> wrote:
> >>
> >> It seems you're talking about Failure Detection (Timeouts).
> >> Will it detect a node failure on an idle cluster?
> >>
> >> On Wed, Apr 8, 2020 at 11:52 AM Stephen Darlington <
> >> [email protected]> wrote:
> >>
> >>> The configuration parameters that I’m aware of are here:
> >>>
> >>> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySpi.html
> >>>
> >>> Other people would be better placed to discuss the internals.
> >>>
> >>> Regards.
> >>> Stephen
> >>>
> >>>> On 8 Apr 2020, at 09:32, Anton Vinogradov <[email protected]> wrote:
> >>>>
> >>>> Stephen,
> >>>>
> >>>>> Nodes check on their neighbours and notify the remaining nodes if one
> >>>>> disappears.
> >>>> Could you explain how this works in detail?
> >>>> How can I set/change check frequency?
> >>>>
> >>>> On Wed, Apr 8, 2020 at 11:13 AM Stephen Darlington <
> >>>> [email protected]> wrote:
> >>>>
> >>>>> This is one of the functions of the DiscoverySPI. Nodes check on their
> >>>>> neighbours and notify the remaining nodes if one disappears. When the
> >>>>> topology changes, it triggers a rebalance, which relocates primary
> >>>>> partitions to live nodes. This is entirely transparent to clients.
> >>>>>
> >>>>> It gets more complex… like there’s the partition loss policy and
> >>>>> rebalancing doesn’t always happen (configurable, persistence, etc.)…
> >>>>> but broadly it does as you expect.
> >>>>>
> >>>>> Regards,
> >>>>> Stephen
> >>>>>
> >>>>>> On 8 Apr 2020, at 08:40, Anton Vinogradov <[email protected]> wrote:
> >>>>>>
> >>>>>> Igniters,
> >>>>>> Do we have a feature that checks node aliveness on a regular basis?
> >>>>>> Scenario:
> >>>>>> Precondition
> >>>>>> The cluster has no load but some node's JVM crashed.
> >>>>>>
> >>>>>> Expected (actual)
> >>>>>> The user performs an operation (e.g. cache put) related to this node
> >>>>>> (via another node) and waits for some timeout to learn that it is dead.
> >>>>>> The cluster starts the switch to relocate primary partitions to alive
> >>>>>> nodes.
> >>>>>> Now the user is able to retry the operation.
> >>>>>>
> >>>>>> Desired
> >>>>>> Some WatchDog checks node aliveness on a regular basis.
> >>>>>> Once a failure is detected, the cluster starts the switch.
> >>>>>> Later, the user performs an operation on an already fixed cluster
> >>>>>> and waits for nothing.
> >>>>>>
> >>>>>> It would be good news if the "Desired" case is already Actual.
> >>>>>> Can somebody point to the feature that performs this check?
> >>>>>
> >>>>>
> >>>
> >>>
> >
>
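
The behaviour described in the thread ("Nodes check on their neighbours and notify the remaining nodes if one disappears") can be pictured with a small toy model. This is only a sketch under stated assumptions, not Ignite's actual discovery code: the class RingWatchdogSketch, its methods, and the explicit clock values passed as nowMs are all hypothetical, and the real TcpDiscoverySpi exchanges messages over sockets rather than reading a shared map. The sketch assumes each node watches only the next node in a ring and treats it as failed once it has been silent for longer than a failure detection timeout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of ring-based failure detection on an idle cluster.
 * Hypothetical illustration only; none of these names exist in Apache Ignite.
 */
class RingWatchdogSketch {
    /** Current topology in ring order. */
    private final List<String> ring = new ArrayList<>();

    /** Node id -> time (ms) of the last heartbeat seen from that node. */
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    /** Silence longer than this marks a node as failed (assumed parameter). */
    private final long failureDetectionTimeoutMs;

    RingWatchdogSketch(List<String> nodeIds, long failureDetectionTimeoutMs, long nowMs) {
        this.failureDetectionTimeoutMs = failureDetectionTimeoutMs;
        for (String id : nodeIds) {
            ring.add(id);
            lastHeartbeatMs.put(id, nowMs);
        }
    }

    /** A live node reports in. A crashed JVM simply stops calling this. */
    void heartbeat(String nodeId, long nowMs) {
        if (lastHeartbeatMs.containsKey(nodeId))
            lastHeartbeatMs.put(nodeId, nowMs);
    }

    private boolean isSilent(String nodeId, long nowMs) {
        return nowMs - lastHeartbeatMs.get(nodeId) > failureDetectionTimeoutMs;
    }

    /**
     * One periodic check round: every live node inspects its next neighbour.
     * Returns the nodes dropped from the topology in this round.
     */
    List<String> checkNeighbours(long nowMs) {
        List<String> failed = new ArrayList<>();
        for (int i = 0; i < ring.size(); i++) {
            String checker = ring.get(i);
            String neighbour = ring.get((i + 1) % ring.size());
            if (neighbour.equals(checker))
                continue; // single-node ring: nobody to check
            if (isSilent(checker, nowMs))
                continue; // a dead node cannot check anyone
            if (isSilent(neighbour, nowMs) && !failed.contains(neighbour))
                failed.add(neighbour);
        }
        // "Notify the remaining nodes": here that is just a topology update.
        ring.removeAll(failed);
        failed.forEach(lastHeartbeatMs::remove);
        return failed;
    }

    /** Snapshot of the nodes currently considered alive. */
    List<String> topology() {
        return new ArrayList<>(ring);
    }
}
```

In this model, with nodes A, B and C created at time 0 and a timeout of 1000 ms, if only A and C keep calling heartbeat, then checkNeighbours(1500) drops B without any user operation being involved, which corresponds to the "Desired" case. If two adjacent nodes die together, the second one is only detected in a later round, once its dead checker has been removed from the ring.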
