Yes. Nodes are always chatting to each other even if there are no requests coming in.

Here’s the status message: 
https://github.com/apache/ignite/blob/e9b3c4cebaecbeec9fa51bd6ec32a879fb89948a/modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/messages/TcpDiscoveryStatusCheckMessage.java
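
If you want to control how quickly a dead node is noticed, the main knob is the failure detection timeout (and, secondarily, the metrics update frequency). Here's a rough sketch of tuning those on a plain TcpDiscoverySpi with a static IP finder; the timeout values and addresses are only placeholders, adjust them for your environment:

import java.util.Collections;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class DiscoveryTuningExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // How long the cluster waits before declaring an unresponsive node
        // failed. Lower values detect a crashed JVM sooner but risk false
        // positives during long GC pauses or network hiccups.
        cfg.setFailureDetectionTimeout(5_000);

        // How often nodes publish metrics/heartbeat updates, in milliseconds.
        cfg.setMetricsUpdateFrequency(1_000);

        // Plain static-IP discovery; replace the address list with your hosts.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Collections.singletonList("127.0.0.1:47500..47509"));

        TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
        discoverySpi.setIpFinder(ipFinder);
        cfg.setDiscoverySpi(discoverySpi);

        Ignite ignite = Ignition.start(cfg);
    }
}

As far as I know, TcpDiscoverySpi also has its own socket/ack/network timeout setters, but when you only set failureDetectionTimeout the SPI derives the lower-level timeouts from it.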

Regards,
Stephen

> On 8 Apr 2020, at 10:04, Anton Vinogradov <a...@apache.org> wrote:
> 
> It seems you're talking about Failure Detection (Timeouts).
> Will it detect a node failure on an idle cluster?
> 
> On Wed, Apr 8, 2020 at 11:52 AM Stephen Darlington <
> stephen.darling...@gridgain.com> wrote:
> 
>> The configuration parameters that I’m aware of are here:
>> 
>> 
>> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySpi.html
>> 
>> Other people would be better placed to discuss the internals.
>> 
>> Regards,
>> Stephen
>> 
>>> On 8 Apr 2020, at 09:32, Anton Vinogradov <a...@apache.org> wrote:
>>> 
>>> Stephen,
>>> 
>>>> Nodes check on their neighbours and notify the remaining nodes if one
>>> disappears.
>>> Could you explain how this works in detail?
>>> How can I set/change check frequency?
>>> 
>>> On Wed, Apr 8, 2020 at 11:13 AM Stephen Darlington <
>>> stephen.darling...@gridgain.com> wrote:
>>> 
>>>> This is one of the functions of the DiscoverySPI. Nodes check on their
>>>> neighbours and notify the remaining nodes if one disappears. When the
>>>> topology changes, it triggers a rebalance, which relocates primary
>>>> partitions to live nodes. This is entirely transparent to clients.
>>>> 
>>>> It gets more complex… like there’s the partition loss policy and
>>>> rebalancing doesn’t always happen (configurable, persistence, etc)… but
>>>> broadly it does as you expect.
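
(To illustrate the partition loss policy mentioned in the quoted paragraph above: it is configured per cache, roughly as below. The cache name, backup count, and policy choice here are only examples.)

import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<>("myCache");
// Fail reads and writes against lost partitions instead of silently
// serving stale or empty data once owning nodes have left.
cacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);
// One backup copy per partition so a single node loss does not lose data.
cacheCfg.setBackups(1);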
>>>> 
>>>> Regards,
>>>> Stephen
>>>> 
>>>>> On 8 Apr 2020, at 08:40, Anton Vinogradov <a...@apache.org> wrote:
>>>>> 
>>>>> Igniters,
>>>>> Do we have some feature that allows checking node aliveness on a regular basis?
>>>>> 
>>>>> Scenario:
>>>>> Precondition
>>>>> The cluster has no load but some node's JVM crashed.
>>>>> 
>>>>> Expected actual behaviour
>>>>> The user performs an operation (e.g. cache put) related to this node
>>>>> (via another node) and waits for some timeout to learn that it's dead.
>>>>> The cluster starts the switch to relocate primary partitions to alive
>>>>> nodes.
>>>>> Now the user is able to retry the operation.
>>>>> 
>>>>> Desired
>>>>> Some watchdog checks node aliveness on a regular basis.
>>>>> Once a failure is detected, the cluster starts the switch.
>>>>> Later, the user performs an operation on an already recovered cluster and
>>>>> does not have to wait.
>>>>> 
>>>>> It would be good news if the "Desired" case were already the actual behaviour.
>>>>> Can somebody point to the feature that performs this check?
>>>> 
>>>> 
>>>> 
>> 
>> 
>> 

