[ 
https://issues.apache.org/jira/browse/IGNITE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Steshin updated IGNITE-14053:
--------------------------------------
    Description: 
Suggestion: remove the duplicated ‘ping’ and make the code simpler.

To ensure that a node has not failed, TcpDiscoverySpi already has the 
sustained ping, TcpDiscoveryConnectionCheckMessage and the backward connection 
check. But there is also a status check message 
(TcpDiscoveryStatusCheckMessage), which looks outdated. This message was 
introduced in the first versions of discovery, when cluster stability and 
message delivery were still under development.

  was:
Suggestion: remove the duplicated ‘ping’ and make the code simpler.

To ensure that a node has not failed, TcpDiscoverySpi already has the 
sustained ping, TcpDiscoveryConnectionCheckMessage and the backward connection 
check. But there is also a status check message 
(TcpDiscoveryStatusCheckMessage), which looks outdated. This message was 
introduced in the first versions of discovery, when cluster stability and 
message delivery were still under development.

Currently, TcpDiscoveryStatusCheckMessage is actually sent only occasionally, 
at cluster start, and not later because of the ping. The ping updates the time 
of the last received message, and a recent last-message time is the reason not 
to raise the status check.

It is possible for a node to lose all incoming connections but keep its 
connection to the next node. In this case the node gets removed from the ring 
by its follower, but it cannot recognize the failure because it still 
successfully sends messages to the next node. Instead of the complex 
processing of TcpDiscoveryStatusCheckMessage, it seems enough to answer such a 
message with 'OK, but you are not in the ring'. Every other node sees the 
failure of the malfunctioning node and can report it in the message response.
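
The 'OK, but you are not in the ring' reply described above could be sketched 
roughly as follows. This is only an illustrative model, not Ignite code: the 
class RingMembershipSketch, the Reply enum, the Receiver type and all method 
names are hypothetical and do not correspond to the actual TcpDiscoverySpi 
classes or message formats.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of the proposed reply; not Ignite's actual classes. */
public class RingMembershipSketch {
    /** Possible replies to a discovery message (illustrative names). */
    enum Reply { OK, OK_BUT_NOT_IN_RING }

    /** A receiving node with its own local view of the ring. */
    static final class Receiver {
        private final Set<String> ringNodeIds = new LinkedHashSet<>();

        void addNode(String nodeId) {
            ringNodeIds.add(nodeId);
        }

        /** Called when this receiver has seen the given node fail. */
        void removeFailedNode(String nodeId) {
            ringNodeIds.remove(nodeId);
        }

        /** Accepts the message, but reports whether the sender is still in the ring. */
        Reply onMessage(String senderId) {
            return ringNodeIds.contains(senderId) ? Reply.OK : Reply.OK_BUT_NOT_IN_RING;
        }
    }

    /** Sender side: true if the reply shows the sender was dropped from the ring. */
    static boolean senderWasDropped(Reply reply) {
        return reply == Reply.OK_BUT_NOT_IN_RING;
    }

    public static void main(String[] args) {
        Receiver next = new Receiver();
        next.addNode("A");
        next.addNode("B");

        System.out.println(next.onMessage("A")); // prints "OK"

        // The ring drops node A, although A can still send to its next node.
        next.removeFailedNode("A");
        System.out.println(next.onMessage("A")); // prints "OK_BUT_NOT_IN_RING"
    }
}
```

In this sketch the sender needs no separate status check round trip: the 
response to any ordinary message already tells it that the ring no longer 
considers it a member.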

The ticket has been additionally verified with the integration discovery test: 
[https://github.com/apache/ignite/pull/8716]

The parent ticket (IGNITE-13980) suggests keeping 
TcpDiscoveryStatusCheckMessage for backward compatibility with older versions 
of Ignite.


> Remove status check message at all.
> -----------------------------------
>
>                 Key: IGNITE-14053
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14053
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Vladimir Steshin
>            Assignee: Vladimir Steshin
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Suggestion: remove the duplicated ‘ping’ and make the code simpler.
> To ensure that a node has not failed, TcpDiscoverySpi already has the 
> sustained ping, TcpDiscoveryConnectionCheckMessage and the backward 
> connection check. But there is also a status check message 
> (TcpDiscoveryStatusCheckMessage), which looks outdated. This message was 
> introduced in the first versions of discovery, when cluster stability and 
> message delivery were still under development.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)