[
https://issues.apache.org/jira/browse/IGNITE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vladimir Steshin updated IGNITE-13014:
--------------------------------------
Labels: iep-45 (was: )
> Remove double checking of node availability. Fix hardcoded values.
> ------------------------------------------------------------------
>
> Key: IGNITE-13014
> URL: https://issues.apache.org/jira/browse/IGNITE-13014
> Project: Ignite
> Issue Type: Improvement
> Reporter: Vladimir Steshin
> Assignee: Vladimir Steshin
> Priority: Major
> Labels: iep-45
> Attachments: WostCase.txt
>
>
> Currently, node availability is checked twice. This prolongs node failure
> detection and gives no additional benefit. The routine is also messy and
> contains hardcoded values.
> Suppose node 2 stops responding. Node 1 becomes unable to ping node 2 and
> asks node 3 to establish a permanent connection in place of node 2.
> Although node 2 has already been pinged within the configured timeouts,
> node 3 tries to connect to node 2 as well.
> Disadvantages:
> 1) Node failure detection can take up to
> ServerImpl.CON_CHECK_INTERVAL + 2 *
> IgniteConfiguration.failureDetectionTimeout + 300ms. See ‘WostCase.txt’.
> 2) An unexpected, non-configurable decision to check the availability of the
> previous node, based on ‘2 * ServerImpl.CON_CHECK_INTERVAL’:
> // We got message from previous in less than double connection check interval.
> boolean ok = rcvdTime + CON_CHECK_INTERVAL * 2 >= now;
> If ‘ok == true’, node 3 checks node 2.
> 3) Double node checking introduces several non-configurable, hardcoded delays.
> Node 3 checks node 2 with a hardcoded timeout of 100ms:
> ServerImpl.isConnectionRefused():
> sock.connect(addr, 100);
> When checking the availability of the previous node, any exception other than
> ConnectException (connection refused) is treated as an existing connection,
> even a timeout. See ServerImpl.isConnectionRefused().
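The three disadvantages above can be illustrated with a minimal Java sketch. It is not the actual ServerImpl code: the class name DoubleCheckSketch, its method names, and the parameterized interval and timeout values are assumptions made for illustration. Only the two expressions quoted in the description and the 300ms term are taken from the issue. The probe assumes that the JDK reports a refused connection as java.net.ConnectException and a connect timeout as another IOException subtype.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.Socket;
import java.net.SocketAddress;

/** Illustrative sketch only; names are hypothetical and this is not the ServerImpl code. */
class DoubleCheckSketch {
    /** Extra delay quoted in item 1 of the issue, in milliseconds. */
    static final long EXTRA_DELAY_MS = 300;

    /** Item 1: CON_CHECK_INTERVAL + 2 * failureDetectionTimeout + 300ms. */
    static long worstCaseDetectionMs(long conCheckInterval, long failureDetectionTimeout) {
        return conCheckInterval + 2 * failureDetectionTimeout + EXTRA_DELAY_MS;
    }

    /** Item 2: the previous node was heard from within two connection check intervals. */
    static boolean heardFromPreviousRecently(long rcvdTime, long now, long conCheckInterval) {
        return rcvdTime + conCheckInterval * 2 >= now;
    }

    /** Item 3: a probe in which only an explicit refusal counts as "node is down". */
    static boolean isConnectionRefused(SocketAddress addr, int timeoutMs) {
        try (Socket sock = new Socket()) {
            sock.connect(addr, timeoutMs);

            return false; // Connected, so the node is reachable.
        }
        catch (ConnectException e) {
            return true; // Typically "connection refused".
        }
        catch (IOException e) {
            return false; // Timeout or other I/O error is still treated as "alive".
        }
    }
}
```

With example inputs of 500 ms for the check interval and 10,000 ms for the failure detection timeout (both chosen here only for illustration), worstCaseDetectionMs(500, 10_000) evaluates to 20,800 ms. The last catch branch shows why an unresponsive node that silently drops packets is still reported as available by this kind of probe.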
--
This message was sent by Atlassian Jira
(v8.3.4#803005)