Roman Puchkovskiy created IGNITE-20914:
------------------------------------------
Summary: Make ScaleCube's metadataTimeout configurable
Key: IGNITE-20914
URL: https://issues.apache.org/jira/browse/IGNITE-20914
Project: Ignite
Issue Type: Improvement
Reporter: Roman Puchkovskiy
Fix For: 3.0.0-beta2

ScaleCube's MembershipProtocolImpl periodically fetches a node's metadata (using GetMetaDataRequest). If it does not get a response before metadataTimeout expires, it appears to conclude that the node is no longer alive and generates a REMOVED event:

[2023-11-17T00:20:22,153][WARN ][sc-cluster-3345-2][MembershipProtocol] [default:sqllogic1:1ca7b2f5308489d@10.233.107.205:3345][updateMembership][MEMBERSHIP_GOSSIP] Skipping to add/update member: {m: default:sqllogic0:6a78c57fcd0a496d@10.233.107.205:3344, s: ALIVE, inc: 9}, due to failed fetchMetadata call (cause: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'source(MonoDefer)' (and no fallback has been configured))
[2023-11-17T00:20:29,189][INFO ][sc-cluster-3345-2][MembershipProtocol] [default:sqllogic1:1ca7b2f5308489d@10.233.107.205:3345] Member left without notification: default:sqllogic0:6a78c57fcd0a496d@10.233.107.205:3344
[2023-11-17T00:20:29,190][INFO ][sc-cluster-3345-2][MembershipProtocol] [default:sqllogic1:1ca7b2f5308489d@10.233.107.205:3345][publishEvent] MembershipEvent[type=REMOVED, member=default:sqllogic0:6a78c57fcd0a496d@10.233.107.205:3344, oldMetadata=1e61c6c8-154, newMetadata=null, timestamp=2023-11-17T00:20:29.189Z]

We should avoid this. A 1-second timeout (the 1000ms in the log above) may be too short for a node under load. We should make metadataTimeout configurable via Ignite configuration. It probably also makes sense to use a higher default (for example, 10 seconds).
The reason for the higher default is that, if the timeout expires, the node is removed from the physical topology and cannot rejoin it without a restart (this is what our connection establishment protocol requires). This makes the timeout critical for Ignite's stability, whereas it is probably not critical for an average ScaleCube-based application.
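A minimal, self-contained sketch of the proposed behavior, assuming Java 9+ (for CompletableFuture.orTimeout and delayedExecutor). It deliberately does not use ScaleCube or Ignite APIs: MembershipSettings, fetchMetadata and memberSurvives are hypothetical names, and the delayed future only simulates a node that answers GetMetaDataRequest slowly. It shows why a node that needs 1.5 seconds to reply is treated as failed with a 1-second timeout but kept with the proposed 10-second default.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class MetadataTimeoutSketch {
    /** Hypothetical config holder; NOT an actual Ignite or ScaleCube class. */
    static final class MembershipSettings {
        // Higher default suggested in the issue: 10 seconds instead of 1 second.
        static final Duration DEFAULT_METADATA_TIMEOUT = Duration.ofSeconds(10);

        final Duration metadataTimeout;

        MembershipSettings(Duration metadataTimeout) {
            if (metadataTimeout.isZero() || metadataTimeout.isNegative()) {
                throw new IllegalArgumentException("metadataTimeout must be positive");
            }
            this.metadataTimeout = metadataTimeout;
        }

        MembershipSettings() {
            this(DEFAULT_METADATA_TIMEOUT);
        }
    }

    /** Simulates a metadata request answered by a node that needs responseDelayMs to reply. */
    static CompletableFuture<String> fetchMetadata(long responseDelayMs) {
        return CompletableFuture.supplyAsync(
                () -> "metadata",
                CompletableFuture.delayedExecutor(responseDelayMs, TimeUnit.MILLISECONDS));
    }

    /** Returns true if the reply arrives within metadataTimeout, false if the fetch times out. */
    static boolean memberSurvives(MembershipSettings settings, long responseDelayMs) {
        try {
            fetchMetadata(responseDelayMs)
                    .orTimeout(settings.metadataTimeout.toMillis(), TimeUnit.MILLISECONDS)
                    .join();
            return true;
        } catch (CompletionException e) {
            // orTimeout completes the future with a TimeoutException; join() wraps it.
            if (e.getCause() instanceof TimeoutException) {
                return false;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        long loadedNodeDelayMs = 1_500; // a node under load replying after 1.5 s
        System.out.println(memberSurvives(new MembershipSettings(Duration.ofSeconds(1)), loadedNodeDelayMs));
        System.out.println(memberSurvives(new MembershipSettings(), loadedNodeDelayMs));
    }
}
```

Running main prints false and then true: with a 1-second timeout the simulated reply arrives too late, while the 10-second default tolerates it. Because a timed-out node is removed from the physical topology until it restarts, this sketch leans toward a generous default and rejects non-positive values rather than silently accepting them.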