rpuch commented on code in PR #5514:
URL: https://github.com/apache/ignite-3/pull/5514#discussion_r2020499543
##########
modules/network/src/main/java/org/apache/ignite/internal/network/recovery/RecoveryClientHandshakeManager.java:
##########
@@ -376,24 +402,17 @@ private void sendRejectionMessageAndFailHandshake(
     }
 
     private void onHandshakeRejectedMessage(HandshakeRejectedMessage msg) {
-        boolean ignorable = stopping.getAsBoolean() || !msg.reason().critical();
-
-        if (ignorable) {
-            LOG.debug("Handshake rejected by server: {}", msg.message());
-        } else {
+        if (!stopping.getAsBoolean() && msg.reason().logAsWarn()) {
             LOG.warn("Handshake rejected by server: {}", msg.message());
+        } else {
+            LOG.debug("Handshake rejected by server: {}", msg.message());
         }
 
         if (msg.reason() == HandshakeRejectionReason.CLINCH) {
             giveUpClinch();
         } else {
             localHandshakeCompleteFuture.completeExceptionally(new HandshakeException(msg.message()));
         }
-
-        if (!ignorable) {
-            failureManager.process(

Review Comment:
   The current approach to choosing the node to fail is problematic. Imagine that there is a network partition: half of the nodes think that the other half has left, and vice versa. It would make sense to predictably choose one segment and restart it, but the current approach can lead to arbitrary choices: in each pair of nodes that consider each other stale, the choice is determined by 'who contacts whom next' after the partition is removed. As a result, the whole cluster (or almost the whole cluster) could be restarted.

   The problem is described in https://issues.apache.org/jira/browse/IGNITE-22377; we should solve it there. Until then, it seems safer to not fail any nodes and let the user decide what to do (users will see in the logs that some nodes are stale).
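To illustrate the concern, here is a toy model, not Ignite code. It assumes that when two nodes from different segments consider each other stale, the node that initiates the handshake is the one that gets failed (as the removed `failureManager.process(` call on the client side suggests), and that a restarted node is no longer stale to anyone. The class `PartitionRestartSketch`, the `Contact` type, and the "segment holding the smallest node name survives" rule are all hypothetical; the rule is only one possible way to choose a segment predictably and is not what IGNITE-22377 prescribes.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class PartitionRestartSketch {
    /** One handshake attempt after the partition heals (hypothetical type). */
    static final class Contact {
        final String initiator;
        final String target;

        Contact(String initiator, String target) {
            this.initiator = initiator;
            this.target = target;
        }
    }

    /**
     * Assumed model of the old behavior: in a cross-segment pair where both
     * nodes still consider each other stale, the initiator is rejected and fails.
     */
    static Set<String> restartedByContactOrder(
            Set<String> segmentA, Set<String> segmentB, List<Contact> contacts) {
        Set<String> restarted = new LinkedHashSet<>();
        for (Contact c : contacts) {
            boolean crossSegment =
                    (segmentA.contains(c.initiator) && segmentB.contains(c.target))
                    || (segmentB.contains(c.initiator) && segmentA.contains(c.target));
            // Once either side has restarted, the pair no longer conflicts.
            boolean stillMutuallyStale =
                    !restarted.contains(c.initiator) && !restarted.contains(c.target);
            if (crossSegment && stillMutuallyStale) {
                restarted.add(c.initiator);
            }
        }
        return restarted;
    }

    /**
     * Hypothetical predictable rule: the segment that holds the
     * lexicographically smallest node name survives; the other one restarts.
     */
    static Set<String> restartedByDeterministicRule(Set<String> segmentA, Set<String> segmentB) {
        boolean aSurvives = Collections.min(segmentA).compareTo(Collections.min(segmentB)) < 0;
        return new TreeSet<>(aSurvives ? segmentB : segmentA);
    }

    public static void main(String[] args) {
        Set<String> a = Set.of("a1", "a2");
        Set<String> b = Set.of("b1", "b2");
        List<Contact> contacts = List.of(
                new Contact("a1", "b1"),
                new Contact("b1", "a2"),
                new Contact("a2", "b2"),
                new Contact("b2", "a1"));

        System.out.println(restartedByContactOrder(a, b, contacts)); // prints [a1, b1, a2]
        System.out.println(restartedByDeterministicRule(a, b));      // prints [b1, b2]
    }
}
```

Under these assumptions, the contact order above restarts three of the four nodes, while a predictable rule restarts exactly one segment. A different contact order would give a different set, which is the arbitrariness in question.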