[ https://issues.apache.org/jira/browse/IGNITE-25394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denis Chudov reassigned IGNITE-25394:
-------------------------------------

    Assignee: Denis Chudov

> Log flooding on remaining nodes when a node is stopped
> ------------------------------------------------------
>
>                 Key: IGNITE-25394
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25394
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Denis Chudov
>            Assignee: Denis Chudov
>            Priority: Major
>              Labels: ignite-3
>
> To reproduce:
> # Start a 3-node cluster (in Docker) and initialize it
> # Stop a node
> ## Leader (node1)
> ## Follower (node2 or node3)
>
> For 2A (the leader is stopped):
> The followers' logs (node2 and node3) will be flooded with
> {code:java}
> 2025-02-24 22:40:32 2025-02-25 03:40:32:451 +0000 [ERROR][%node2%JRaft-StepDownTimer-14][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
> 2025-02-24 22:40:33 2025-02-25 03:40:33:052 +0000 [ERROR][%node2%JRaft-StepDownTimer-10][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
> 2025-02-24 22:40:33 2025-02-25 03:40:33:652 +0000 [ERROR][%node2%JRaft-StepDownTimer-9][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
> 2025-02-24 22:40:34 2025-02-25 03:40:34:253 +0000 [ERROR][%node2%JRaft-StepDownTimer-2][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
> 2025-02-24 22:40:34 2025-02-25 03:40:34:854 +0000 [ERROR][%node2%JRaft-StepDownTimer-16][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
> 2025-02-24 22:40:35 2025-02-25 03:40:35:454 +0000 [ERROR][%node2%JRaft-StepDownTimer-4][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
> 2025-02-24 22:40:36 2025-02-25 03:40:36:055 +0000 [ERROR][%node2%JRaft-StepDownTimer-17][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
> 2025-02-24 22:40:36 2025-02-25 03:40:36:656 +0000 [ERROR][%node2%JRaft-StepDownTimer-6][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
> {code}
>
> For 2B (any follower node is stopped):
> The leader's log (node1) will be flooded with
> {code:java}
> 2025-02-24 22:35:50 2025-02-25 03:35:50:856 +0000 [WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue RPC to node2, consecutiveErrorTimes=1350, error=Status[EINTERNAL<1004>: Check connection[node2] fail and try to create new one]
> 2025-02-24 22:35:51 2025-02-25 03:35:51:460 +0000 [WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue RPC to node2, consecutiveErrorTimes=1360, error=Status[EINTERNAL<1004>: Check connection[node2] fail and try to create new one]
> 2025-02-24 22:35:51 2025-02-25 03:35:51:460 +0000 [WARNING][%node1%JRaft-AppendEntries-Processor-2][Replicator] Fail to issue RPC to node2, consecutiveErrorTimes=1570, error=Status[EINTERNAL<1004>: Check connection[node2] fail and try to create new one]
> 2025-02-24 22:35:52 2025-02-25 03:35:52:063 +0000 [WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue RPC to node2, consecutiveErrorTimes=1370, error=Status[EINTERNAL<1004>: Check connection[node2] fail and try to create new one]
> 2025-02-24 22:35:52 2025-02-25 03:35:52:063 +0000 [WARNING][%node1%JRaft-AppendEntries-Processor-2][Replicator] Fail to issue RPC to node2, consecutiveErrorTimes=1580, error=Status[EINTERNAL<1004>: Check connection[node2] fail and try to create new one]
> 2025-02-24 22:35:52 2025-02-25 03:35:52:666 +0000 [WARNING][%node1%JRaft-AppendEntries-Processor-2][Replicator] Fail to issue RPC to node2, consecutiveErrorTimes=1590, error=Status[EINTERNAL<1004>: Check connection[node2] fail and try to create new one]
> 2025-02-24 22:35:52 2025-02-25 03:35:52:666 +0000 [WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue RPC to node2, consecutiveErrorTimes=1380, error=Status[EINTERNAL<1004>: Check connection[node2] fail and try to create new one]
> {code}
>
> These errors need to be throttled somehow, as they pollute the logs and make it harder to gather and analyze logs during node stoppage incidents.
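
One possible shape for the throttling requested above is sketched below. It is only an illustration, not the fix adopted for this ticket: `LogThrottle`, `tryAcquire`, and the key string are hypothetical names that exist in neither Ignite nor JRaft. The idea is to let at most one message per key through in each time interval and to report how many repeats were dropped in between, so the information that the failure is ongoing is not lost.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not an Ignite or JRaft class): allows one log message
 * per key and per interval, and counts the messages suppressed in between.
 */
final class LogThrottle {
    /** Per-key bookkeeping, guarded by synchronizing on the instance. */
    private static final class State {
        long lastEmitNanos;
        long suppressed;
        boolean emittedOnce;
    }

    private final long intervalNanos;
    private final LongSupplier nanoClock;
    private final ConcurrentMap<String, State> states = new ConcurrentHashMap<>();

    /** The clock is injected so the behavior can be tested without sleeping. */
    LogThrottle(long intervalNanos, LongSupplier nanoClock) {
        this.intervalNanos = intervalNanos;
        this.nanoClock = nanoClock;
    }

    /**
     * Returns -1 if the message for this key should be suppressed. Otherwise
     * returns the number of messages suppressed since the last emitted one.
     */
    long tryAcquire(String key) {
        State s = states.computeIfAbsent(key, k -> new State());
        synchronized (s) {
            long now = nanoClock.getAsLong();
            if (s.emittedOnce && now - s.lastEmitNanos < intervalNanos) {
                s.suppressed++;
                return -1;
            }
            long skipped = s.suppressed;
            s.suppressed = 0;
            s.lastEmitNanos = now;
            s.emittedOnce = true;
            return skipped;
        }
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock, advanced manually below
        LogThrottle throttle = new LogThrottle(1_000_000_000L, () -> now[0]);
        for (int i = 0; i < 5; i++) {
            // Assumed key format: one key per message type and peer.
            long skipped = throttle.tryAcquire("replicator-connection:node1");
            if (skipped >= 0) {
                System.out.println("Fail to check replicator connection to peer=node1 ("
                    + skipped + " similar messages suppressed)");
            }
            now[0] += 600_000_000L; // roughly the 600 ms spacing seen in the logs above
        }
    }
}
```

In real code the clock would be `System::nanoTime` and the call site would wrap the existing log statement. Keying by message type plus peer keeps a failure for node2 from hiding a failure for node3, and the interval is a tuning choice.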