[jira] [Assigned] (IGNITE-25394) Log flooding on remaining nodes when a node is stopped

Denis Chudov (Jira) Fri, 16 May 2025 00:49:07 -0700


     [ https://issues.apache.org/jira/browse/IGNITE-25394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Denis Chudov reassigned IGNITE-25394:
-------------------------------------

    Assignee: Denis Chudov

> Log flooding on remaining nodes when a node is stopped
> ------------------------------------------------------
>
>                 Key: IGNITE-25394
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25394
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Denis Chudov
>            Assignee: Denis Chudov
>            Priority: Major
>              Labels: ignite-3
>
> To reproduce:
>  # Start a 3-node cluster (in Docker) and initialize it
>  # Stop a node
>  ## Leader (node1)
>  ## Follower (node2 or node3)
> For 2A (the leader is stopped):
> Follower logs (node2 and node3) will be flooded with
>  
> {code:java}
> 2025-02-24 22:40:32 2025-02-25 03:40:32:451 +0000 
> [ERROR][%node2%JRaft-StepDownTimer-14][ReplicatorGroupImpl] Fail to check 
> replicator connection to peer=node1, replicatorType=Follower.
> 2025-02-24 22:40:33 2025-02-25 03:40:33:052 +0000 
> [ERROR][%node2%JRaft-StepDownTimer-10][ReplicatorGroupImpl] Fail to check 
> replicator connection to peer=node1, replicatorType=Follower.
> 2025-02-24 22:40:33 2025-02-25 03:40:33:652 +0000 
> [ERROR][%node2%JRaft-StepDownTimer-9][ReplicatorGroupImpl] Fail to check 
> replicator connection to peer=node1, replicatorType=Follower.
> 2025-02-24 22:40:34 2025-02-25 03:40:34:253 +0000 
> [ERROR][%node2%JRaft-StepDownTimer-2][ReplicatorGroupImpl] Fail to check 
> replicator connection to peer=node1, replicatorType=Follower.
> 2025-02-24 22:40:34 2025-02-25 03:40:34:854 +0000 
> [ERROR][%node2%JRaft-StepDownTimer-16][ReplicatorGroupImpl] Fail to check 
> replicator connection to peer=node1, replicatorType=Follower.
> 2025-02-24 22:40:35 2025-02-25 03:40:35:454 +0000 
> [ERROR][%node2%JRaft-StepDownTimer-4][ReplicatorGroupImpl] Fail to check 
> replicator connection to peer=node1, replicatorType=Follower.
> 2025-02-24 22:40:36 2025-02-25 03:40:36:055 +0000 
> [ERROR][%node2%JRaft-StepDownTimer-17][ReplicatorGroupImpl] Fail to check 
> replicator connection to peer=node1, replicatorType=Follower.
> 2025-02-24 22:40:36 2025-02-25 03:40:36:656 +0000 
> [ERROR][%node2%JRaft-StepDownTimer-6][ReplicatorGroupImpl] Fail to check 
> replicator connection to peer=node1, replicatorType=Follower.{code}
> For 2B (any follower node is stopped):
> Leader logs (node1) will be flooded with
> {code:java}
> 2025-02-24 22:35:50 2025-02-25 03:35:50:856 +0000 
> [WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue 
> RPC to node2, consecutiveErrorTimes=1350, error=Status[EINTERNAL<1004>: Check 
> connection[node2] fail and try to create new one]
> 2025-02-24 22:35:51 2025-02-25 03:35:51:460 +0000 
> [WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue 
> RPC to node2, consecutiveErrorTimes=1360, error=Status[EINTERNAL<1004>: Check 
> connection[node2] fail and try to create new one]
> 2025-02-24 22:35:51 2025-02-25 03:35:51:460 +0000 
> [WARNING][%node1%JRaft-AppendEntries-Processor-2][Replicator] Fail to issue 
> RPC to node2, consecutiveErrorTimes=1570, error=Status[EINTERNAL<1004>: Check 
> connection[node2] fail and try to create new one]
> 2025-02-24 22:35:52 2025-02-25 03:35:52:063 +0000 
> [WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue 
> RPC to node2, consecutiveErrorTimes=1370, error=Status[EINTERNAL<1004>: Check 
> connection[node2] fail and try to create new one]
> 2025-02-24 22:35:52 2025-02-25 03:35:52:063 +0000 
> [WARNING][%node1%JRaft-AppendEntries-Processor-2][Replicator] Fail to issue 
> RPC to node2, consecutiveErrorTimes=1580, error=Status[EINTERNAL<1004>: Check 
> connection[node2] fail and try to create new one]
> 2025-02-24 22:35:52 2025-02-25 03:35:52:666 +0000 
> [WARNING][%node1%JRaft-AppendEntries-Processor-2][Replicator] Fail to issue 
> RPC to node2, consecutiveErrorTimes=1590, error=Status[EINTERNAL<1004>: Check 
> connection[node2] fail and try to create new one]
> 2025-02-24 22:35:52 2025-02-25 03:35:52:666 +0000 
> [WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue 
> RPC to node2, consecutiveErrorTimes=1380, error=Status[EINTERNAL<1004>: Check 
> connection[node2] fail and try to create new one] {code}
> These errors need to be throttled, because they pollute the logs and make it 
> harder to gather and analyze logs during node stoppage incidents.
>  
>  
>  
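
One possible shape for the throttling asked for above, as a minimal sketch only.
LogThrottle and tryAcquire are hypothetical names, not part of Ignite or JRaft,
and the issue does not say where such a check would be wired in. The idea is to
let one message per key (for example, per peer) through per interval and to
count the ones that were dropped, so the next emitted line can report them.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not an Ignite/JRaft class): one message per key per interval. */
final class LogThrottle {
    /** Per-key state, guarded by synchronizing on the slot itself. */
    private static final class Slot {
        long lastEmitMillis;
        long suppressed;
        boolean emittedOnce;
    }

    private final long intervalMillis;
    private final ConcurrentHashMap<String, Slot> slots = new ConcurrentHashMap<>();

    LogThrottle(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /**
     * Decides whether a message for the given key may be logged at nowMillis.
     *
     * @return -1 if the message should be suppressed; otherwise the number of
     *         messages suppressed for this key since the last emitted one.
     */
    long tryAcquire(String key, long nowMillis) {
        Slot slot = slots.computeIfAbsent(key, k -> new Slot());
        synchronized (slot) {
            boolean tooSoon = slot.emittedOnce
                    && nowMillis - slot.lastEmitMillis < intervalMillis;
            if (tooSoon) {
                slot.suppressed++;
                return -1;
            }
            long skipped = slot.suppressed;
            slot.suppressed = 0;
            slot.lastEmitMillis = nowMillis;
            slot.emittedOnce = true;
            return skipped;
        }
    }
}
```

A caller would ask the throttle before logging, pass System.currentTimeMillis()
as nowMillis, and append something like "(N similar messages suppressed)" when
the returned count is greater than zero. The time is passed in explicitly only
to keep the sketch easy to test; the interval and the choice of key (peer name,
message type, or both) are assumptions left open here.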



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
