Andrey Khitrin created IGNITE-24342:
---------------------------------------

             Summary: [Flaky] Cannot reliably start a 3-node cluster on a single Windows machine
                 Key: IGNITE-24342
                 URL: https://issues.apache.org/jira/browse/IGNITE-24342
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 3.0
         Environment: A single Windows 10 machine with 32 GB of RAM
            Reporter: Andrey Khitrin
         Attachments: logs.tgz

This issue is not 100% reproducible, but it occurs frequently enough to be observed regularly.

How to reproduce:
 # Try to start 3 Apache Ignite nodes with a static {{nodeFinder}} on a single machine (configs are attached):

{code:java}
        nodeFinder {
            netClusterNodes=[
                "127.0.0.1:3344",
                "127.0.0.1:3345",
                "127.0.0.1:3346"
            ]
            type=STATIC
        }
{code}
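Because all three nodes bind to ports on the same host, it may help to rule out a simple port conflict before each reproduction attempt. The sketch below is a hypothetical pre-flight helper ({{PortPreflight}} is not part of Ignite) that tries to bind each port from the {{netClusterNodes}} list on 127.0.0.1. It assumes the nodes listen on ports 3344, 3345, and 3346, as in the config above.

{code:java}
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-flight helper (not part of Apache Ignite): checks whether
 * the static nodeFinder ports are free before the nodes are started.
 */
public class PortPreflight {

    /** Returns true if a listening socket can be bound to host:port right now. */
    public static boolean isPortFree(String host, int port) {
        try (ServerSocket probe = new ServerSocket()) {
            // Disable SO_REUSEADDR so the probe does not succeed on a port another socket is using.
            probe.setReuseAddress(false);
            probe.bind(new InetSocketAddress(InetAddress.getByName(host), port));
            return true;
        } catch (IOException e) {
            // BindException (a subclass of IOException) means the port is taken.
            return false;
        }
    }

    /** Returns the subset of the given ports that are already in use on host. */
    public static List<Integer> occupiedPorts(String host, int... ports) {
        List<Integer> busy = new ArrayList<>();
        for (int port : ports) {
            if (!isPortFree(host, port)) {
                busy.add(port);
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        // Ports taken from the netClusterNodes list in the issue's config.
        List<Integer> busy = occupiedPorts("127.0.0.1", 3344, 3345, 3346);
        if (busy.isEmpty()) {
            System.out.println("All nodeFinder ports are free.");
        } else {
            System.out.println("Ports already in use: " + busy);
        }
    }
}
{code}

A successful probe only shows that the ports were free at that moment. It does not prove or disprove that the flaky startup is related to port usage.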
Expected result: all nodes are up.

Actual result: 2 of the 3 nodes terminate with thread dumps, and the cluster cannot be initialized.

Key exceptions in the logs:
 # "IllegalStateException: cannot send more responses than requests" (see attachment)
 # Various Raft-related and timeout errors:

{code:java}
2025-01-28 06:03:05:471 -0600 [ERROR][%TablesAmountCapacityMultiNodeTest_cluster_1%JRaft-Response-Processor-8][AbstractClientService] Fail to connect TablesAmountCapacityMultiNodeTest_cluster_0, exception: java.util.concurrent.TimeoutException.
2025-01-28 06:03:05:815 -0600 [INFO][%TablesAmountCapacityMultiNodeTest_cluster_1%JRaft-Request-Processor-24][NodeImpl] Node <cmg_group/TablesAmountCapacityMultiNodeTest_cluster_1> ignore PreVoteRequest from TablesAmountCapacityMultiNodeTest_cluster_0, term=2, currTerm=1, because the leader TablesAmountCapacityMultiNodeTest_cluster_1's lease is still valid.
2025-01-28 06:03:05:815 -0600 [ERROR][%TablesAmountCapacityMultiNodeTest_cluster_1%JRaft-Response-Processor-8][ReplicatorGroupImpl] Fail to check replicator connection to peer=TablesAmountCapacityMultiNodeTest_cluster_0, replicatorType=Follower.
2025-01-28 06:03:05:836 -0600 [ERROR][%TablesAmountCapacityMultiNodeTest_cluster_1%JRaft-Response-Processor-8][NodeImpl] Fail to add a replicator, peer=TablesAmountCapacityMultiNodeTest_cluster_0.
{code}
 # Thread dumps in the logs of 2 of the 3 nodes (see attachment)
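The quoted {{ignore PreVoteRequest}} log line refers to a leader lease. Raft implementations commonly have a node disregard (pre-)vote requests while it still believes the current leader is valid, that is, while less than a timeout has elapsed since leadership was last confirmed. The following is a simplified, hypothetical model of that rule. The class name, fields, and timeout values are illustrative assumptions, not JRaft's {{NodeImpl}} code. It may help when reasoning about why {{cluster_1}} rejects {{cluster_0}}'s pre-vote while its own connection attempts to {{cluster_0}} time out.

{code:java}
/**
 * Hypothetical, simplified model of a leader-lease check (not JRaft code).
 * Times are caller-supplied milliseconds from a monotonic clock, which keeps
 * the logic deterministic and easy to test.
 */
public class LeaderLeaseSketch {
    private final long leaseTimeoutMs;
    // Last time leadership was confirmed (for example, a quorum of followers responded).
    private long lastLeaderContactMs;

    public LeaderLeaseSketch(long leaseTimeoutMs, long nowMs) {
        this.leaseTimeoutMs = leaseTimeoutMs;
        this.lastLeaderContactMs = nowMs;
    }

    /** Records that leadership was confirmed at nowMs. */
    public void recordLeaderContact(long nowMs) {
        this.lastLeaderContactMs = nowMs;
    }

    /** The lease is valid while strictly less than leaseTimeoutMs has elapsed. */
    public boolean isLeaseValid(long nowMs) {
        return nowMs - lastLeaderContactMs < leaseTimeoutMs;
    }

    /**
     * Returns false (ignore the request) while the lease is valid, which is the
     * situation the "ignore PreVoteRequest ... lease is still valid" line reports.
     */
    public boolean shouldConsiderPreVote(long nowMs) {
        return !isLeaseValid(nowMs);
    }

    public static void main(String[] args) {
        // Illustrative values only: 1000 ms lease, leadership confirmed at t=0.
        LeaderLeaseSketch lease = new LeaderLeaseSketch(1000, 0);
        System.out.println(lease.shouldConsiderPreVote(500));  // false: lease still valid
        System.out.println(lease.shouldConsiderPreVote(1200)); // true: lease expired
    }
}
{code}

Under this model, a leader that keeps confirming its position with the remaining majority will keep ignoring pre-votes from a node it cannot reach. The logs alone do not establish that this is the root cause.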


