Igor created IGNITE-24722:
-----------------------------

             Summary: [FLAKY][Windows] 1 node goes down when 3 nodes cluster is 
started on 9 cores cpu
                 Key: IGNITE-24722
                 URL: https://issues.apache.org/jira/browse/IGNITE-24722
             Project: Ignite
          Issue Type: Bug
          Components: general, platforms
    Affects Versions: 3.1
         Environment: 3 nodes on single Windows machine (cores=9, memory=32766)
            Reporter: Igor
         Attachments: cluster logs.zip

*Steps to reproduce:*
1. Start 3 nodes on a single Windows machine (cores=9, memory=32766)

*Expected:*
All 3 nodes start and join the cluster.

*Actual:*

1 node produces a thread dump and shuts down.

The node has log messages like:
{code:java}
2025-03-05 22:19:32:184 -0600 
[WARNING][%BasicAi3Operations3NodesTest_cluster_1%common-scheduler-0][FailureManager]
 Possible failure suppressed according to a configured handler 
[hnd=NoOpFailureHandler [super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=SYSTEM_WORKER_BLOCKED]
org.apache.ignite.lang.IgniteException: IGN-WORKERS-1 
TraceId:538a0c73-bc2e-481b-a5df-45ab414c3e15 A critical thread is blocked for 
2978 ms that is more than the allowed 500 ms, it is 
"%BasicAi3Operations3NodesTest_cluster_1%MessagingService-inbound-Default-0-0" 
prio=10 Id=153 WAITING on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@608a31a6
    at java.base@11.0.16.1/jdk.internal.misc.Unsafe.park(Native Method)
    -  waiting on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@608a31a6
    at 
java.base@11.0.16.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
    at 
java.base@11.0.16.1/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2081)
    at 
java.base@11.0.16.1/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:433)
    at 
java.base@11.0.16.1/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
    at 
java.base@11.0.16.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
    at 
java.base@11.0.16.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base@11.0.16.1/java.lang.Thread.run(Thread.java:834) {code}
and
{code:java}
2025-03-05 22:19:32:535 -0600 
[INFO][%BasicAi3Operations3NodesTest_cluster_1%MessagingService-inbound-Default-0-0][DistributionZoneManager]
 Failed to update distribution zones' logical topology and version keys 
[topology = [{id=71f7ef04-da2f-45d2-a1f1-b802e0542f67, 
name=BasicAi3Operations3NodesTest_cluster_0, address=172.25.1.11:3344}], 
version = 1]
2025-03-05 22:19:32:545 -0600 
[INFO][%BasicAi3Operations3NodesTest_cluster_1%MessagingService-inbound-Default-0-0][DistributionZoneManager]
 Failed to update distribution zones' logical topology and version keys 
[topology = [{id=71f7ef04-da2f-45d2-a1f1-b802e0542f67, 
name=BasicAi3Operations3NodesTest_cluster_0, address=172.25.1.11:3344}, 
{id=764f1058-8120-43e0-bdc1-e2e49ce31818, 
name=BasicAi3Operations3NodesTest_cluster_2, address=172.25.1.11:3346}], 
version = 2] {code}
The full logs are attached (cluster logs.zip).
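
For triage, the blocked-thread warnings can be pulled out of the logs with a small scanner. The sketch below is only an illustration: the class name `BlockedThreadScan` and the method `parse` are hypothetical and not part of Ignite, and the pattern assumes the message wording stays exactly as in the excerpt above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ignite: extracts the blocked duration and
// the allowed threshold from a "critical thread is blocked" log message.
public class BlockedThreadScan {
    // Wording copied from the log excerpt; assumed stable across occurrences.
    private static final Pattern BLOCKED = Pattern.compile(
            "critical thread is blocked for (\\d+) ms that is more than the allowed (\\d+) ms");

    /** Returns {blockedMs, allowedMs} if the text contains the warning. */
    public static Optional<long[]> parse(String text) {
        // Collapse line breaks first, since the mailed log excerpt is hard-wrapped.
        String flat = text.replaceAll("\\s+", " ");
        Matcher m = BLOCKED.matcher(flat);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new long[] {
                Long.parseLong(m.group(1)), Long.parseLong(m.group(2))});
    }

    public static void main(String[] args) {
        String sample = "A critical thread is blocked for \n"
                + "2978 ms that is more than the allowed 500 ms, it is";
        parse(sample).ifPresent(v ->
                System.out.println("blocked=" + v[0] + " ms, allowed=" + v[1] + " ms"));
    }
}
```

Run against the message quoted above, this reports 2978 ms blocked against a 500 ms limit. Feeding it each log record from the attachment would show how often and by how much the threshold is exceeded on the failing node.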



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
