Igor created IGNITE-24722:
-----------------------------

             Summary: [FLAKY][Windows] 1 node goes down when 3 nodes cluster is started on 9 cores cpu
                 Key: IGNITE-24722
                 URL: https://issues.apache.org/jira/browse/IGNITE-24722
             Project: Ignite
          Issue Type: Bug
          Components: general, platforms
    Affects Versions: 3.1
         Environment: 3 nodes on single Windows machine (cores=9, memory=32766)
            Reporter: Igor
         Attachments: cluster logs.zip

*Steps to reproduce:*
 1. Start 3 nodes on a single Windows machine (cores=9, memory=32766).

*Expected:*
All 3 nodes start and join the cluster.

*Actual:*
One node writes a thread dump and shuts down. Its log contains messages like:
{code:java}
2025-03-05 22:19:32:184 -0600 [WARNING][%BasicAi3Operations3NodesTest_cluster_1%common-scheduler-0][FailureManager] Possible failure suppressed according to a configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=SYSTEM_WORKER_BLOCKED]
org.apache.ignite.lang.IgniteException: IGN-WORKERS-1 TraceId:538a0c73-bc2e-481b-a5df-45ab414c3e15 A critical thread is blocked for 2978 ms that is more than the allowed 500 ms, it is "%BasicAi3Operations3NodesTest_cluster_1%MessagingService-inbound-Default-0-0" prio=10 Id=153 WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@608a31a6
    at java.base@11.0.16.1/jdk.internal.misc.Unsafe.park(Native Method)
    - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@608a31a6
    at java.base@11.0.16.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
    at java.base@11.0.16.1/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2081)
    at java.base@11.0.16.1/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:433)
    at java.base@11.0.16.1/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
    at java.base@11.0.16.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
    at java.base@11.0.16.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base@11.0.16.1/java.lang.Thread.run(Thread.java:834)
{code}
and
{code:java}
2025-03-05 22:19:32:535 -0600 [INFO][%BasicAi3Operations3NodesTest_cluster_1%MessagingService-inbound-Default-0-0][DistributionZoneManager] Failed to update distribution zones' logical topology and version keys [topology = [{id=71f7ef04-da2f-45d2-a1f1-b802e0542f67, name=BasicAi3Operations3NodesTest_cluster_0, address=172.25.1.11:3344}], version = 1]
2025-03-05 22:19:32:545 -0600 [INFO][%BasicAi3Operations3NodesTest_cluster_1%MessagingService-inbound-Default-0-0][DistributionZoneManager] Failed to update distribution zones' logical topology and version keys [topology = [{id=71f7ef04-da2f-45d2-a1f1-b802e0542f67, name=BasicAi3Operations3NodesTest_cluster_0, address=172.25.1.11:3344}, {id=764f1058-8120-43e0-bdc1-e2e49ce31818, name=BasicAi3Operations3NodesTest_cluster_2, address=172.25.1.11:3346}], version = 2]
{code}
The full logs are attached.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)