Pavel Pereslegin created IGNITE-28084:
-----------------------------------------

             Summary: Sql. 
ExecutionServiceImplTest.outdatedNodeLeftEventDoesntCauseQueryToHang may hang
                 Key: IGNITE-28084
                 URL: https://issues.apache.org/jira/browse/IGNITE-28084
             Project: Ignite
          Issue Type: Bug
          Components: sql ai3
            Reporter: Pavel Pereslegin
            Assignee: Pavel Pereslegin


In a very rare case the test hangs and fails with the following error:
{noformat}
java.util.concurrent.TimeoutException
        at 
java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1960)
        at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2095)
        at 
org.apache.ignite.internal.testframework.IgniteTestUtils.await(IgniteTestUtils.java:853)
        at 
org.apache.ignite.internal.testframework.IgniteTestUtils.await(IgniteTestUtils.java:882)
        at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest.outdatedNodeLeftEventDoesntCauseQueryToHang(ExecutionServiceImplTest.java:1112)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
{noformat}

The output also includes diagnostic information:
{noformat}
[2026-03-05T20:54:11,288][WARN ][main][ExecutionServiceImpl] SQL execution 
service could not be stopped within the specified timeout (5000 ms).Debug info 
for query: ExecutionId [queryId=e629446d-3a49-4651-865e-e73668103ba1, 
executionToken=0] (canceled=true, stopped=false)
  Coordinator node: node_1 (current node)
  Root node state: completed exceptionally 
(org.apache.ignite.internal.sql.engine.QueryCancelledException: IGN-SQL-6 The 
query was cancelled while executing. TraceId:ad076434)

  Fragments awaiting init completion:
    id=1, node=node_1
    id=2, node=node_1
    id=2, node=node_2
    id=2, node=node_3



Dumping threads:

"main" prio=5 Id=1 RUNNABLE
        at 
[email protected]/sun.management.ThreadImpl.dumpThreads0(Native Method)
        at 
[email protected]/sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:518)
        at 
[email protected]/sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:506)
        at 
app//org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.dumpThreads(ExecutionServiceImpl.java:898)
        at 
app//org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.stop(ExecutionServiceImpl.java:712)
        at 
app//org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest.lambda$tearDown$5(ExecutionServiceImplTest.java:334)
        at 
app//org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$$Lambda/0x00007fd8f0834488.accept(Unknown
 Source)
        at [email protected]/java.util.ArrayList.forEach(ArrayList.java:1596)
        ...


"Reference Handler" daemon prio=10 Id=9 RUNNABLE
        at 
[email protected]/java.lang.ref.Reference.waitForReferencePendingList(Native 
Method)
        at 
[email protected]/java.lang.ref.Reference.processPendingReferences(Reference.java:246)
        at 
[email protected]/java.lang.ref.Reference$ReferenceHandler.run(Reference.java:208)


"Finalizer" daemon prio=8 Id=10 WAITING on 
java.lang.ref.NativeReferenceQueue$Lock@19c2e5df
        at [email protected]/java.lang.Object.wait0(Native Method)
        -  waiting on java.lang.ref.NativeReferenceQueue$Lock@19c2e5df
        at [email protected]/java.lang.Object.wait(Unknown Source)
        at [email protected]/java.lang.Object.wait(Unknown Source)
        at 
[email protected]/java.lang.ref.NativeReferenceQueue.await(NativeReferenceQueue.java:48)
        at 
[email protected]/java.lang.ref.ReferenceQueue.remove0(ReferenceQueue.java:158)
        at 
[email protected]/java.lang.ref.NativeReferenceQueue.remove(NativeReferenceQueue.java:89)
        at 
[email protected]/java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:173)


"Signal Dispatcher" daemon prio=9 Id=11 RUNNABLE


"Common-Cleaner" daemon prio=8 Id=26 TIMED_WAITING on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6c629587
        at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
        -  waiting on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6c629587
        at 
[email protected]/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:269)
        at 
[email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1886)
        at 
[email protected]/java.lang.ref.ReferenceQueue.await(ReferenceQueue.java:71)
        at 
[email protected]/java.lang.ref.ReferenceQueue.remove0(ReferenceQueue.java:143)
        at 
[email protected]/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:218)
        at 
[email protected]/jdk.internal.ref.CleanerImpl.run(CleanerImpl.java:140)
        at [email protected]/java.lang.Thread.runWith(Thread.java:1596)
        ...


"Monitor Ctrl-Break" daemon prio=5 Id=27 RUNNABLE (in native)
        at [email protected]/sun.nio.ch.SocketDispatcher.read0(Native Method)
        at 
[email protected]/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:47)
        at 
[email protected]/sun.nio.ch.NioSocketImpl.tryRead(NioSocketImpl.java:256)
        at 
[email protected]/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:307)
        at 
[email protected]/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:346)
        at 
[email protected]/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:796)
        at 
[email protected]/java.net.Socket$SocketInputStream.read(Socket.java:1099)
        at 
[email protected]/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:350)
        ...

        Number of locked synchronizers = 3
        - java.util.concurrent.locks.ReentrantLock$NonfairSync@5f91d91f
        - java.util.concurrent.locks.ReentrantLock$NonfairSync@410ff902
        - java.util.concurrent.locks.ReentrantLock$NonfairSync@4fed00e1


"Notification Thread" daemon prio=9 Id=28 RUNNABLE


"Attach Listener" daemon prio=9 Id=46 RUNNABLE


"FastTimestamps updater" daemon prio=5 Id=49 TIMED_WAITING on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@375b6a36
        at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
        -  waiting on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@375b6a36
        at 
[email protected]/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:269)
        at 
[email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1797)
        at 
[email protected]/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1182)
        at 
[email protected]/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
        at 
[email protected]/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1070)
        at 
[email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at 
[email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        ...


"ForkJoinPool.commonPool-worker-1" daemon prio=5 Id=51 WAITING on 
java.util.concurrent.ForkJoinPool@66d314e3
        at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
        -  waiting on java.util.concurrent.ForkJoinPool@66d314e3
        at 
[email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1893)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1809)
        at 
[email protected]/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)


"ForkJoinPool.commonPool-worker-2" daemon prio=5 Id=65 TIMED_WAITING on 
java.util.concurrent.ForkJoinPool@66d314e3
        at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
        -  waiting on java.util.concurrent.ForkJoinPool@66d314e3
        at 
[email protected]/java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:449)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1891)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1809)
        at 
[email protected]/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)


"ForkJoinPool.commonPool-worker-3" daemon prio=5 Id=66 WAITING on 
java.util.concurrent.ForkJoinPool@66d314e3
        at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
        -  waiting on java.util.concurrent.ForkJoinPool@66d314e3
        at 
[email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1893)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1809)
        at 
[email protected]/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)


"ForkJoinPool.commonPool-worker-4" daemon prio=5 Id=1027 WAITING on 
java.util.concurrent.ForkJoinPool@66d314e3
        at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
        -  waiting on java.util.concurrent.ForkJoinPool@66d314e3
        at 
[email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1893)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1809)
        at 
[email protected]/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)


"test-ExecutionServiceImplTest-outdatedNodeLeftEventDoesntCauseQueryToHang-commonExecutor0"
 prio=5 Id=289439 WAITING on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@582a26d2
        at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
        -  waiting on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@582a26d2
        at 
[email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)
        at 
[email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:519)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3780)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3725)
        at 
[email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1746)
        at 
[email protected]/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1170)
        at 
[email protected]/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
        ...


"%node_1%sql-execution-pool-0" prio=5 Id=289442 WAITING on 
java.util.concurrent.CountDownLatch$Sync@7fe7df3e
        at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
        -  waiting on java.util.concurrent.CountDownLatch$Sync@7fe7df3e
        at 
[email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
        at 
[email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:788)
        at 
[email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1133)
        at 
[email protected]/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:230)
        at 
app//org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest.lambda$outdatedNodeLeftEventDoesntCauseQueryToHang$69(ExecutionServiceImplTest.java:1089)
        at 
app//org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$$Lambda/0x00007fd8f079ca28.intercept(Unknown
 Source)
        at 
app//org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode.onReceive(ExecutionServiceImplTest.java:1626)
        ...

        Number of locked synchronizers = 1
        - java.util.concurrent.ThreadPoolExecutor$Worker@3e398f9c


"%node_1%sql-execution-pool-1" prio=5 Id=289444 WAITING on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1301102e
        at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
        -  waiting on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1301102e
        at 
[email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)
        at 
[email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:519)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3780)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3725)
        at 
[email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1746)
        at 
[email protected]/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
        at 
[email protected]/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1070)
        ...
{noformat}

Need to investigate and fix the issue.
The issue can be easily reproduced with adding a little delay to test method 
IginteTestUtils.runAsync(..) before returning future
{code:Java}
        }).start();
        // add sleep(100) here
        return fut;
{code}

*Note:* there is also a thread leak present, because after each test it is 
necessary to stop the running QueryTaskExecutor-s



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to