[ https://issues.apache.org/jira/browse/IGNITE-24844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladislav Pyatkov updated IGNITE-24844:
---------------------------------------
    Description:
h3. Motivation
We shouldn't occupy a network thread for a long time.
Also, the operation looks suspiciously slow: it can run for minutes.
{noformat}
2025-02-17 11:31:49:597 +0000 [WARNING][%poc-tester-SERVER-172.25.4.103-id-0%common-scheduler-0][FailureManager] Possible failure suppressed according to a configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=SYSTEM_WORKER_BLOCKED]
org.apache.ignite.lang.IgniteException: IGN-WORKERS-1 TraceId:486ab39e-cb06-4300-b90c-1ffb5b3c320a A critical thread is blocked for 70560 ms that is more than the allowed 500 ms, it is "%poc-tester-SERVER-172.25.4.103-id-0%MessagingService-inbound-Default-0-0" prio=10 Id=401 RUNNABLE
{noformat}
{noformat}
Thread [name="%poc-tester-SERVER-172.25.4.103-id-0%MessagingService-inbound-Default-0-0", id=401, state=RUNNABLE, blockCnt=21529, waitCnt=2229908]
    at java.base@21.0.4/java.util.ImmutableCollections$MapN.probe(ImmutableCollections.java:1334)
    at java.base@21.0.4/java.util.ImmutableCollections$MapN.<init>(ImmutableCollections.java:1194)
    at java.base@21.0.4/java.util.Map.ofEntries(Map.java:1680)
    at java.base@21.0.4/java.util.Map.copyOf(Map.java:1748)
    at app//org.apache.ignite.internal.table.distributed.raft.MinimumRequiredTimeCollectorServiceImpl.minTimestampPerPartition(MinimumRequiredTimeCollectorServiceImpl.java:59)
    at app//org.apache.ignite.internal.catalog.compaction.CatalogCompactionRunner.getMinLocalTime(CatalogCompactionRunner.java:266)
    at app//org.apache.ignite.internal.catalog.compaction.CatalogCompactionRunner$CatalogCompactionMessageHandler.handleMinimumTimesRequest(CatalogCompactionRunner.java:676)
    at app//org.apache.ignite.internal.catalog.compaction.CatalogCompactionRunner$CatalogCompactionMessageHandler.onReceived(CatalogCompactionRunner.java:657)
    at app//org.apache.ignite.internal.network.TrackableNetworkMessageHandler.onReceived(TrackableNetworkMessageHandler.java:52)
    at app//org.apache.ignite.internal.network.DefaultMessagingService.handleStartingWithFirstHandler(DefaultMessagingService.java:543)
    at app//org.apache.ignite.internal.network.DefaultMessagingService.lambda$handleMessageFromNetwork$5(DefaultMessagingService.java:438)
    at app//org.apache.ignite.internal.network.DefaultMessagingService$$Lambda/0x000074cd709367a8.run(Unknown Source)
    at java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
    at java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
{noformat}
h3. Definition of done
* Avoid using the network thread
* Reduce execution time for compaction.

  was:
h3. Motivation
We shouldn't occupy a network thread for a long time.
Also, the operation looks suspiciously slow: it can run for minutes.
{noformat}
2025-02-17 11:31:49:597 +0000 [WARNING][%poc-tester-SERVER-172.25.4.103-id-0%common-scheduler-0][FailureManager] Possible failure suppressed according to a configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=SYSTEM_WORKER_BLOCKED]
org.apache.ignite.lang.IgniteException: IGN-WORKERS-1 TraceId:486ab39e-cb06-4300-b90c-1ffb5b3c320a A critical thread is blocked for 70560 ms that is more than the allowed 500 ms, it is "%poc-tester-SERVER-172.25.4.103-id-0%MessagingService-inbound-Default-0-0" prio=10 Id=401 RUNNABLE
{noformat}
{noformat}
Thread [name="%poc-tester-SERVER-172.25.4.103-id-0%MessagingService-inbound-Default-0-0", id=401, state=RUNNABLE, blockCnt=21529, waitCnt=2229908]
    at java.base@21.0.4/java.util.ImmutableCollections$MapN.probe(ImmutableCollections.java:1334)
    at java.base@21.0.4/java.util.ImmutableCollections$MapN.<init>(ImmutableCollections.java:1194)
    at java.base@21.0.4/java.util.Map.ofEntries(Map.java:1680)
    at java.base@21.0.4/java.util.Map.copyOf(Map.java:1748)
    at app//org.apache.ignite.internal.table.distributed.raft.MinimumRequiredTimeCollectorServiceImpl.minTimestampPerPartition(MinimumRequiredTimeCollectorServiceImpl.java:59)
    at app//org.apache.ignite.internal.catalog.compaction.CatalogCompactionRunner.getMinLocalTime(CatalogCompactionRunner.java:266)
    at app//org.apache.ignite.internal.catalog.compaction.CatalogCompactionRunner$CatalogCompactionMessageHandler.handleMinimumTimesRequest(CatalogCompactionRunner.java:676)
    at app//org.apache.ignite.internal.catalog.compaction.CatalogCompactionRunner$CatalogCompactionMessageHandler.onReceived(CatalogCompactionRunner.java:657)
    at app//org.apache.ignite.internal.network.TrackableNetworkMessageHandler.onReceived(TrackableNetworkMessageHandler.java:52)
    at app//org.apache.ignite.internal.network.DefaultMessagingService.handleStartingWithFirstHandler(DefaultMessagingService.java:543)
    at app//org.apache.ignite.internal.network.DefaultMessagingService.lambda$handleMessageFromNetwork$5(DefaultMessagingService.java:438)
    at app//org.apache.ignite.internal.network.DefaultMessagingService$$Lambda/0x000074cd709367a8.run(Unknown Source)
    at java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
    at java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
{noformat}

> Catalog compaction hangs and holds network thread
> -------------------------------------------------
>
>                 Key: IGNITE-24844
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24844
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Priority: Major
>         Attachments: poc-tester-SERVER-172.25.4.103-id-0-2025-02-17-10-41-26-client.log.zip
>
>
> h3. Motivation
> We shouldn't occupy a network thread for a long time.
> Also, the operation looks suspiciously slow: it can run for minutes.
> {noformat}
> 2025-02-17 11:31:49:597 +0000 [WARNING][%poc-tester-SERVER-172.25.4.103-id-0%common-scheduler-0][FailureManager] Possible failure suppressed according to a configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=SYSTEM_WORKER_BLOCKED]
> org.apache.ignite.lang.IgniteException: IGN-WORKERS-1 TraceId:486ab39e-cb06-4300-b90c-1ffb5b3c320a A critical thread is blocked for 70560 ms that is more than the allowed 500 ms, it is "%poc-tester-SERVER-172.25.4.103-id-0%MessagingService-inbound-Default-0-0" prio=10 Id=401 RUNNABLE
> {noformat}
> {noformat}
> Thread [name="%poc-tester-SERVER-172.25.4.103-id-0%MessagingService-inbound-Default-0-0", id=401, state=RUNNABLE, blockCnt=21529, waitCnt=2229908]
>     at java.base@21.0.4/java.util.ImmutableCollections$MapN.probe(ImmutableCollections.java:1334)
>     at java.base@21.0.4/java.util.ImmutableCollections$MapN.<init>(ImmutableCollections.java:1194)
>     at java.base@21.0.4/java.util.Map.ofEntries(Map.java:1680)
>     at java.base@21.0.4/java.util.Map.copyOf(Map.java:1748)
>     at app//org.apache.ignite.internal.table.distributed.raft.MinimumRequiredTimeCollectorServiceImpl.minTimestampPerPartition(MinimumRequiredTimeCollectorServiceImpl.java:59)
>     at app//org.apache.ignite.internal.catalog.compaction.CatalogCompactionRunner.getMinLocalTime(CatalogCompactionRunner.java:266)
>     at app//org.apache.ignite.internal.catalog.compaction.CatalogCompactionRunner$CatalogCompactionMessageHandler.handleMinimumTimesRequest(CatalogCompactionRunner.java:676)
>     at app//org.apache.ignite.internal.catalog.compaction.CatalogCompactionRunner$CatalogCompactionMessageHandler.onReceived(CatalogCompactionRunner.java:657)
>     at app//org.apache.ignite.internal.network.TrackableNetworkMessageHandler.onReceived(TrackableNetworkMessageHandler.java:52)
>     at app//org.apache.ignite.internal.network.DefaultMessagingService.handleStartingWithFirstHandler(DefaultMessagingService.java:543)
>     at app//org.apache.ignite.internal.network.DefaultMessagingService.lambda$handleMessageFromNetwork$5(DefaultMessagingService.java:438)
>     at app//org.apache.ignite.internal.network.DefaultMessagingService$$Lambda/0x000074cd709367a8.run(Unknown Source)
>     at java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>     at java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>     at java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
>     at java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
> {noformat}
> h3. Definition of done
> * Avoid using the network thread
> * Reduce execution time for compaction.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)