[ https://issues.apache.org/jira/browse/IGNITE-24844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17939184#comment-17939184 ]
Pavel Pereslegin edited comment on IGNITE-24844 at 3/28/25 9:07 AM:
--------------------------------------------------------------------

Fixed issues related to the compaction routine:
* Catalog compaction should not block critical system threads.
* Catalog compaction should log the exception thrown if an iteration fails (previously, the exception was only reported if {{tryCompactCatalog()}} failed).
* Replaced the {{Map.copyOf()}} call with {{Collections.unmodifiableMap()}}.

My experiments showed that the thread hangs because it is running out of resources. Similar behavior is reproduced with catalog compaction disabled, and this should be investigated separately.


was (Author: xtern):
Fixed issues related to compaction routine:
* Catalog compaction should not block critical system threads.
* Catalog compaction should log the exception thrown if the iteration fails (previously, the exception was only reported if tryCompactCatalog() failed)
* Replaced Map.copyOf call with unmodifiableMap

My experiments showed that the thread hangs because it is running out of resources. Similar behavior is reproduced with disabled catalog compaction and this should be investigated separately.

> Catalog compaction hangs and holds network thread
> -------------------------------------------------
>
>                 Key: IGNITE-24844
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24844
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 3.0
>            Reporter: Vladislav Pyatkov
>            Assignee: Pavel Pereslegin
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.1
>
>         Attachments: poc-tester-SERVER-172.25.4.103-id-0-2025-02-17-10-41-26-client.log.zip
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> h3. Motivation
> We shouldn't occupy a network thread for a long time.
> Also, the process takes suspiciously long: it can run for minutes.
> {noformat}
> 2025-02-17 11:31:49:597 +0000 [WARNING][%poc-tester-SERVER-172.25.4.103-id-0%common-scheduler-0][FailureManager] Possible failure suppressed according to a configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=SYSTEM_WORKER_BLOCKED]
> org.apache.ignite.lang.IgniteException: IGN-WORKERS-1 TraceId:486ab39e-cb06-4300-b90c-1ffb5b3c320a A critical thread is blocked for 70560 ms that is more than the allowed 500 ms, it is "%poc-tester-SERVER-172.25.4.103-id-0%MessagingService-inbound-Default-0-0" prio=10 Id=401 RUNNABLE
> {noformat}
> {noformat}
> Thread [name="%poc-tester-SERVER-172.25.4.103-id-0%MessagingService-inbound-Default-0-0", id=401, state=RUNNABLE, blockCnt=21529, waitCnt=2229908]
>     at java.base@21.0.4/java.util.ImmutableCollections$MapN.probe(ImmutableCollections.java:1334)
>     at java.base@21.0.4/java.util.ImmutableCollections$MapN.<init>(ImmutableCollections.java:1194)
>     at java.base@21.0.4/java.util.Map.ofEntries(Map.java:1680)
>     at java.base@21.0.4/java.util.Map.copyOf(Map.java:1748)
>     at app//org.apache.ignite.internal.table.distributed.raft.MinimumRequiredTimeCollectorServiceImpl.minTimestampPerPartition(MinimumRequiredTimeCollectorServiceImpl.java:59)
>     at app//org.apache.ignite.internal.catalog.compaction.CatalogCompactionRunner.getMinLocalTime(CatalogCompactionRunner.java:266)
>     at app//org.apache.ignite.internal.catalog.compaction.CatalogCompactionRunner$CatalogCompactionMessageHandler.handleMinimumTimesRequest(CatalogCompactionRunner.java:676)
>     at app//org.apache.ignite.internal.catalog.compaction.CatalogCompactionRunner$CatalogCompactionMessageHandler.onReceived(CatalogCompactionRunner.java:657)
>     at app//org.apache.ignite.internal.network.TrackableNetworkMessageHandler.onReceived(TrackableNetworkMessageHandler.java:52)
>     at app//org.apache.ignite.internal.network.DefaultMessagingService.handleStartingWithFirstHandler(DefaultMessagingService.java:543)
>     at app//org.apache.ignite.internal.network.DefaultMessagingService.lambda$handleMessageFromNetwork$5(DefaultMessagingService.java:438)
>     at app//org.apache.ignite.internal.network.DefaultMessagingService$$Lambda/0x000074cd709367a8.run(Unknown Source)
>     at java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>     at java.base@21.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>     at java.base@21.0.4/java.lang.Thread.runWith(Thread.java:1596)
>     at java.base@21.0.4/java.lang.Thread.run(Thread.java:1583)
> {noformat}
> h3. Definition of done
> * Avoid using the network thread.
> * Reduce execution time for compaction.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
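
For readers following the fix, here is a minimal Java sketch of the two ideas named in the comment: returning a read-only view via {{Collections.unmodifiableMap()}} instead of copying with {{Map.copyOf()}}, and running the minimum-time computation on a separate executor rather than on the calling (network) thread. This is not the Ignite code. The class {{CompactionSketch}} and the methods {{record}}, {{snapshotCopy}}, {{readOnlyView}} and {{minLocalTimeAsync}} are hypothetical names chosen for illustration, and the per-partition map is an assumed stand-in for whatever the real collector stores.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a per-partition minimum-timestamp collector (not Ignite's class). */
final class CompactionSketch {
    // Assumed state: partition id -> minimum required timestamp.
    private final Map<Integer, Long> minTimes = new ConcurrentHashMap<>();

    /** Keeps the smallest timestamp seen so far for the partition. */
    void record(int partitionId, long timestamp) {
        minTimes.merge(partitionId, timestamp, Long::min);
    }

    /** Copies every entry into a new immutable map; cost grows with the map size. */
    Map<Integer, Long> snapshotCopy() {
        return Map.copyOf(minTimes);
    }

    /** Wraps the live map in a read-only view without copying; later updates stay visible. */
    Map<Integer, Long> readOnlyView() {
        return Collections.unmodifiableMap(minTimes);
    }

    /** Computes the overall minimum on the supplied executor, not on the caller's thread. */
    CompletableFuture<Long> minLocalTimeAsync(ExecutorService executor) {
        return CompletableFuture.supplyAsync(
                () -> readOnlyView().values().stream()
                        .mapToLong(Long::longValue)
                        .min()
                        .orElse(Long.MAX_VALUE),
                executor);
    }

    public static void main(String[] args) {
        CompactionSketch collector = new CompactionSketch();
        collector.record(0, 50L);

        Map<Integer, Long> view = collector.readOnlyView();
        Map<Integer, Long> copy = collector.snapshotCopy();
        collector.record(1, 20L);

        // The view reflects the later update; the copy is a frozen snapshot.
        System.out.println("view size = " + view.size() + ", copy size = " + copy.size());

        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            System.out.println("min = " + collector.minLocalTimeAsync(executor).join());
        } finally {
            executor.shutdown();
        }
    }
}
```

The trade-off is that the view is not a snapshot: callers observe concurrent updates, which is acceptable only if they need a read-only look at the current state rather than a point-in-time copy.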