[ https://issues.apache.org/jira/browse/IGNITE-19419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrey Khitrin updated IGNITE-19419: ------------------------------------ Attachment: logs.zip > Corrupted LogStorage under intensive SQL load > --------------------------------------------- > > Key: IGNITE-19419 > URL: https://issues.apache.org/jira/browse/IGNITE-19419 > Project: Ignite > Issue Type: Bug > Components: sql > Affects Versions: 3.0.0-beta1 > Reporter: Andrey Khitrin > Priority: Major > Labels: ignite-3, storage-engine > Attachments: ai3.jstack, logs.zip > > > I tried to run a SQL benchmark (slightly modified [Benchbase TPC-C > scenario|https://github.com/cmu-db/benchbase/wiki/TPC-C]) against Ignite3 > (commit hash b48ddcba7cd2bd3b9a053ae131c25b44a0400e27). > Ignite3 was running in a single-node mode with -Xmx8G -Xms8G memory limit > (increased memory size comparing to IGNITE-19412). After some time it has > become unavailable to the client. Log is filled with error messages like the > following: > {code:java} > 2023-05-03 17:51:57:705 +0300 > [WARNING][%node0%JRaft-FSMCaller-Disruptor-_stripe_10-0][NodeImpl] Node > <e92d1429-965a-4cbd-af37-126844f352ca_part_18/node0> got error: Error > [type=ERROR_TYPE_LOG, status=Status[EBUSY<1009>: LogManager is busy, disk > queue overload.]]. > 2023-05-03 17:51:57:705 +0300 > [WARNING][CompletableFutureDelayScheduler][RaftGroupServiceImpl] Recoverable > error during the request type=ActionRequestImpl occurred (will be retried on > the randomly selected node): > java.util.concurrent.CompletionException: > java.util.concurrent.TimeoutException > at > java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367) > at > java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376) > at > java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at > java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > at > java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: java.util.concurrent.TimeoutException > ... 7 more > 2023-05-03 17:51:57:705 +0300 > [ERROR][%node0%JRaft-LogManager-Disruptor-_stripe_10-0][NodeImpl] Node > <e92d1429-965a-4cbd-af37-126844f352ca_part_5/node0> append [2268, 2268] > failed, status=Status[EIO<1014>: Corrupted LogStorage]. > 2023-05-03 17:51:57:705 +0300 > [WARNING][%node0%JRaft-FSMCaller-Disruptor-_stripe_10-0][FSMCallerImpl] > FSMCaller already in error status, ignore new error > Error [type=ERROR_TYPE_LOG, status=Status[EBUSY<1009>: LogManager is busy, > disk queue overload.]] > at > org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.reportError(LogManagerImpl.java:587) > at > org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.appendEntries(LogManagerImpl.java:337) > at > org.apache.ignite.raft.jraft.core.NodeImpl.executeApplyingTasks(NodeImpl.java:1621) > at > org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosureHandler.onEvent(NodeImpl.java:305) > at > org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosureHandler.onEvent(NodeImpl.java:288) > at > org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:217) > at > org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:181) > at > com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) > at java.base/java.lang.Thread.run(Thread.java:829) > 2023-05-03 17:51:57:706 +0300 > [INFO][%node0%JRaft-FSMCaller-Disruptor-_stripe_10-0][ReplicatorGroupImpl] > Fail to find the next candidate. > 2023-05-03 17:51:57:706 +0300 > [ERROR][%node0%JRaft-LogManager-Disruptor-_stripe_10-0][NodeImpl] Node > <e92d1429-965a-4cbd-af37-126844f352ca_part_18/node0> append [2271, 2271] > failed, status=Status[EIO<1014>: Corrupted LogStorage]. > 2023-05-03 17:51:57:706 +0300 > [INFO][%node0%JRaft-FSMCaller-Disruptor-_stripe_10-0][StateMachineAdapter] > onLeaderStop: status=Status[EBADNODE<10009>: Raft node(leader or candidate) > is in error.]. > 2023-05-03 17:51:57:706 +0300 > [ERROR][%node0%JRaft-LogManager-Disruptor-_stripe_10-0][NodeImpl] Node > <30181391-2f04-40c0-8c35-a77592b0731f_part_5/node0> append [2285, 2285] > failed, status=Status[EIO<1014>: Corrupted LogStorage]. > {code} > More errors could be observed in the attached log archive. -- This message was sent by Atlassian Jira (v8.20.10#820010)