[ https://issues.apache.org/jira/browse/IGNITE-25240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Puchkovskiy updated IGNITE-25240:
---------------------------------------
    Description: 
If a node gets stopped while installing a Raft snapshot to another node, log entries like the following appear:

2025-04-23 17:18:54:998 +0200 [ERROR][%defaultNode%JRaft-Common-Executor-2][SnapshotExecutorImpl] Fail to save snapshot: Status[EIO<1014>: Fail to save snapshot to /.../work/partitions/meta/370_part_21-0/snapshot, reason java.util.concurrent.CancellationException].

They are accompanied by:

2025-04-23 17:18:54:999 +0200 [ERROR][%defaultNode%JRaft-FSMCaller-Disruptor_stripe_9-0][StateMachineAdapter] Encountered an error=Status[EIO<1014>: Fail to save snapshot.] on StateMachine org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine, it's highly recommended to implement this method as raft stops working since some error occurs, you should figure out the cause and repair or remove this node.
Error [type=ERROR_TYPE_SNAPSHOT, status=Status[EIO<1014>: Fail to save snapshot.]]
    at org.apache.ignite.raft.jraft.storage.snapshot.SnapshotExecutorImpl.reportError(SnapshotExecutorImpl.java:687)
    at org.apache.ignite.raft.jraft.storage.snapshot.SnapshotExecutorImpl.onSnapshotSaveDone(SnapshotExecutorImpl.java:411)
    at org.apache.ignite.raft.jraft.storage.snapshot.SnapshotExecutorImpl$SaveSnapshotDone.continueRun(SnapshotExecutorImpl.java:127)
    at org.apache.ignite.raft.jraft.storage.snapshot.SnapshotExecutorImpl$SaveSnapshotDone.lambda$run$0(SnapshotExecutorImpl.java:123)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)

Neither of these errors is fatal during node stop, so we should avoid logging them.

This might also happen if a partition gets evicted from a node while the node installs a Raft snapshot on a follower. This needs to be checked.
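One possible way to tell the benign case from a real failure is sketched below. It is only an illustration, not the actual fix: the class SnapshotFailureClassifier, its methods, and the "stopping" flag are all hypothetical and exist in neither Ignite nor JRaft. It also assumes that the code reporting the failure still has the original Throwable at hand, while the log above only shows the exception class name inside the Status message.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical helper (not part of Ignite or JRaft) that decides whether a
 * snapshot failure is an expected side effect of stopping the component.
 */
final class SnapshotFailureClassifier {
    /** Hypothetical flag; assumed to be set by the stop routine before it cancels in-flight work. */
    private final AtomicBoolean stopping = new AtomicBoolean();

    /** Records that the component has started stopping. */
    void markStopping() {
        stopping.set(true);
    }

    /** Returns true if a CancellationException appears anywhere in the cause chain. */
    static boolean causedByCancellation(Throwable failure) {
        Throwable t = failure;
        // The depth bound guards against cyclic cause chains.
        for (int depth = 0; t != null && depth < 32; depth++) {
            if (t instanceof CancellationException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    /**
     * Returns true only when the component is stopping AND the failure was
     * caused by cancellation. Any other failure should still be reported at
     * ERROR level.
     */
    boolean isExpectedOnStop(Throwable failure) {
        return stopping.get() && causedByCancellation(failure);
    }
}
```

A caller could consult isExpectedOnStop(...) before logging and use a lower log level (or skip the message) when it returns true. Requiring both conditions keeps a cancellation that happens while the node is running visible as an error. Whether the same check would cover the partition eviction scenario is not known and would have to be verified separately.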
> Partition raft snapshot interrupted due to node stop causes garbage in logs
> ---------------------------------------------------------------------------
>
>                 Key: IGNITE-25240
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25240
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)