[ https://issues.apache.org/jira/browse/HDDS-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937226#comment-17937226 ]
Hemant Kumar commented on HDDS-11033:
-------------------------------------

[~jyosin] can you please double-check it and close it if not reproducible?

> [snapshot-LR] OM down due to NullPointerException during SnapshotPurgeRequest
> -----------------------------------------------------------------------------
>
> Key: HDDS-11033
> URL: https://issues.apache.org/jira/browse/HDDS-11033
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Manager, Snapshot
> Reporter: Jyotirmoy Sinha
> Priority: Major
> Labels: ozone-snapshot
>
> Steps :
> * Create multiple snapshots across various volume/buckets as background load
> * Perform random snapshot operations such as create, delete, list, diff across 4 test buckets of combination -
> ** EC - FSO and OBS
> ** Ratis - FSO and OBS
> * In parallel also perform repetitive reconstructions and re-replications on the above buckets
>
> Error snippet in OM1 -
> {code:java}
> 2024-06-19 16:30:39,580 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Created checkpoint : /var/lib/hadoop-ozone/om/data793412/db.snapshots/checkpointState/om.db-a4f6fa69-cf80-4fea-a3d9-6684faceff89 for snapshot snap851
> 2024-06-19 16:30:40,764 ERROR [OM StateMachine ApplyTransaction Thread - 0]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Terminating with exit status 1: Request cmdType: SnapshotPurge
> clientId: "client-DE4DAB584843"
> SnapshotPurgeRequest {
>   updatedSnapshotDBKey: "/testvol/buckecfso/snap1718809217"
> }
> failed with exception
> java.lang.NullPointerException
>     at org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotPurgeRequest.validateAndUpdateCache(OMSnapshotPurgeRequest.java:107)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:378)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:560)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:353)
>     at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
>
> Error snippet in OM2 -
> {code:java}
> 2024-06-19 16:30:39,611 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Created checkpoint : /var/lib/hadoop-ozone/om/data793412/db.snapshots/checkpointState/om.db-a4f6fa69-cf80-4fea-a3d9-6684faceff89 for snapshot snap851
> 2024-06-19 16:30:40,770 ERROR [OM StateMachine ApplyTransaction Thread - 0]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Terminating with exit status 1: Request cmdType: SnapshotPurge
> clientId: "client-DE4DAB584843"
> SnapshotPurgeRequest {
>   updatedSnapshotDBKey: "/testvol/buckecfso/snap1718809217"
> }
> failed with exception
> java.lang.NullPointerException
>     at org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotPurgeRequest.validateAndUpdateCache(OMSnapshotPurgeRequest.java:107)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:378)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:560)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:353)
>     at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> 2024-06-19 16:30:40,777 INFO [shutdown-hook-0]-org.apache.ranger.audit.provider.AuditProviderFactory: ==> JVMShutdownHook.run()
> {code}
>
> Error snippet in OM3 -
> {code:java}
> 2024-06-19 16:30:44,852 ERROR [KeyDeletingService#0]-org.apache.hadoop.ozone.om.service.KeyDeletingService: Snapshot deep cleaning request failed. Will retry at next run.
> com.google.protobuf.ServiceException: org.apache.hadoop.ozone.om.exceptions.OMNotLeaderException: OM:om135 is not the leader. Could not determine the leader node.
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.processReply(OzoneManagerRatisServer.java:462)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequest(OzoneManagerRatisServer.java:289)
>     at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.submitRequest(KeyDeletingService.java:392)
>     at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.updateDeepCleanedSnapshots(KeyDeletingService.java:373)
>     at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.processSnapshotDeepClean(KeyDeletingService.java:357)
>     at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:204)
>     at org.apache.hadoop.hdds.utils.BackgroundService$PeriodicalTask.lambda$run$0(BackgroundService.java:121)
>     at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.hadoop.ozone.om.exceptions.OMNotLeaderException: OM:om135 is not the leader. Could not determine the leader node.
>     at org.apache.hadoop.ozone.om.exceptions.OMNotLeaderException.convertToOMNotLeaderException(OMNotLeaderException.java:86)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.processReply(OzoneManagerRatisServer.java:463)
>     ... 13 more
> 2024-06-19 16:30:44,854 ERROR [KeyDeletingService#0]-org.apache.hadoop.ozone.om.service.KeyDeletingService: Snapshot deep cleaning request failed. Will retry at next run.
> com.google.protobuf.ServiceException: org.apache.ratis.protocol.exceptions.ServerNotReadyException: om135@group-323117034165 is not in [RUNNING]: current state is CLOSING
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequestToRatis(OzoneManagerRatisServer.java:298)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequest(OzoneManagerRatisServer.java:288)
>     at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.submitRequest(KeyDeletingService.java:392)
>     at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.updateDeepCleanedSnapshots(KeyDeletingService.java:373)
>     at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.processSnapshotDeepClean(KeyDeletingService.java:357)
>     at org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:204)
>     at org.apache.hadoop.hdds.utils.BackgroundService$PeriodicalTask.lambda$run$0(BackgroundService.java:121)
>     at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.util.concurrent.ExecutionException: org.apache.ratis.protocol.exceptions.ServerNotReadyException: om135@group-323117034165 is not in [RUNNING]: current state is CLOSING
>     at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
>     at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequestToRatis(OzoneManagerRatisServer.java:296)
>     ... 13 more
> Caused by: org.apache.ratis.protocol.exceptions.ServerNotReadyException: om135@group-323117034165 is not in [RUNNING]: current state is CLOSING
>     at org.apache.ratis.server.impl.RaftServerImpl.lambda$assertLifeCycleState$9(RaftServerImpl.java:749)
>     at org.apache.ratis.util.LifeCycle.assertCurrentState(LifeCycle.java:253)
>     at org.apache.ratis.server.impl.RaftServerImpl.assertLifeCycleState(RaftServerImpl.java:748)
>     at org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:838)
>     at org.apache.ratis.server.impl.RaftServerImpl.lambda$null$12(RaftServerImpl.java:831)
>     at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117)
>     at org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitClientRequestAsync$13(RaftServerImpl.java:831)
>     at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>     ... 3 more
> {code}
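
For readers trying to reproduce this: the OM1 and OM2 traces show the same SnapshotPurge request failing with the same NullPointerException on both nodes. The thread does not say which reference was null at OMSnapshotPurgeRequest.java:107, so the following is only a generic sketch of one way such a failure can arise, namely dereferencing the result of a lookup for a key that is no longer present, next to a guarded variant that reports the miss instead. Everything in it (the class, the map standing in for a snapshot metadata table, the method names and status strings) is hypothetical and is not Ozone code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Apache Ozone.
final class SnapshotPurgeSketch {
    // Stand-in for a metadata table keyed by snapshot DB key (assumption).
    private final Map<String, String> snapshotTable = new HashMap<>();

    void addSnapshot(String dbKey, String info) {
        snapshotTable.put(dbKey, info);
    }

    // Unguarded variant: Map.get returns null for an absent key, so the
    // dereference below throws NullPointerException when the entry is gone.
    int purgeUnguarded(String dbKey) {
        String info = snapshotTable.get(dbKey);
        return info.length();
    }

    // Guarded variant: an absent key is reported as a status value
    // instead of propagating an exception to the caller.
    String purgeGuarded(String dbKey) {
        String info = snapshotTable.remove(dbKey);
        if (info == null) {
            return "NOT_FOUND";
        }
        return "PURGED";
    }

    public static void main(String[] args) {
        SnapshotPurgeSketch sketch = new SnapshotPurgeSketch();
        // Key copied from the log excerpt in the issue description.
        String key = "/testvol/buckecfso/snap1718809217";
        sketch.addSnapshot(key, "snapshot-info");
        System.out.println(sketch.purgeGuarded(key)); // first purge removes the entry
        System.out.println(sketch.purgeGuarded(key)); // second purge finds nothing
        try {
            sketch.purgeUnguarded(key);
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup threw NullPointerException");
        }
    }
}
```

Whether the actual fix should guard, skip, or fail the request is a design decision for the OM request path and is not settled in this thread.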