[ 
https://issues.apache.org/jira/browse/HDDS-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937226#comment-17937226
 ] 

Hemant Kumar commented on HDDS-11033:
-------------------------------------

[~jyosin] could you please double-check this and close it if it is not reproducible?

> [snapshot-LR] OM down due to NullPointerException during SnapshotPurgeRequest
> -----------------------------------------------------------------------------
>
>                 Key: HDDS-11033
>                 URL: https://issues.apache.org/jira/browse/HDDS-11033
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Manager, Snapshot
>            Reporter: Jyotirmoy Sinha
>            Priority: Major
>              Labels: ozone-snapshot
>
> Steps:
>  * Create multiple snapshots across various volumes/buckets as background load
>  * Perform random snapshot operations such as create, delete, list, and diff 
> across 4 test buckets covering these combinations:
>  ** EC - FSO and OBS
>  ** Ratis - FSO and OBS
>  * In parallel, also perform repetitive reconstructions and re-replications on 
> the above buckets
> Error snippet in OM1 -
> {code:java}
> 2024-06-19 16:30:39,580 INFO 
> [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: 
> Created checkpoint : 
> /var/lib/hadoop-ozone/om/data793412/db.snapshots/checkpointState/om.db-a4f6fa69-cf80-4fea-a3d9-6684faceff89
>  for snapshot snap851
> 2024-06-19 16:30:40,764 ERROR [OM StateMachine ApplyTransaction Thread - 
> 0]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Terminating 
> with exit status 1: Request cmdType: SnapshotPurge
> clientId: "client-DE4DAB584843"
> SnapshotPurgeRequest {
>   updatedSnapshotDBKey: "/testvol/buckecfso/snap1718809217"
> }
>  failed with exception
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotPurgeRequest.validateAndUpdateCache(OMSnapshotPurgeRequest.java:107)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:378)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:560)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:353)
>         at 
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834) {code}
> Error snippet in OM2 -
> {code:java}
> 2024-06-19 16:30:39,611 INFO 
> [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: 
> Created checkpoint : 
> /var/lib/hadoop-ozone/om/data793412/db.snapshots/checkpointState/om.db-a4f6fa69-cf80-4fea-a3d9-6684faceff89
>  for snapshot snap851
> 2024-06-19 16:30:40,770 ERROR [OM StateMachine ApplyTransaction Thread - 
> 0]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Terminating 
> with exit status 1: Request cmdType: SnapshotPurge
> clientId: "client-DE4DAB584843"
> SnapshotPurgeRequest {
>   updatedSnapshotDBKey: "/testvol/buckecfso/snap1718809217"
> }
>  failed with exception
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotPurgeRequest.validateAndUpdateCache(OMSnapshotPurgeRequest.java:107)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:378)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:560)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:353)
>         at 
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> 2024-06-19 16:30:40,777 INFO 
> [shutdown-hook-0]-org.apache.ranger.audit.provider.AuditProviderFactory: ==> 
> JVMShutdownHook.run() {code}
> Error snippet in OM3 -
> {code:java}
> 2024-06-19 16:30:44,852 ERROR 
> [KeyDeletingService#0]-org.apache.hadoop.ozone.om.service.KeyDeletingService: 
> Snapshot deep cleaning request failed. Will retry at next run.
> com.google.protobuf.ServiceException: 
> org.apache.hadoop.ozone.om.exceptions.OMNotLeaderException: OM:om135 is not 
> the leader. Could not determine the leader node.
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.processReply(OzoneManagerRatisServer.java:462)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequest(OzoneManagerRatisServer.java:289)
>         at 
> org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.submitRequest(KeyDeletingService.java:392)
>         at 
> org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.updateDeepCleanedSnapshots(KeyDeletingService.java:373)
>         at 
> org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.processSnapshotDeepClean(KeyDeletingService.java:357)
>         at 
> org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:204)
>         at 
> org.apache.hadoop.hdds.utils.BackgroundService$PeriodicalTask.lambda$run$0(BackgroundService.java:121)
>         at 
> java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
>         at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>         at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.hadoop.ozone.om.exceptions.OMNotLeaderException: 
> OM:om135 is not the leader. Could not determine the leader node.
>         at 
> org.apache.hadoop.ozone.om.exceptions.OMNotLeaderException.convertToOMNotLeaderException(OMNotLeaderException.java:86)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.processReply(OzoneManagerRatisServer.java:463)
>         ... 13 more
> 2024-06-19 16:30:44,854 ERROR 
> [KeyDeletingService#0]-org.apache.hadoop.ozone.om.service.KeyDeletingService: 
> Snapshot deep cleaning request failed. Will retry at next run.
> com.google.protobuf.ServiceException: 
> org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
> om135@group-323117034165 is not in [RUNNING]: current state is CLOSING
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequestToRatis(OzoneManagerRatisServer.java:298)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequest(OzoneManagerRatisServer.java:288)
>         at 
> org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.submitRequest(KeyDeletingService.java:392)
>         at 
> org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.updateDeepCleanedSnapshots(KeyDeletingService.java:373)
>         at 
> org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.processSnapshotDeepClean(KeyDeletingService.java:357)
>         at 
> org.apache.hadoop.ozone.om.service.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:204)
>         at 
> org.apache.hadoop.hdds.utils.BackgroundService$PeriodicalTask.lambda$run$0(BackgroundService.java:121)
>         at 
> java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
>         at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>         at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
> om135@group-323117034165 is not in [RUNNING]: current state is CLOSING
>         at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
>         at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequestToRatis(OzoneManagerRatisServer.java:296)
>         ... 13 more
> Caused by: org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
> om135@group-323117034165 is not in [RUNNING]: current state is CLOSING
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.lambda$assertLifeCycleState$9(RaftServerImpl.java:749)
>         at 
> org.apache.ratis.util.LifeCycle.assertCurrentState(LifeCycle.java:253)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.assertLifeCycleState(RaftServerImpl.java:748)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:838)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.lambda$null$12(RaftServerImpl.java:831)
>         at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitClientRequestAsync$13(RaftServerImpl.java:831)
>         at 
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>         ... 3 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
