Hemant Kumar created HDDS-12385:
-----------------------------------

             Summary: Snapshot dir is not getting deleted in 
OMSnapshotPurgeResponse
                 Key: HDDS-12385
                 URL: https://issues.apache.org/jira/browse/HDDS-12385
             Project: Apache Ozone
          Issue Type: Sub-task
            Reporter: Hemant Kumar


Similar to HDDS-12210, SnapshotDeletingService sometimes interferes with its 
own previous run and causes the cleanup of the snapshot's DB directory in 
OMSnapshotPurgeResponse to fail. This happens when an OMSnapshotPurge request 
is still sitting in the double buffer while the next SnapshotDeletingService 
run starts. Because the service reads snapshotInfoTable sequentially, it tries 
to load the same snapshot that was purged in the previous run, which causes 
the cleanup failure.
{code:java}
2025-02-10 20:04:10,625 INFO [OM StateMachine ApplyTransaction Thread - 
0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotPurgeRequest: 
Successfully executed snapshotPurgeRequest: 
{snapshotDBKeys: "/ota/ozdls3_ota_tx1/cm-1546444511-1737387001671-2"} 
along with updating 
snapshots:{/ota/ozdls3_ota_tx1/cm-1546444511-1737819001713-0=SnapshotInfo{snapshotId:
 '36c9eeb8-2f82-4e4a-8653-da6cb6300f36', name: 'cm-1546444511-1737819001713-0', 
volumeName: 'ota', bucketName: 'ozdls3_ota_tx1', snapshotStatus: 
'SNAPSHOT_ACTIVE', creationTime: '1737839928796', deletionTime: '-1', 
pathPreviousSnapshotId: 'null', globalPreviousSnapshotId: 
'3cc2c5c3-27f8-4e35-877b-19e5c47b0952', snapshotPath: 'ota/ozdls3_ota_tx1', 
checkpointDir: '-36c9eeb8-2f82-4e4a-8653-da6cb6300f36', dbTxSequenceNumber: 
'4123731869', deepClean: 'true', sstFiltered: 'false'}, 
/ota/ozdls3_ota_va4/cm-1546435227-1736710089566-5=SnapshotInfo{snapshotId: 
'a2cb4600-057c-4745-a8fc-11bb021d2797', name: 'cm-1546435227-1736710089566-5', 
volumeName: 'ota', bucketName: 'ozdls3_ota_va4', snapshotStatus: 
'SNAPSHOT_DELETED', creationTime: '1737591471797', deletionTime: 
'1738615238845', pathPreviousSnapshotId: 
'1e53c5d8-b533-4db7-9d54-9cd2ecb68e0a', globalPreviousSnapshotId: 
'ff7f862b-0258-4093-b897-5cf038f367d4', snapshotPath: 'ota/ozdls3_ota_va4', 
checkpointDir: '-a2cb4600-057c-4745-a8fc-11bb021d2797', dbTxSequenceNumber: 
'4045246716', deepClean: 'false', sstFiltered: 'false'}, 
/ota/ozdls3_ota_tx1/cm-1546444511-1737387001671-2=SnapshotInfo{snapshotId: 
'876ff853-f7de-4ca8-b1de-5c0da26c1414', name: 'cm-1546444511-1737387001671-2', 
volumeName: 'ota', bucketName: 'ozdls3_ota_tx1', snapshotStatus: 
'SNAPSHOT_DELETED', creationTime: '1737589982568', deletionTime: 
'1739218148004', pathPreviousSnapshotId: 'null', globalPreviousSnapshotId: 
'ff7f862b-0258-4093-b897-5cf038f367d4', snapshotPath: 'ota/ozdls3_ota_tx1', 
checkpointDir: '-876ff853-f7de-4ca8-b1de-5c0da26c1414', dbTxSequenceNumber: 
'4045178180', deepClean: 'false', sstFiltered: 'false'}}.
2025-02-10 20:04:10,625 WARN 
[OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.snapshot.SnapshotCache: 
Key: '/ota/ozdls3_ota_tx1/cm-1546444511-1737387001671-2' does not exist in 
cache.
2025-02-10 20:04:10,633 WARN 
[SnapshotDeletingService#0]-org.apache.hadoop.hdds.utils.BackgroundService: 
SnapshotDeletingService Background task execution took 329519846565ns > 
300000000000ns(timeout)
2025-02-10 20:04:10,633 INFO 
[SnapshotDeletingService#0]-org.apache.hadoop.ozone.om.helpers.OmKeyInfo: 
OmKeyInfo.getCodec ignorePipeline = true
2025-02-10 20:04:10,687 ERROR 
[SnapshotDeletingService#0]-org.apache.hadoop.ozone.om.OmSnapshotManager: 
Failed to retrieve snapshot: /ota/ozdls3_ota_tx1/cm-1546444511-1737387001671-2
java.io.IOException: Failed init RocksDB, db path : 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414,
 exception :org.rocksdb.RocksDBException Corruption: IO error: No such file or 
directory: While open a file for random read: 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414/471985.ldb:
 No such file or directory in file 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414/MANIFEST-472330
        at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:180)
        at 
org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:220)
        at 
org.apache.hadoop.ozone.om.OmMetadataManagerImpl.loadDB(OmMetadataManagerImpl.java:589)
        at 
org.apache.hadoop.ozone.om.OmMetadataManagerImpl.<init>(OmMetadataManagerImpl.java:402)
        at 
org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:360)
        at 
org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:1)
        at 
org.apache.hadoop.ozone.om.snapshot.SnapshotCache.lambda$1(SnapshotCache.java:147)
        at 
java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
        at 
org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:143)
        at 
org.apache.hadoop.ozone.om.OmSnapshotManager.checkForSnapshot(OmSnapshotManager.java:616)
        at 
org.apache.hadoop.ozone.om.service.SnapshotDeletingService$SnapshotDeletingTask.call(SnapshotDeletingService.java:169)
        at 
org.apache.hadoop.hdds.utils.BackgroundService$PeriodicalTask.lambda$run$0(BackgroundService.java:121)
        at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: class 
org.apache.hadoop.hdds.utils.db.RocksDatabase: Failed to open 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414;
 status : Corruption; message : Corruption: IO error: No such file or 
directory: While open a file for random read: 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414/471985.ldb:
 No such file or directory in file 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414/MANIFEST-472330
        at 
org.apache.hadoop.hdds.utils.HddsServerUtil.toIOException(HddsServerUtil.java:667)
        at 
org.apache.hadoop.hdds.utils.db.RocksDatabase.toIOException(RocksDatabase.java:95)
        at 
org.apache.hadoop.hdds.utils.db.RocksDatabase.open(RocksDatabase.java:172)
        at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:113)
        ... 19 more
Caused by: org.rocksdb.RocksDBException: Corruption: IO error: No such file or 
directory: While open a file for random read: 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414/471985.ldb:
 No such file or directory in file 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414/MANIFEST-472330
        at org.rocksdb.RocksDB.open(Native Method)
        at org.rocksdb.RocksDB.open(RocksDB.java:307)
        at 
org.apache.hadoop.hdds.utils.db.managed.ManagedRocksDB.open(ManagedRocksDB.java:75)
        at 
org.apache.hadoop.hdds.utils.db.RocksDatabase.open(RocksDatabase.java:158)
        ... 20 more
2025-02-10 20:04:11,128 ERROR 
[OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse:
 Failed to delete snapshot directory 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414
 for snapshot 
/ota/ozdls3_ota_tx1/cm-1546444511-1737387001671-2
java.nio.file.DirectoryNotEmptyException: 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414
        at 
sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:242)
        at 
sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
        at 
java.nio.file.Files.delete(Files.java:1126)
        at 
org.apache.commons.io.FileUtils.delete(FileUtils.java:1175)
        at 
org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1194)
        at 
org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse.deleteCheckpointDirectory(OMSnapshotPurgeResponse.java:130)
        at 
org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse.addToDBBatch(OMSnapshotPurgeResponse.java:100)
        at 
org.apache.hadoop.ozone.om.response.OMClientResponse.checkAndUpdateDB(OMClientResponse.java:73)
        at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$5(OzoneManagerDoubleBuffer.java:382)
        at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatchWithTrace(OzoneManagerDoubleBuffer.java:220)
        at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatch(OzoneManagerDoubleBuffer.java:381)
        at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushBatch(OzoneManagerDoubleBuffer.java:324)
        at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushCurrentBuffer(OzoneManagerDoubleBuffer.java:297)
        at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:262)
        at java.lang.Thread.run(Thread.java:750){code}
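
The interleaving described above can be modelled with a small, self-contained 
sketch. This is a toy simulation, not Ozone code: the class PurgeRaceSketch, 
its fields, and the purgeInFlight guard are all hypothetical names introduced 
for illustration, and the guard is only one possible mitigation, not 
necessarily the fix adopted for this issue. It assumes that a purge stays 
visible in the table until the double buffer is flushed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of the race; none of these names are real Ozone APIs. */
public class PurgeRaceSketch {
  // Stand-in for snapshotInfoTable: snapshot key -> status, read in key order.
  private final Map<String, String> snapshotInfoTable = new TreeMap<>();
  // Stand-in for the double buffer: purges applied but not yet flushed.
  private final Queue<String> doubleBuffer = new ArrayDeque<>();
  // Hypothetical guard: keys whose purge was submitted but not yet flushed.
  private final Set<String> purgeInFlight = new HashSet<>();
  private final boolean useGuard;

  PurgeRaceSketch(boolean useGuard) {
    this.useGuard = useGuard;
  }

  void addDeletedSnapshot(String key) {
    snapshotInfoTable.put(key, "SNAPSHOT_DELETED");
  }

  /** One service run; returns the snapshot keys it loaded (opened). */
  List<String> runDeletingService() {
    List<String> loaded = new ArrayList<>();
    for (Map.Entry<String, String> entry : snapshotInfoTable.entrySet()) {
      String key = entry.getKey();
      if (!"SNAPSHOT_DELETED".equals(entry.getValue())) {
        continue;
      }
      if (useGuard && purgeInFlight.contains(key)) {
        continue; // purge already queued by an earlier run, do not reload
      }
      loaded.add(key);        // models opening the snapshot's DB directory
      doubleBuffer.add(key);  // models submitting the purge request
      purgeInFlight.add(key);
    }
    return loaded;
  }

  /** Models the flush, where the table entry (and directory) is removed. */
  void flushDoubleBuffer() {
    String key;
    while ((key = doubleBuffer.poll()) != null) {
      snapshotInfoTable.remove(key);
      purgeInFlight.remove(key);
    }
  }

  /** Runs the service twice before any flush; returns the second run's loads. */
  public static List<String> secondRunLoads(boolean useGuard) {
    PurgeRaceSketch om = new PurgeRaceSketch(useGuard);
    om.addDeletedSnapshot("/vol/bucket/snap-1");
    om.runDeletingService();                       // first run queues the purge
    List<String> second = om.runDeletingService(); // starts before the flush
    om.flushDoubleBuffer();
    return second;
  }

  public static void main(String[] args) {
    System.out.println("without guard: " + secondRunLoads(false));
    System.out.println("with guard:    " + secondRunLoads(true));
  }
}
```

Without the guard, the second run loads the snapshot whose purge is still 
queued, which corresponds to the reload seen in the log above. With the guard, 
the second run skips it.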


