Hemant Kumar created HDDS-12385:
-----------------------------------
Summary: Snapshot dir is not getting deleted in OMSnapshotPurgeResponse
Key: HDDS-12385
URL: https://issues.apache.org/jira/browse/HDDS-12385
Project: Apache Ozone
Issue Type: Sub-task
Reporter: Hemant Kumar
Similar to HDDS-12210, a SnapshotDeletingService run can sometimes interfere
with its previous run and cause the cleanup of the snapshot's DB directory in
OMSnapshotPurgeResponse to fail. This happens when an OMSnapshotPurge request
is still sitting in the double buffer when the next SnapshotDeletingService run
starts. Because the service reads snapshotInfoTable sequentially, it tries to
load the same snapshot that was purged in the previous run, and the directory
cleanup then fails.
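The overlap described above can be pictured with a small, self-contained sketch. It is not the fix applied in Ozone and does not use any Ozone API: the class `InFlightPurgeTracker` and its methods are hypothetical names introduced only for illustration. The assumption is that a background run records each snapshot key it submits for purge, and that the next run skips those keys until the flush has completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of Ozone): remembers snapshot keys whose
 * purge has been submitted but not yet flushed, so that a later scan of the
 * snapshot table can skip them instead of reopening their checkpoint DB.
 */
final class InFlightPurgeTracker {
  // Keys submitted for purge whose flush has not been acknowledged yet.
  private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

  /** Records a submitted purge; returns false if the key is already pending. */
  boolean markSubmitted(String snapshotKey) {
    return inFlight.add(snapshotKey);
  }

  /** Call once the flush has removed the table row and the directory. */
  void markFlushed(String snapshotKey) {
    inFlight.remove(snapshotKey);
  }

  /** Returns the table keys, in order, that are not waiting on a flush. */
  List<String> selectCandidates(Iterable<String> tableKeys) {
    List<String> candidates = new ArrayList<>();
    for (String key : tableKeys) {
      if (!inFlight.contains(key)) {
        candidates.add(key);
      }
    }
    return candidates;
  }
}
```

In this sketch, a run would call `markSubmitted` before sending the purge request and `markFlushed` after the flush, so a second run that starts in between never tries to load the purged snapshot.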
{code:java}
2025-02-10 20:04:10,625 INFO [OM StateMachine ApplyTransaction Thread -
0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotPurgeRequest:
Successfully executed snapshotPurgeRequest: {snapshotDBKeys:
"/ota/ozdls3_ota_tx1/cm-1546444511-1737387001671-2"}
along with updating
snapshots:{/ota/ozdls3_ota_tx1/cm-1546444511-1737819001713-0=SnapshotInfo{snapshotId:
'36c9eeb8-2f82-4e4a-8653-da6cb6300f36', name: 'cm-1546444511-1737819001713-0',
volumeName: 'ota', bucketName: 'ozdls3_ota_tx1', snapshotStatus:
'SNAPSHOT_ACTIVE', creationTime: '1737839928796', deletionTime: '-1',
pathPreviousSnapshotId: 'null', globalPreviousSnapshotId:
'3cc2c5c3-27f8-4e35-877b-19e5c47b0952', snapshotPath: 'ota/ozdls3_ota_tx1',
checkpointDir: '-36c9eeb8-2f82-4e4a-8653-da6cb6300f36', dbTxSequenceNumber:
'4123731869', deepClean: 'true', sstFiltered: 'false'},
/ota/ozdls3_ota_va4/cm-1546435227-1736710089566-5=SnapshotInfo{snapshotId:
'a2cb4600-057c-4745-a8fc-11bb021d2797', name: 'cm-1546435227-1736710089566-5',
volumeName: 'ota', bucketName: 'ozdls3_ota_va4', snapshotStatus:
'SNAPSHOT_DELETED', creationTime: '1737591471797', deletionTime:
'1738615238845', pathPreviousSnapshotId:
'1e53c5d8-b533-4db7-9d54-9cd2ecb68e0a', globalPreviousSnapshotId:
'ff7f862b-0258-4093-b897-5cf038f367d4', snapshotPath: 'ota/ozdls3_ota_va4',
checkpointDir: '-a2cb4600-057c-4745-a8fc-11bb021d2797', dbTxSequenceNumber:
'4045246716', deepClean: 'false', sstFiltered: 'false'},
/ota/ozdls3_ota_tx1/cm-1546444511-1737387001671-2=SnapshotInfo{snapshotId:
'876ff853-f7de-4ca8-b1de-5c0da26c1414', name: 'cm-1546444511-1737387001671-2',
volumeName: 'ota', bucketName: 'ozdls3_ota_tx1', snapshotStatus:
'SNAPSHOT_DELETED', creationTime: '1737589982568', deletionTime:
'1739218148004', pathPreviousSnapshotId: 'null', globalPreviousSnapshotId:
'ff7f862b-0258-4093-b897-5cf038f367d4', snapshotPath: 'ota/ozdls3_ota_tx1',
checkpointDir: '-876ff853-f7de-4ca8-b1de-5c0da26c1414', dbTxSequenceNumber:
'4045178180', deepClean: 'false', sstFiltered: 'false'}}.
2025-02-10 20:04:10,625 WARN
[OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.snapshot.SnapshotCache:
Key: '/ota/ozdls3_ota_tx1/cm-1546444511-1737387001671-2' does not exist in
cache.
2025-02-10 20:04:10,633 WARN
[SnapshotDeletingService#0]-org.apache.hadoop.hdds.utils.BackgroundService:
SnapshotDeletingService Background task execution took 329519846565ns >
300000000000ns(timeout)
2025-02-10 20:04:10,633 INFO
[SnapshotDeletingService#0]-org.apache.hadoop.ozone.om.helpers.OmKeyInfo:
OmKeyInfo.getCodec ignorePipeline = true
2025-02-10 20:04:10,687 ERROR
[SnapshotDeletingService#0]-org.apache.hadoop.ozone.om.OmSnapshotManager:
Failed to retrieve snapshot: /ota/ozdls3_ota_tx1/cm-1546444511-1737387001671-2
java.io.IOException: Failed init RocksDB, db path :
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414,
exception :org.rocksdb.RocksDBException Corruption: IO error: No such file or
directory: While open a file for random read:
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414/471985.ldb:
No such file or directory in file
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414/MANIFEST-472330
at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:180)
at
org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:220)
at
org.apache.hadoop.ozone.om.OmMetadataManagerImpl.loadDB(OmMetadataManagerImpl.java:589)
at
org.apache.hadoop.ozone.om.OmMetadataManagerImpl.<init>(OmMetadataManagerImpl.java:402)
at
org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:360)
at
org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:1)
at
org.apache.hadoop.ozone.om.snapshot.SnapshotCache.lambda$1(SnapshotCache.java:147)
at
java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
at
org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:143)
at
org.apache.hadoop.ozone.om.OmSnapshotManager.checkForSnapshot(OmSnapshotManager.java:616)
at
org.apache.hadoop.ozone.om.service.SnapshotDeletingService$SnapshotDeletingTask.call(SnapshotDeletingService.java:169)
at
org.apache.hadoop.hdds.utils.BackgroundService$PeriodicalTask.lambda$run$0(BackgroundService.java:121)
at
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: class
org.apache.hadoop.hdds.utils.db.RocksDatabase: Failed to open
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414;
status : Corruption; message : Corruption: IO error: No such file or
directory: While open a file for random read:
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414/471985.ldb:
No such file or directory in file
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414/MANIFEST-472330
at
org.apache.hadoop.hdds.utils.HddsServerUtil.toIOException(HddsServerUtil.java:667)
at
org.apache.hadoop.hdds.utils.db.RocksDatabase.toIOException(RocksDatabase.java:95)
at
org.apache.hadoop.hdds.utils.db.RocksDatabase.open(RocksDatabase.java:172)
at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:113)
... 19 more
Caused by: org.rocksdb.RocksDBException: Corruption: IO error: No such file or
directory: While open a file for random read:
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414/471985.ldb:
No such file or directory in file
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414/MANIFEST-472330
at org.rocksdb.RocksDB.open(Native Method)
at org.rocksdb.RocksDB.open(RocksDB.java:307)
at
org.apache.hadoop.hdds.utils.db.managed.ManagedRocksDB.open(ManagedRocksDB.java:75)
at
org.apache.hadoop.hdds.utils.db.RocksDatabase.open(RocksDatabase.java:158)
... 20 more
2025-02-10 20:04:11,128 ERROR
[OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse:
Failed to delete snapshot directory
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414
for snapshot /ota/ozdls3_ota_tx1/cm-1546444511-1737387001671-2
java.nio.file.DirectoryNotEmptyException:
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-876ff853-f7de-4ca8-b1de-5c0da26c1414
at
sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:242)
at
sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
at java.nio.file.Files.delete(Files.java:1126)
at org.apache.commons.io.FileUtils.delete(FileUtils.java:1175)
at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1194)
at
org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse.deleteCheckpointDirectory(OMSnapshotPurgeResponse.java:130)
at
org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse.addToDBBatch(OMSnapshotPurgeResponse.java:100)
at
org.apache.hadoop.ozone.om.response.OMClientResponse.checkAndUpdateDB(OMClientResponse.java:73)
at
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$5(OzoneManagerDoubleBuffer.java:382)
at
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatchWithTrace(OzoneManagerDoubleBuffer.java:220)
at
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatch(OzoneManagerDoubleBuffer.java:381)
at
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushBatch(OzoneManagerDoubleBuffer.java:324)
at
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushCurrentBuffer(OzoneManagerDoubleBuffer.java:297)
at
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:262)
at java.lang.Thread.run(Thread.java:750){code}