[ 
https://issues.apache.org/jira/browse/HDDS-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDDS-10596.
------------------------------------
    Resolution: Workaround

We haven't seen this problem after raising the ulimit (open file descriptor) value.
Filed HDDS-12981 to recommend that users tune ulimit.
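For reference, a minimal sketch of inspecting and raising the open-file limit for the OM process. The values and the user name below are illustrative assumptions, not the HDDS-12981 recommendation:

```shell
# Show current per-process open-file limits for this shell (soft and hard).
ulimit -Sn
ulimit -Hn

# Raise the soft limit for the current session, up to the hard limit.
# (65536 is an example value; size it to your snapshot workload.)
# ulimit -n 65536

# For a persistent change, an /etc/security/limits.conf entry for the
# service user (shown here as a hypothetical "hdfs" user) such as:
#   hdfs  soft  nofile  65536
#   hdfs  hard  nofile  65536

# To see how many descriptors a running OM actually holds, with $OM_PID
# being the OM process id:
# ls /proc/$OM_PID/fd | wc -l
```

Checking the live descriptor count against the limit before and during parallel snapshot/snapdiff load is what confirms whether the workaround is sufficient.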

> [snapshot-LR] OM error due to 'Too many open files' exception
> -------------------------------------------------------------
>
>                 Key: HDDS-10596
>                 URL: https://issues.apache.org/jira/browse/HDDS-10596
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Snapshot
>            Reporter: Jyotirmoy Sinha
>            Priority: Major
>              Labels: ozone-snapshot
>
> OM error due to 'Too many open files' exception
> Scenario :
>  * Generate data over parallel threads over various volume/buckets
>  * Perform parallel snapshot create/delete/list operations over above buckets
>  * Perform parallel snapdiff operations over each bucket
>  * Perform parallel read operations of snapshot contents
>  * Introduce OM and cluster restarts in between along with DN decommissioning 
> and balancer restarts.
> OM node 1 stacktrace -
> {code:java}
> 2024-03-26 03:02:48,142 [Thread-135148] WARN 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't get num of keys in 
> SST '027673': While open a file for random read: 
> /var/lib/hadoop-ozone/om/data/db.snapshots/diffState/compaction-sst-backup/027673.sst:
>  Too many open files
> 2024-03-26 03:02:48,143 [Thread-135148] WARN 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't get num of keys in 
> SST '027669': While open a file for random read: 
> /var/lib/hadoop-ozone/om/data/db.snapshots/diffState/compaction-sst-backup/027669.sst:
>  Too many open files
> 2024-03-26 03:02:48,143 [Thread-135148] WARN 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't get num of keys in 
> SST '027664': While open a file for random read: 
> /var/lib/hadoop-ozone/om/data/db.snapshots/diffState/compaction-sst-backup/027664.sst:
>  Too many open files
> 2024-03-26 03:02:48,143 [Thread-135148] WARN 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't get num of keys in 
> SST '027659': While open a file for random read: 
> /var/lib/hadoop-ozone/om/data/db.snapshots/diffState/compaction-sst-backup/027659.sst:
>  Too many open files
> 2024-03-26 03:02:48,143 [Thread-135148] WARN 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't get num of keys in 
> SST '027654': While open a file for random read: 
> /var/lib/hadoop-ozone/om/data/db.snapshots/diffState/compaction-sst-backup/027654.sst:
>  Too many open files
> 2024-03-26 03:02:48,143 [Thread-135148] WARN 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't get num of keys in 
> SST '027649': While open a file for random read: 
> /var/lib/hadoop-ozone/om/data/db.snapshots/diffState/compaction-sst-backup/027649.sst:
>  Too many open files
> 2024-03-26 03:02:48,145 [Thread-135156] WARN 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Failed to read SST file: 
> /var/lib/hadoop-ozone/om/data/om.db/027674.sst.
> org.rocksdb.RocksDBException: While open a file for random read: 
> /var/lib/hadoop-ozone/om/data/om.db/027674.sst: Too many open files
>         at org.rocksdb.SstFileReader.open(Native Method)
>         at org.rocksdb.SstFileReader.open(SstFileReader.java:48)
>         at 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.toFileInfoList(RocksDBCheckpointDiffer.java:1529)
>         at 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.access$500(RocksDBCheckpointDiffer.java:113)
>         at 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer$2.onCompactionCompleted(RocksDBCheckpointDiffer.java:508)
>         at 
> org.rocksdb.AbstractEventListener.onCompactionCompletedProxy(AbstractEventListener.java:191)
> 2024-03-26 03:02:48,146 [Thread-135156] WARN 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Failed to read SST file: 
> /var/lib/hadoop-ozone/om/data/om.db/027670.sst.
> org.rocksdb.RocksDBException: While open a file for random read: 
> /var/lib/hadoop-ozone/om/data/om.db/027670.sst: Too many open files
>         at org.rocksdb.SstFileReader.open(Native Method)
>         at org.rocksdb.SstFileReader.open(SstFileReader.java:48)
>         at 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.toFileInfoList(RocksDBCheckpointDiffer.java:1529)
>         at 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.access$500(RocksDBCheckpointDiffer.java:113)
>         at 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer$2.onCompactionCompleted(RocksDBCheckpointDiffer.java:508)
>         at 
> org.rocksdb.AbstractEventListener.onCompactionCompletedProxy(AbstractEventListener.java:191)
> 2024-03-26 03:02:48,146 [Thread-135156] WARN 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Failed to read SST file: 
> /var/lib/hadoop-ozone/om/data/om.db/027665.sst.
> org.rocksdb.RocksDBException: While open a file for random read: 
> /var/lib/hadoop-ozone/om/data/om.db/027665.sst: Too many open files {code}
> OM node 2 stacktrace -
> {code:java}
> 2024-03-26 03:02:56,365 [SstFilteringService#0] ERROR 
> org.apache.hadoop.ozone.om.OmSnapshotManager: Failed to retrieve snapshot: 
> /voltest21711445198/buck1/snap109
> java.io.IOException: Failed init RocksDB, db path : 
> /var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-edc531ad-f347-40cf-8524-082d0b9bcb79,
>  exception :org.rocksdb.RocksDBException While open a file for random read: 
> /var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-edc531ad-f347-40cf-8524-082d0b9bcb79/015382.sst:
>  Too many open files
>         at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:180)
>         at 
> org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:220)
>         at 
> org.apache.hadoop.ozone.om.OmMetadataManagerImpl.loadDB(OmMetadataManagerImpl.java:600)
>         at 
> org.apache.hadoop.ozone.om.OmMetadataManagerImpl.<init>(OmMetadataManagerImpl.java:408)
>         at 
> org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:357)
>         at 
> org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:1)
>         at 
> org.apache.hadoop.ozone.om.snapshot.SnapshotCache.lambda$1(SnapshotCache.java:147)
>         at 
> java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
>         at 
> org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:143)
>         at 
> org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:129)
>         at 
> org.apache.hadoop.ozone.om.SstFilteringService$SstFilteringTask.call(SstFilteringService.java:185)
>         at 
> org.apache.hadoop.hdds.utils.BackgroundService$PeriodicalTask.lambda$run$0(BackgroundService.java:121)
>         at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: class 
> org.apache.hadoop.hdds.utils.db.RocksDatabase: Failed to open 
> /var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-edc531ad-f347-40cf-8524-082d0b9bcb79;
>  status : IOError; message : While open a file for random read: 
> /var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-edc531ad-f347-40cf-8524-082d0b9bcb79/015382.sst:
>  Too many open files
>         at 
> org.apache.hadoop.hdds.utils.HddsServerUtil.toIOException(HddsServerUtil.java:667)
>         at 
> org.apache.hadoop.hdds.utils.db.RocksDatabase.toIOException(RocksDatabase.java:95)
>         at 
> org.apache.hadoop.hdds.utils.db.RocksDatabase.open(RocksDatabase.java:172)
>         at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:113)
>         ... 19 more
> Caused by: org.rocksdb.RocksDBException: While open a file for random read: 
> /var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-edc531ad-f347-40cf-8524-082d0b9bcb79/015382.sst:
>  Too many open files
>         at org.rocksdb.RocksDB.open(Native Method)
>         at org.rocksdb.RocksDB.open(RocksDB.java:307)
>         at 
> org.apache.hadoop.hdds.utils.db.managed.ManagedRocksDB.open(ManagedRocksDB.java:75)
>         at 
> org.apache.hadoop.hdds.utils.db.RocksDatabase.open(RocksDatabase.java:158)
>         ... 20 more
> 2024-03-26 03:02:56,365 [KeyDeletingService#0] ERROR 
> org.apache.hadoop.ozone.om.OmSnapshotManager: Failed to retrieve snapshot: 
> /voltest21711445207/buck1/snap188
> java.io.IOException: Failed init RocksDB, db path : 
> /var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-35ce1ccb-4f7a-412a-ac1a-7a44ebc822c4,
>  exception :org.rocksdb.RocksDBException While open a file for appending: 
> /var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-35ce1ccb-4f7a-412a-ac1a-7a44ebc822c4/026755.dbtmp:
>  Too many open files
>         at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:180)
>         at 
> org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:220) 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
