[ https://issues.apache.org/jira/browse/HDDS-12481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937025#comment-17937025 ]
Ivan Andika edited comment on HDDS-12481 at 3/20/25 7:13 AM:
-------------------------------------------------------------

[~sshenoy] Thanks for the investigation. Yes, a 1024 max-open-files OS limit is far too small. If possible, could you check the RocksDB LOG file to verify that it is indeed the root cause?

[~jyosin] Is it possible to run these tests with a higher ulimit (e.g. 1000000)? It might also be worth increasing the mmap limit as well (just in case RocksDB also uses mmap files):

{code:java}
cat /etc/sysctl.conf | tail -n 3
fs.nr_open = 10000000
fs.file-max = 10000000
vm.max_map_count = 10000000{code}

> OM down due to OMDoubleBuffer error
> -----------------------------------
>
>                 Key: HDDS-12481
>                 URL: https://issues.apache.org/jira/browse/HDDS-12481
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Snapshot
>            Reporter: Jyotirmoy Sinha
>            Priority: Major
>
> Scenario -
> # 150 volumes
> # 4 buckets per volume, covering the combinations:
> ## ratis - fso
> ## ratis - obs
> ## ec - fso
> ## ec - obs
> # 7 snapshots per bucket:
> ## 4 backup snapshots
> ## 3 replication snapshots
> # Total number of snapshots = 150 * 4 * 7 = 4200
> # 500K keys per snapshot (key scale to be configurable)
> # Repeat steps 3-6 for continuous intervals
> OM Error stacktrace -
> {code:java}
> 2025-03-03 09:53:30,921 INFO [grpc-default-executor-644]-org.apache.ratis.grpc.server.GrpcLogAppender: om122@group-9E840F57059A->om121-InstallSnapshotResponseHandler: InstallSnapshot in progress.
> 2025-03-03 09:53:30,921 ERROR [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer: Terminating with exit status 1: During flush to DB encountered error in OMDoubleBuffer flush thread OMDoubleBufferFlushThread when handling OMRequest: cmdType: CreateSnapshot
> traceID: ""
> success: true
> status: OK
> CreateSnapshotResponse {
>   snapshotInfo {
>     snapshotID {
>       mostSigBits: 7137775835294879303
>       leastSigBits: -8902643940451942791
>     }
>     name: "snap1741024398"
>     volumeName: "bofavol-38"
>     bucketName: "bofabuck-38-ratfso"
>     snapshotStatus: SNAPSHOT_ACTIVE
>     creationTime: 1741024409564
>     deletionTime: 18446744073709551615
>     pathPreviousSnapshotID {
>       mostSigBits: -1057436070549500687
>       leastSigBits: -5683722903965903247
>     }
>     globalPreviousSnapshotID {
>       mostSigBits: 8988231063147725793
>       leastSigBits: -5103262545165094617
>     }
>     snapshotPath: "bofavol-38/bofabuck-38-ratfso"
>     checkpointDir: "-630e794d-fd1e-4e47-8473-74811f93fe79"
>     dbTxSequenceNumber: 323480877
>     deepClean: true
>     sstFiltered: false
>   }
> }
> java.io.IOException: Rocks Database is closed
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.acquire(RocksDatabase.java:439)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.newIterator(RocksDatabase.java:777)
>     at org.apache.hadoop.hdds.utils.db.RDBTable.iterator(RDBTable.java:232)
>     at org.apache.hadoop.hdds.utils.db.TypedTable.iterator(TypedTable.java:418)
>     at org.apache.hadoop.hdds.utils.db.TypedTable.iterator(TypedTable.java:55)
>     at org.apache.hadoop.ozone.om.OmSnapshotManager.deleteKeysFromDelKeyTableInSnapshotScope(OmSnapshotManager.java:558)
>     at org.apache.hadoop.ozone.om.OmSnapshotManager.createOmSnapshotCheckpoint(OmSnapshotManager.java:437)
>     at org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotCreateResponse.addToDBBatch(OMSnapshotCreateResponse.java:81)
>     at org.apache.hadoop.ozone.om.response.OMClientResponse.checkAndUpdateDB(OMClientResponse.java:73)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$5(OzoneManagerDoubleBuffer.java:383)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatchWithTrace(OzoneManagerDoubleBuffer.java:221)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatch(OzoneManagerDoubleBuffer.java:382)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushBatch(OzoneManagerDoubleBuffer.java:325)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushCurrentBuffer(OzoneManagerDoubleBuffer.java:298)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:263)
>     at java.lang.Thread.run(Thread.java:748){code}


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org
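As a follow-up to the comment's suggestion to check the RocksDB LOG: on a Linux host, one quick way to test the descriptor-limit theory is to count the "Too many open files" (EMFILE) entries RocksDB writes to its LOG and compare against the process's soft limit. This is only an illustrative sketch; the LOG path below is an assumed example, not a fixed Ozone location — substitute the OM's actual RocksDB metadata directory.

```shell
# Assumed example path -- replace with your OM's RocksDB metadata directory.
ROCKSDB_LOG="${ROCKSDB_LOG:-/var/lib/ozone/om/data/om.db/LOG}"

# RocksDB logs "Too many open files" (EMFILE) when the process exhausts
# its file-descriptor limit; a non-zero count supports the root cause.
if [ -f "$ROCKSDB_LOG" ]; then
  grep -ic 'too many open files' "$ROCKSDB_LOG" || true
fi

# Soft open-file limit the OM would inherit from this shell.
ulimit -n

# For a running OM process, the live fd count and effective limits can be
# compared via /proc (commands shown as comments since the pid is host-specific):
#   ls /proc/<om-pid>/fd | wc -l
#   grep 'Max open files' /proc/<om-pid>/limits
```

Note that `sysctl` settings such as `fs.file-max` are system-wide ceilings; the per-process `ulimit -n` (and, for services, the init system's limit configuration) must also be raised for the OM itself to benefit.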