[
https://issues.apache.org/jira/browse/HDDS-12481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937025#comment-17937025
]
Ivan Andika commented on HDDS-12481:
------------------------------------
[~sshenoy] Thanks for the investigation. Yes, an OS limit of 1024 max open
files is way too small. If possible, could you check the RocksDB LOG file to
verify that this is indeed the root cause?
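One way to do that check is to scan the LOG for file-descriptor exhaustion.
The sketch below is only illustrative: it assumes the failure would surface as
the usual EMFILE text ("Too many open files"), and the LOG path in the usage
comment is hypothetical.

```python
from typing import Iterable, List

# Assumption: descriptor exhaustion appears in the RocksDB LOG with the
# standard EMFILE error text. Adjust the marker if the LOG words it differently.
MARKER = "Too many open files"


def find_fd_exhaustion(lines: Iterable[str], marker: str = MARKER) -> List[str]:
    """Return every line that contains the marker, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if marker in line]


def scan_file(path: str, marker: str = MARKER) -> List[str]:
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with open(path, errors="replace") as log:
        return find_fd_exhaustion(log, marker)


# Hypothetical usage (the path is a placeholder, not a real Ozone location):
#   for hit in scan_file("/path/to/om.db/LOG"):
#       print(hit)
```

An empty result does not rule out the open-files limit as the cause; it only
means this particular marker was not found in that file.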
[~jyosin] Is it possible to run these tests with a higher ulimit (e.g.
1000000)? You might also consider raising the memory-map limit
(vm.max_map_count) as well.
cat /etc/sysctl.conf | tail -n 3
fs.nr_open = 10000000
fs.file-max = 10000000
vm.max_map_count = 10000000
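To compare a host against these settings before rerunning the test, something
like the following could be used. It is a minimal sketch, assuming a Linux host
with /proc/sys mounted; the target values are simply the ones from the excerpt
above, and the helper names are made up for illustration.

```python
import resource
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple

# Targets copied from the sysctl.conf excerpt above; they are one example
# configuration, not verified minimums for Ozone.
SUGGESTED = {
    "fs.nr_open": 10_000_000,
    "fs.file-max": 10_000_000,
    "vm.max_map_count": 10_000_000,
}


def parse_sysctl(text: str) -> Dict[str, int]:
    """Parse 'key = value' lines; comments and non-integer values are skipped."""
    values: Dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            values[key.strip()] = int(value.strip())
        except ValueError:
            continue
    return values


def read_live_sysctl(keys: Iterable[str]) -> Dict[str, int]:
    """Read live kernel values from /proc/sys; unreadable keys are skipped."""
    out: Dict[str, int] = {}
    for key in keys:
        path = Path("/proc/sys") / key.replace(".", "/")
        try:
            out[key] = int(path.read_text().split()[0])
        except (OSError, ValueError, IndexError):
            continue
    return out


def shortfalls(
    current: Dict[str, int], wanted: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], int]]:
    """Map each key that is missing or below target to (current, wanted)."""
    return {
        key: (current.get(key), target)
        for key, target in wanted.items()
        if current.get(key) is None or current[key] < target
    }


def open_file_limit() -> Tuple[int, int]:
    """(soft, hard) RLIMIT_NOFILE of this process; `ulimit -n` shows the soft one."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

For example, `shortfalls(read_live_sysctl(SUGGESTED), SUGGESTED)` lists the
kernel settings that are still below the targets. Note that `open_file_limit()`
reports the limit of the Python process itself; the limit of an already running
OM process is shown in `/proc/<pid>/limits`.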
> OM down due to OMDoubleBuffer error
> -----------------------------------
>
> Key: HDDS-12481
> URL: https://issues.apache.org/jira/browse/HDDS-12481
> Project: Apache Ozone
> Issue Type: Bug
> Components: Snapshot
> Reporter: Jyotirmoy Sinha
> Priority: Major
>
> Scenario -
> # 150 Volumes
> # 4 buckets per volume of combinations -
> ## ratis - fso
> ## ratis - obs
> ## ec - fso
> ## ec - obs
> # 7 snapshots -
> ## 4 backup snapshots
> ## 3 replication snapshots
> # Total number of snapshots = 150 * 4 * 7 = 4200
> # 500K keys per snapshot (key scale to be configurable)
> # Repeat steps 3-6 for continuous intervals
> OM Error stacktrace -
> {code:java}
> 2025-03-03 09:53:30,921 INFO
> [grpc-default-executor-644]-org.apache.ratis.grpc.server.GrpcLogAppender:
> om122@group-9E840F57059A->om121-InstallSnapshotResponseHandler:
> InstallSnapshot in progress.
> 2025-03-03 09:53:30,921 ERROR
> [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer:
> Terminating with exit status 1: During flush to DB encountered error in
> OMDoubleBuffer flush thread OMDoubleBufferFlushThread when handling
> OMRequest: cmdType: CreateSnapshot
> traceID: ""
> success: true
> status: OK
> CreateSnapshotResponse {
> snapshotInfo {
> snapshotID {
> mostSigBits: 7137775835294879303
> leastSigBits: -8902643940451942791
> }
> name: "snap1741024398"
> volumeName: "bofavol-38"
> bucketName: "bofabuck-38-ratfso"
> snapshotStatus: SNAPSHOT_ACTIVE
> creationTime: 1741024409564
> deletionTime: 18446744073709551615
> pathPreviousSnapshotID {
> mostSigBits: -1057436070549500687
> leastSigBits: -5683722903965903247
> }
> globalPreviousSnapshotID {
> mostSigBits: 8988231063147725793
> leastSigBits: -5103262545165094617
> }
> snapshotPath: "bofavol-38/bofabuck-38-ratfso"
> checkpointDir: "-630e794d-fd1e-4e47-8473-74811f93fe79"
> dbTxSequenceNumber: 323480877
> deepClean: true
> sstFiltered: false
> }
> } java.io.IOException: Rocks Database is closed
> at org.apache.hadoop.hdds.utils.db.RocksDatabase.acquire(RocksDatabase.java:439)
> at org.apache.hadoop.hdds.utils.db.RocksDatabase.newIterator(RocksDatabase.java:777)
> at org.apache.hadoop.hdds.utils.db.RDBTable.iterator(RDBTable.java:232)
> at org.apache.hadoop.hdds.utils.db.TypedTable.iterator(TypedTable.java:418)
> at org.apache.hadoop.hdds.utils.db.TypedTable.iterator(TypedTable.java:55)
> at org.apache.hadoop.ozone.om.OmSnapshotManager.deleteKeysFromDelKeyTableInSnapshotScope(OmSnapshotManager.java:558)
> at org.apache.hadoop.ozone.om.OmSnapshotManager.createOmSnapshotCheckpoint(OmSnapshotManager.java:437)
> at org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotCreateResponse.addToDBBatch(OMSnapshotCreateResponse.java:81)
> at org.apache.hadoop.ozone.om.response.OMClientResponse.checkAndUpdateDB(OMClientResponse.java:73)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$5(OzoneManagerDoubleBuffer.java:383)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatchWithTrace(OzoneManagerDoubleBuffer.java:221)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatch(OzoneManagerDoubleBuffer.java:382)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushBatch(OzoneManagerDoubleBuffer.java:325)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushCurrentBuffer(OzoneManagerDoubleBuffer.java:298)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:263)
> at java.lang.Thread.run(Thread.java:748){code}