Hang,

On Tue, 15 Mar 2022 at 02:47, Hang Chen <chenh...@apache.org> wrote:
>
> Hi BookKeeper Community,
>
> For BookKeeper 4.14.0+, I have noticed that index deletion sometimes
> takes around 60 seconds, which causes the CPU to spike to 100%.
> ```
> [2022-02-28T07:25:42.531Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:191 Deleting indexes for ledgers: [3385184, 3385239, 3385159, 3385142, 3385124, 3385193, 3384879, 3385165, 3385916]
> [2022-02-28T07:26:34.089Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:266 Deleted indexes for 201065 entries from 9 ledgers in 51.557 seconds
> [2022-02-28T07:40:42.534Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:191 Deleting indexes for ledgers: [3385379, 3385367, 3385718, 3385365, 3385412, 3385167, 3385357, 3386141]
> [2022-02-28T07:41:47.867Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:266 Deleted indexes for 134590 entries from 8 ledgers in 65.332 seconds
> ```
>
> RocksDB compaction is a heavy operation, and the checkpoint is
> triggered at high frequency, which keeps the db-storage-cleanup thread
> under constant high load and the CPU at 100%.
>
> This change was introduced by
> https://github.com/apache/bookkeeper/pull/2686. The motivation of this
> PR is:
>
> > After deleting many ledgers, seeking to the end of the RocksDB metadata can
> > take a long time and trigger timeouts upstream. Address this by improving
> > the seek logic as well as compacting out tombstones in situations where
> > we've just deleted many entries. This affects the entry location index and
> > the ledger metadata index.
>
> For RocksDB, CompactRange is a heavy operation, so we'd better avoid
> calling it manually. Since RocksDB 7.0, the `compactRange`
> API has been removed.
> https://github.com/facebook/rocksdb/pull/9444
>
> IMO, we'd better remove the manual compactRange call in this PR and
> increase `max_background_jobs` to accelerate auto compaction.
>
> Would you please give me more ideas? I don't have much experience with RocksDB.

Did you make a prototype? Sharing some results from a prototype would help a lot.

I am not sure, but maybe we can add an option to enable/disable manual
compaction and to tune max_background_jobs. This way we can roll back in
case of problems with your proposal.

Enrico

>
> Thanks,
> Hang
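
To make the suggestion above concrete, here is a minimal sketch of what such a switch could look like. It is only an illustration under stated assumptions: the class name `IndexCleanupSettings`, both property keys, and the default of 2 background jobs are hypothetical and are not existing BookKeeper configuration. The deletion and compaction steps are passed in as plain `Runnable`s so the sketch does not depend on the RocksDB API.

```java
import java.util.Properties;

/** Hypothetical settings holder; nothing here is an existing BookKeeper class. */
final class IndexCleanupSettings {
    // Hypothetical property keys, chosen for this sketch only.
    static final String MANUAL_COMPACTION_KEY = "exampleManualCompactionEnabled";
    static final String MAX_BACKGROUND_JOBS_KEY = "exampleMaxBackgroundJobs";

    final boolean manualCompactionEnabled;
    final int maxBackgroundJobs;

    IndexCleanupSettings(boolean manualCompactionEnabled, int maxBackgroundJobs) {
        if (maxBackgroundJobs < 1) {
            throw new IllegalArgumentException("maxBackgroundJobs must be >= 1");
        }
        this.manualCompactionEnabled = manualCompactionEnabled;
        this.maxBackgroundJobs = maxBackgroundJobs;
    }

    /** Reads both settings, falling back to defaults when a key is absent. */
    static IndexCleanupSettings fromProperties(Properties props) {
        // Default to manual compaction ON, so behaviour only changes when an
        // operator explicitly opts out (and can be rolled back the same way).
        boolean manual = Boolean.parseBoolean(
                props.getProperty(MANUAL_COMPACTION_KEY, "true"));
        // The default of 2 is an assumption made for this sketch.
        int jobs = Integer.parseInt(
                props.getProperty(MAX_BACKGROUND_JOBS_KEY, "2"));
        return new IndexCleanupSettings(manual, jobs);
    }

    /**
     * Always runs the index deletion; runs the manual compaction only when
     * the flag allows it. Returns true if the compaction step was executed.
     */
    boolean cleanup(Runnable deleteIndexes, Runnable manualCompaction) {
        deleteIndexes.run();
        if (manualCompactionEnabled) {
            manualCompaction.run();
            return true;
        }
        return false;
    }
}
```

With the flag left at its default, the cleanup path behaves as it does today; setting the hypothetical `exampleManualCompactionEnabled` key to `false` skips the manual compaction, and `maxBackgroundJobs` would be handed to whatever code builds the RocksDB options.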