I am on the same page as Enrico: I don't have much experience with RocksDB. As you can see from the discussion on the PR (https://github.com/apache/bookkeeper/pull/2686#discussion_r613468033), the perf impact was a concern; OTOH, that PR was fixing a perf issue with seek time.
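On the auto-compaction side, here is a minimal sketch (untested; the class name, values, and DB path are placeholders, not recommendations) of how the background compaction knobs can be raised through the RocksDB Java options, so that tombstones from index deletion are compacted away without an explicit compactRange() call:

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class AutoCompactionTuningSketch {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        // NOTE: values and path below are placeholders for experimentation,
        // not recommendations.
        try (Options options = new Options().setCreateIfMissing(true)) {
            // More threads for background flushes/compactions, so deleted
            // index entries (tombstones) get compacted away automatically
            // instead of via a manual compactRange() call.
            options.setMaxBackgroundJobs(4);
            options.setLevelCompactionDynamicLevelBytes(true);
            try (RocksDB db = RocksDB.open(options, "/tmp/entry-location-index")) {
                // normal reads/writes; compaction runs in the background
            }
        }
    }
}
```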
I searched and found this guide: https://github.com/EighteenZi/rocksdb_wiki/blob/master/RocksDB-Tuning-Guide.md
Is it possible that tuning max_background_compactions and some other parameters can help? This PR https://github.com/apache/bookkeeper/pull/3056 made the RocksDB tuning easier.

I'll help with the review of a PR (hopefully supplemented with perf test results), but I cannot commit to fixing it.
I hope you and Maurice (author of the original PR) can find a workable compromise.

On Tue, Mar 15, 2022 at 7:14 AM Enrico Olivelli <eolive...@gmail.com> wrote:
> Hang,
>
> On Tue, Mar 15, 2022 at 02:47 Hang Chen <chenh...@apache.org> wrote:
> >
> > Hi BookKeeper Community,
> >
> > For BookKeeper 4.14.0+, I have noticed that index deletion sometimes takes around 60 seconds, which causes the CPU to spike to 100%.
> > ```
> > [2022-02-28T07:25:42.531Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:191 Deleting indexes for ledgers: [3385184, 3385239, 3385159, 3385142, 3385124, 3385193, 3384879, 3385165, 3385916]
> > [2022-02-28T07:26:34.089Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:266 Deleted indexes for 201065 entries from 9 ledgers in 51.557 seconds
> > [2022-02-28T07:40:42.534Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:191 Deleting indexes for ledgers: [3385379, 3385367, 3385718, 3385365, 3385412, 3385167, 3385357, 3386141]
> > [2022-02-28T07:41:47.867Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:266 Deleted indexes for 134590 entries from 8 ledgers in 65.332 seconds
> > ```
> >
> > RocksDB compaction is a heavy operation, and the checkpoint is triggered at high frequency, which keeps the db-storage-cleanup thread under constant high load and pins the CPU at 100%.
> >
> > This change was introduced by https://github.com/apache/bookkeeper/pull/2686. The motivation of that PR is:
> >
> > > After deleting many ledgers, seeking to the end of the RocksDB metadata can take a long time and trigger timeouts upstream. Address this by improving the seek logic as well as compacting out tombstones in situations where we've just deleted many entries. This affects the entry location index and the ledger metadata index.
> >
> > For RocksDB, the CompactRange operation is a high-overhead operation; we'd better avoid calling it manually. Since RocksDB 7.0, the `compactRange` API has been removed: https://github.com/facebook/rocksdb/pull/9444
> >
> > IMO, we'd better remove the manual compactRange call in this PR and increase `max_background_jobs` to accelerate auto compaction.
> >
> > Would you please give me more ideas?
>
> I don't have much experience with RocksDB.
>
> Did you make a prototype?
> Sharing some results from a prototype would help a lot.
>
> I am not sure, but maybe we can add an option to enable/disable manual compaction and to tune max_background_jobs; this way we can roll back in case of problems with your proposal.
>
> Enrico
>
> >
> > Thanks,
> > Hang

--
Andrey Yegorov