Hi,

FYI, the performance regression after upgrading RocksDB was clearly visible in all of our RocksDB-related benchmarks, for example:
http://codespeed.dak8s.net:8000/timeline/?ben=stateBackends.ROCKS&env=2
http://codespeed.dak8s.net:8000/timeline/?ben=stateBackends.ROCKS_INC&env=2
(and many more in the State Backends executable)

It's 6% to 12% across the board.

Best,
Piotrek

On Wed, 11 Aug 2021 at 13:42, 张蛟 <zhangjia...@163.com> wrote:

> Hi Nico and Yun:
>     Thanks for your great work and the detailed description of the RocksDB version upgrade and its performance. About 800 jobs are using the RocksDB state backend in our production environment, and we plan to migrate more to it, aiming to solve the GC problems caused by large state. Because of the non-strict memory control in RocksDB, we have to spend a lot of time dealing with memory usage going beyond the physical memory. With support for a strict block cache, things will become much easier. Also, the delete range API is useful for us too, so we prefer to upgrade RocksDB to the new release version: +1 (non-binding).
>
> Best,
> zlzhang0122
>
> At 2021-08-05 01:50:07, "Yun Tang" <myas...@live.com> wrote:
> >Hi Yuval,
> >
> >Upgrading the RocksDB version has been a long story since Flink-1.10.
> >When we first planned to introduce the write buffer manager to help control the memory usage of RocksDB, we actually wanted to bump up to RocksDB-5.18 from the then-current RocksDB-5.17. However, we found a performance regression in our micro benchmark on state operations [1] when bumping to RocksDB-5.18. We did not figure out the root cause at that time and decided to cherry-pick the write buffer manager commits into our own FRocksDB [2], and we finally released our own frocksdbjni-5.17.2-artisans-2.0 at that time.
> >
> >As time goes on, more and more bugs or missing features have been reported against the old RocksDB version, such as:
> >
> > 1. Cannot support the ARM platform [3]
> > 2. Does not have a stable deleteRange API, which is useful for Flink scale out [4]
> > 3. Cannot support a strict block cache [5]
> > 4. Checkpoints might get stuck when using the UNIVERSAL compaction strategy [6]
> > 5. Uncontrolled log size made us disable the RocksDB internal LOG [7]
> > 6. RocksDB's optimizeForPointLookup option might cause data loss [8]
> > 7. The dummy entry used for memory control in RocksDB-5.17 is too large, leading to performance problems [9]
> > 8. Cannot support Alpine-based images.
> > 9. .......
> >
> >Some of these have been worked around, and some are still open.
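For readers who have not touched these knobs directly, a minimal RocksJava sketch of the two memory-control pieces discussed above (a block cache with a strict capacity limit, plus a write buffer manager charged against that cache) could look roughly like the following. This assumes a RocksJava 6.x API; the sizes and the database path are made-up illustration values, and Flink wires this up internally in its RocksDB state backend rather than expecting user code to do it.

    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.Cache;
    import org.rocksdb.LRUCache;
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;
    import org.rocksdb.WriteBufferManager;

    public class StrictMemorySketch {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();

            // Shared LRU block cache with a strict capacity limit:
            // (capacity in bytes, numShardBits, strictCapacityLimit).
            Cache blockCache = new LRUCache(256 * 1024 * 1024L, 6, true);

            // Charge memtable (write buffer) memory against the same cache, so the
            // block cache plus write buffers stay under the single budget above.
            WriteBufferManager writeBufferManager =
                    new WriteBufferManager(64 * 1024 * 1024L, blockCache);

            BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
                    .setBlockCache(blockCache)
                    .setCacheIndexAndFilterBlocks(true);

            Options options = new Options()
                    .setCreateIfMissing(true)
                    .setTableFormatConfig(tableConfig);
            options.setWriteBufferManager(writeBufferManager);

            try (RocksDB db = RocksDB.open(options, "/tmp/strict-memory-sketch")) {
                db.put("key".getBytes(), "value".getBytes());
            }
        }
    }

The point of the strict capacity limit is that the cache refuses to grow past its budget instead of quietly exceeding it, which is what makes the overall memory bound enforceable.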
> >And we decided to make some changes starting from Flink-1.12. First of all, we reported the performance regression between RocksDB-5.17 and RocksDB-5.18 to the RocksDB community [10]. However, as the RocksDB-5.x versions are a bit old for the community, and RocksJava usage might not be a core concern for the Facebook folks, we did not get useful replies. Thus, we decided to figure out the root cause of the performance regression by ourselves.
> >Fortunately, we found the cause via a binary search over the commits between RocksDB-5.17 and RocksDB-5.18, and updated the original thread [10]. In short, the performance regression is due to the different implementations of `__thread` and `thread_local` in gcc, which has a larger impact under dynamic loading [11], which is also what the current RocksJava jar package does. With my patch [12], the performance regression disappears when comparing RocksDB-5.18 with RocksDB-5.17.
> >
> >Unfortunately, RocksDB-5.18 still has many bugs and we want to bump to RocksDB-6.x. However, another performance regression appeared even with my patch [12]. With the previous experience, we knew that we must verify the built .so files with our Java-based benchmark instead of using RocksDB's built-in db_bench. I started to search the 1340+ commits from RocksDB-5.18 to RocksDB-6.11 to find the performance problem. However, this time I did not figure out the root cause after spending several weeks. The performance goes up and down across those commits and I could not pin down the commit which led to the performance regression. Take the commit integrating the block cache tracer into the block-based table reader [13], for example: I noticed that this commit caused a slight performance regression, possibly due to unnecessary usage accounting in operations; however, the problematic code was changed again in later commits. Thus, after several weeks of digging, I had to give up on the endless search through those thousand-plus commits, at least temporarily. As the RocksDB community does not seem to make its project management system public, unlike Apache's open JIRA system, we do not know what benchmarks they actually run before releasing each version to guarantee the performance.
> >
> >With my patch [10] on the latest RocksDB-6.20.3, we got the Nexmark results shown in the original thread sent by Stephan, and we can see that the two versions perform closely in many real-world cases. We also hope that new features, such as direct buffer support [14] in RocksJava, could help improve RocksDB's performance in the future.
> >
> >Hope this explains what we have already done.
> >
> >
> >[1] https://github.com/apache/flink-benchmarks
> >[2] https://github.com/ververica/frocksdb/tree/FRocksDB-5.17.2
> >[3] https://issues.apache.org/jira/browse/FLINK-13598
> >[4] https://issues.apache.org/jira/browse/FLINK-21321
> >[5] https://github.com/facebook/rocksdb/issues/6247
> >[6] https://issues.apache.org/jira/browse/FLINK-21726
> >[7] https://issues.apache.org/jira/browse/FLINK-15068
> >[8] https://issues.apache.org/jira/browse/FLINK-17800
> >[9] https://github.com/facebook/rocksdb/pull/5175
> >[10] https://github.com/facebook/rocksdb/issues/5774
> >[11] http://david-grs.github.io/tls_performance_overhead_cost_linux/
> >[12] https://github.com/ververica/frocksdb/pull/19
> >[13] https://github.com/facebook/rocksdb/pull/5441/
> >[14] https://github.com/facebook/rocksdb/pull/2283
> >
> >
> >Best,
> >Yun Tang
> >
> >
> >On Wed, Aug 4, 2021 at 2:36 PM Yuval Itzchakov <yuva...@gmail.com> wrote:
> >We are heavy users of RocksDB and have had several issues with memory management in Kubernetes; most of them actually went away when we upgraded from Flink 1.9 to 1.13.
> >
> >Do we know why there's such a huge performance regression? Can we improve this somehow with some flag tweaking? It would be great to see a more in-depth explanation of the gains vs. losses of upgrading.
> >
> >On Wed, Aug 4, 2021 at 3:08 PM Stephan Ewen <se...@apache.org> wrote:
> >
> >
> >________________________________
> >From: Nico Kruber <n...@apache.org>
> >Sent: Thursday, August 5, 2021 0:10
> >To: u...@flink.apache.org <u...@flink.apache.org>; dev <dev@flink.apache.org>
> >Subject: Re: [ANNOUNCE] RocksDB Version Upgrade and Performance
> >
> >That's actually also what I'm seeing most of the time and what I'd expect to improve with the newer RocksDB version.
> >Hence, I'd also favour the upgrade even if there is a slight catch with respect to performance - we should, however, continue to investigate this together with the RocksDB community.
> >
> >
> >Nico
> >
> >On Wednesday, 4 August 2021 14:26:32 CEST David Anderson wrote:
> >> I am hearing quite often from users who are struggling to manage memory usage, and these are all users using RocksDB. While I don't know for certain that RocksDB is the cause in every case, from my perspective, getting the better memory stability of version 6.20 in place is critical.
> >>
> >> Regards,
> >> David
> >>
> >> On Wed, Aug 4, 2021 at 8:08 AM Stephan Ewen <se...@apache.org> wrote:
> >> > Hi all!
> >> >
> >> > *!!! If you are a big user of the Embedded RocksDB State Backend and have performance-sensitive workloads, please read this !!!*
> >> >
> >> > I want to quickly raise some awareness for a RocksDB version upgrade we plan to do, and some possible impact on application performance.
> >> >
> >> > *We plan to upgrade RocksDB to version 6.20.* That version of RocksDB unfortunately introduces some non-trivial performance regressions. In our Nexmark benchmark, at least one query is up to 13% slower.
> >> > With some fixes, this can be improved, but even then there is an overall *regression of up to 6% in some queries*. (See the attached table for results from the relevant Nexmark benchmark queries.)
> >> >
> >> > We would do this update nonetheless, because we need to get new features and bugfixes from RocksDB in.
> >> >
> >> > Please respond to this mail thread if you have major concerns about this.
> >> >
> >> >
> >> > *### Fallback Plan*
> >> >
> >> > Optionally, we could fall back to Plan B, which is to upgrade RocksDB only to version 5.18.4, which has no performance regression (after applying a custom patch).
> >> >
> >> > While this spares us the performance degradation of RocksDB 6.20.x, it has multiple disadvantages:
> >> >   - Does not include the better memory stability (strict cache control)
> >> >   - Misses out on some new features which some users asked about
> >> >   - Does not have the latest RocksDB bugfixes
> >> >
> >> > The last point is especially bad in my opinion. While we can cherry-pick some bugfixes back (and have done this in the past), users typically run into an issue first and need to trace it back to RocksDB before one of the committers can find the relevant patch from RocksDB master and backport it. That isn't the greatest user experience.
> >> >
> >> > Because of those disadvantages, we would prefer to do the upgrade to the newer RocksDB version despite the unfortunate performance regression.
> >> >
> >> > Best,
> >> > Stephan
> >
> >
> >--
> >Dr. Nico Kruber | Solutions Architect
> >
> >Follow us @VervericaData Ververica
> >--
> >Join Flink Forward - The Apache Flink Conference
> >Stream Processing | Event Driven | Real Time
> >--
> >Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> >--
> >Ververica GmbH
> >Registered at Amtsgericht Charlottenburg: HRB 158244 B
> >Managing Directors: Yip Park Tung Jason, Jinwei (Kevin) Zhang, Karl Anton Wehner
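As a closing illustration of one feature mentioned several times above (zlzhang0122's "delete range API", item 2 ([4]) in Yun Tang's list, and part of Stephan's "new features which some users asked about"), a minimal RocksJava sketch of deleteRange could look roughly like this. It assumes a RocksJava 6.x API; the path and key names are made up for illustration, and Flink's own use of it for dropping key-group ranges during rescaling lives inside the state backend, not in user code.

    import java.nio.charset.StandardCharsets;

    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public class DeleteRangeSketch {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();

            try (Options options = new Options().setCreateIfMissing(true);
                 RocksDB db = RocksDB.open(options, "/tmp/delete-range-sketch")) {

                // Write a few keys belonging to two "groups" (illustrative only).
                for (int i = 0; i < 5; i++) {
                    db.put(key("group-01/" + i), value(i));
                    db.put(key("group-02/" + i), value(i));
                }

                // Drop everything in [group-01/, group-02/) with one range tombstone
                // instead of deleting key by key.
                db.deleteRange(key("group-01/"), key("group-02/"));
            }
        }

        private static byte[] key(String s) {
            return s.getBytes(StandardCharsets.UTF_8);
        }

        private static byte[] value(int i) {
            return Integer.toString(i).getBytes(StandardCharsets.UTF_8);
        }
    }

Compared with iterating over the range and issuing individual deletes, a single range tombstone makes dropping a contiguous chunk of state considerably cheaper, which is why it matters for scale-out.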