Hi,

FYI, the performance regression after upgrading RocksDB was clearly visible in all of our RocksDB-related benchmarks, for example:
http://codespeed.dak8s.net:8000/timeline/?ben=stateBackends.ROCKS&env=2
http://codespeed.dak8s.net:8000/timeline/?ben=stateBackends.ROCKS_INC&env=2
(and many more in the State Backends executable)

It's 6% to 12% across the board.

Best,
Piotrek

On Wed, 11 Aug 2021 at 13:42, 张蛟 <zhangjia...@163.com> wrote:

> Hi Nico and Yun:
>     Thanks for your great work and the detailed description of the RocksDB version upgrade and its performance. About 800 jobs are using the RocksDB state backend in our production environment, and we plan to migrate more to it, aiming to solve the GC problems caused by large state. Because of the non-strict memory control in RocksDB, we have to spend a lot of time dealing with memory usage going beyond the physical memory. With support for a strict block cache, things will become much easier. Also, the delete range API is useful for us too, so we prefer to upgrade RocksDB to the new release version: +1 (non-binding).
>
> Best,
> zlzhang0122
>
> At 2021-08-05 01:50:07, "Yun Tang" <myas...@live.com> wrote:
> >Hi Yuval,
> >
> >Upgrading the RocksDB version has been a long story since Flink-1.10.
> >When we first planned to introduce the write buffer manager to help control the memory usage of RocksDB, we actually wanted to bump up to RocksDB-5.18 from the then-current RocksDB-5.17. However, we found a performance regression in our micro benchmark on state operations [1] when bumping to RocksDB-5.18. We did not figure out the root cause at that time and decided to cherry-pick the write buffer manager commits into our own FRocksDB [2], and we finally released our own frocksdbjni-5.17.2-artisans-2.0 at that time.
> >
> >As time goes on, more and more bugs or missing features have been reported against the old RocksDB version, such as:
> >
> > 1. Cannot support the ARM platform [3]
> > 2. Does not have a stable deleteRange API, which is useful for Flink scale out [4]
> > 3. Cannot support a strict block cache [5]
> > 4. Checkpoints might get stuck when using the UNIVERSAL compaction strategy [6]
> > 5. Uncontrolled log size made us disable the RocksDB internal LOG [7]
> > 6. RocksDB's optimizeForPointLookup option might cause data loss [8]
> > 7. The dummy entry used for memory control in RocksDB-5.17 is too large, leading to performance problems [9]
> > 8. Cannot support Alpine-based images.
> > 9. .......
> >
> >Some of these have been worked around, and some are still open.
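For readers who have not touched these knobs directly, a minimal RocksJava sketch of the two memory-control pieces discussed above (a block cache with a strict capacity limit, plus a write buffer manager charged against that cache) could look roughly like the following. This assumes a RocksJava 6.x API; the sizes and the database path are made-up illustration values, and Flink wires this up internally in its RocksDB state backend rather than expecting user code to do it.

    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.Cache;
    import org.rocksdb.LRUCache;
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;
    import org.rocksdb.WriteBufferManager;

    public class StrictMemorySketch {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();

            // Shared LRU block cache with a strict capacity limit:
            // (capacity in bytes, numShardBits, strictCapacityLimit).
            Cache blockCache = new LRUCache(256 * 1024 * 1024L, 6, true);

            // Charge memtable (write buffer) memory against the same cache, so the
            // block cache plus write buffers stay under the single budget above.
            WriteBufferManager writeBufferManager =
                    new WriteBufferManager(64 * 1024 * 1024L, blockCache);

            BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
                    .setBlockCache(blockCache)
                    .setCacheIndexAndFilterBlocks(true);

            Options options = new Options()
                    .setCreateIfMissing(true)
                    .setTableFormatConfig(tableConfig);
            options.setWriteBufferManager(writeBufferManager);

            try (RocksDB db = RocksDB.open(options, "/tmp/strict-memory-sketch")) {
                db.put("key".getBytes(), "value".getBytes());
            }
        }
    }

The point of the strict capacity limit is that the cache refuses to grow past its budget instead of quietly exceeding it, which is what makes the overall memory bound enforceable.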
> >And we decided to make some changes starting from Flink-1.12. First of all, we reported the performance regression between RocksDB-5.17 and RocksDB-5.18 to the RocksDB community [10]. However, as the RocksDB-5.x versions are a bit old for the community, and RocksJava usage might not be a core concern for the Facebook folks, we did not get useful replies. Thus, we decided to figure out the root cause of the performance regression by ourselves.
> >Fortunately, we found the cause via a binary search over the commits between RocksDB-5.17 and RocksDB-5.18, and updated the original thread [10]. In short, the performance regression is due to the different implementations of `__thread` and `thread_local` in gcc, which has a larger impact under dynamic loading [11], which is also what the current RocksJava jar package does. With my patch [12], the performance regression disappears when comparing RocksDB-5.18 with RocksDB-5.17.
> >
> >Unfortunately, RocksDB-5.18 still has many bugs and we want to bump to RocksDB-6.x. However, another performance regression appeared even with my patch [12]. With the previous experience, we knew that we must verify the built .so files with our Java-based benchmark instead of using RocksDB's built-in db_bench. I started to search the 1340+ commits from RocksDB-5.18 to RocksDB-6.11 to find the performance problem. However, this time I did not figure out the root cause after spending several weeks. The performance goes up and down across those commits and I could not pin down the commit which led to the performance regression. Take the commit integrating the block cache tracer into the block-based table reader [13], for example: I noticed that this commit caused a slight performance regression, possibly due to unnecessary usage accounting in operations; however, the problematic code was changed again in later commits. Thus, after several weeks of digging, I had to give up on the endless search through those thousand-plus commits, at least temporarily. As the RocksDB community does not seem to make its project management system public, unlike Apache's open JIRA system, we do not know what benchmarks they actually run before releasing each version to guarantee the performance.
> >
> >With my patch [10] on the latest RocksDB-6.20.3, we got the Nexmark results shown in the original thread sent by Stephan, and we can see that the two versions perform closely in many real-world cases. We also hope that new features, such as direct buffer support [14] in RocksJava, could help improve RocksDB's performance in the future.
> >
> >Hope this explains what we have already done.
> >
> >
> >[1] https://github.com/apache/flink-benchmarks
> >[2] https://github.com/ververica/frocksdb/tree/FRocksDB-5.17.2
> >[3] https://issues.apache.org/jira/browse/FLINK-13598
> >[4] https://issues.apache.org/jira/browse/FLINK-21321
> >[5] https://github.com/facebook/rocksdb/issues/6247
> >[6] https://issues.apache.org/jira/browse/FLINK-21726
> >[7] https://issues.apache.org/jira/browse/FLINK-15068
> >[8] https://issues.apache.org/jira/browse/FLINK-17800
> >[9] https://github.com/facebook/rocksdb/pull/5175
> >[10] https://github.com/facebook/rocksdb/issues/5774
> >[11] http://david-grs.github.io/tls_performance_overhead_cost_linux/
> >[12] https://github.com/ververica/frocksdb/pull/19
> >[13] https://github.com/facebook/rocksdb/pull/5441/
> >[14] https://github.com/facebook/rocksdb/pull/2283
> >
> >
> >Best,
> >Yun Tang
> >
> >
> >On Wed, Aug 4, 2021 at 2:36 PM Yuval Itzchakov <yuva...@gmail.com> wrote:
> >We are heavy users of RocksDB and have had several issues with memory management in Kubernetes; most of them actually went away when we upgraded from Flink 1.9 to 1.13.
> >
> >Do we know why there's such a huge performance regression? Can we improve this somehow with some flag tweaking? It would be great to see a more in-depth explanation of the gains vs. losses of upgrading.
> >
> >On Wed, Aug 4, 2021 at 3:08 PM Stephan Ewen <se...@apache.org> wrote:
> >
> >
> >________________________________
> >From: Nico Kruber <n...@apache.org>
> >Sent: Thursday, August 5, 2021 0:10
> >To: u...@flink.apache.org <u...@flink.apache.org>; dev <dev@flink.apache.org>
> >Subject: Re: [ANNOUNCE] RocksDB Version Upgrade and Performance
> >
> >That's actually also what I'm seeing most of the time and what I'd expect to improve with the newer RocksDB version.
> >Hence, I'd also favour the upgrade even if there is a slight catch with respect to performance - we should, however, continue to investigate this together with the RocksDB community.
> >
> >
> >Nico
> >
> >On Wednesday, 4 August 2021 14:26:32 CEST David Anderson wrote:
> >> I am hearing quite often from users who are struggling to manage memory usage, and these are all users using RocksDB. While I don't know for certain that RocksDB is the cause in every case, from my perspective, getting the better memory stability of version 6.20 in place is critical.
> >>
> >> Regards,
> >> David
> >>
> >> On Wed, Aug 4, 2021 at 8:08 AM Stephan Ewen <se...@apache.org> wrote:
> >> > Hi all!
> >> >
> >> > *!!! If you are a big user of the Embedded RocksDB State Backend and have performance-sensitive workloads, please read this !!!*
> >> >
> >> > I want to quickly raise some awareness for a RocksDB version upgrade we plan to do, and some possible impact on application performance.
> >> >
> >> > *We plan to upgrade RocksDB to version 6.20.* That version of RocksDB unfortunately introduces some non-trivial performance regressions. In our Nexmark benchmark, at least one query is up to 13% slower.
> >> > With some fixes, this can be improved, but even then there is an overall *regression of up to 6% in some queries*. (See the attached table for results from the relevant Nexmark benchmark queries.)
> >> >
> >> > We would do this update nonetheless, because we need to get new features and bugfixes from RocksDB in.
> >> >
> >> > Please respond to this mail thread if you have major concerns about this.
> >> >
> >> >
> >> > *### Fallback Plan*
> >> >
> >> > Optionally, we could fall back to Plan B, which is to upgrade RocksDB only to version 5.18.4, which has no performance regression (after applying a custom patch).
> >> >
> >> > While this spares us the performance degradation of RocksDB 6.20.x, it has multiple disadvantages:
> >> >   - Does not include the better memory stability (strict cache control)
> >> >   - Misses out on some new features which some users asked about
> >> >   - Does not have the latest RocksDB bugfixes
> >> >
> >> > The last point is especially bad in my opinion. While we can cherry-pick some bugfixes back (and have done this in the past), users typically run into an issue first and need to trace it back to RocksDB before one of the committers can find the relevant patch from RocksDB master and backport it. That isn't the greatest user experience.
> >> >
> >> > Because of those disadvantages, we would prefer to do the upgrade to the newer RocksDB version despite the unfortunate performance regression.
> >> >
> >> > Best,
> >> > Stephan
> >
> >
> >--
> >Dr. Nico Kruber | Solutions Architect
> >
> >Follow us @VervericaData Ververica
> >--
> >Join Flink Forward - The Apache Flink Conference
> >Stream Processing | Event Driven | Real Time
> >--
> >Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> >--
> >Ververica GmbH
> >Registered at Amtsgericht Charlottenburg: HRB 158244 B
> >Managing Directors: Yip Park Tung Jason, Jinwei (Kevin) Zhang, Karl Anton Wehner
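As a closing illustration of one feature mentioned several times above (zlzhang0122's "delete range API", item 2 ([4]) in Yun Tang's list, and part of Stephan's "new features which some users asked about"), a minimal RocksJava sketch of deleteRange could look roughly like this. It assumes a RocksJava 6.x API; the path and key names are made up for illustration, and Flink's own use of it for dropping key-group ranges during rescaling lives inside the state backend, not in user code.

    import java.nio.charset.StandardCharsets;

    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public class DeleteRangeSketch {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();

            try (Options options = new Options().setCreateIfMissing(true);
                 RocksDB db = RocksDB.open(options, "/tmp/delete-range-sketch")) {

                // Write a few keys belonging to two "groups" (illustrative only).
                for (int i = 0; i < 5; i++) {
                    db.put(key("group-01/" + i), value(i));
                    db.put(key("group-02/" + i), value(i));
                }

                // Drop everything in [group-01/, group-02/) with one range tombstone
                // instead of deleting key by key.
                db.deleteRange(key("group-01/"), key("group-02/"));
            }
        }

        private static byte[] key(String s) {
            return s.getBytes(StandardCharsets.UTF_8);
        }

        private static byte[] value(int i) {
            return Integer.toString(i).getBytes(StandardCharsets.UTF_8);
        }
    }

Compared with iterating over the range and issuing individual deletes, a single range tombstone makes dropping a contiguous chunk of state considerably cheaper, which is why it matters for scale-out.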