Hi Trystan,

You can use async-profiler [1] to capture the CPU stacks inside RocksDB and see what is actually happening. If the stacks are dominated by loading index or filter blocks, you could try enabling partitioned index & filters [2].
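In case it helps, capturing a CPU flame graph of a TaskManager with async-profiler is roughly the following (exact flags depend on the async-profiler version; <taskmanager-pid> is the JVM pid of the TaskManager):

    ./profiler.sh -e cpu -d 60 -f /tmp/taskmanager-cpu.html <taskmanager-pid>

and partitioned index & filters can be enabled with a single option in flink-conf.yaml (it only takes effect when RocksDB memory is managed by Flink, which is the default):

    state.backend.rocksdb.memory.partitioned-index-filters: true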
[1] https://github.com/jvm-profiling-tools/async-profiler
[2] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#state-backend-rocksdb-memory-partitioned-index-filters

Best
Yun Tang

________________________________
From: Yaroslav Tkachenko <yaros...@goldsky.io>
Sent: Thursday, April 21, 2022 0:44
To: Trystan <entro...@gmail.com>
Cc: user <user@flink.apache.org>
Subject: Re: RocksDB efficiency and keyby

Yep, I'd give it another try. EBS could be too slow in some use cases.

On Wed, Apr 20, 2022 at 9:39 AM Trystan <entro...@gmail.com> wrote:

Thanks for the info! We're running EBS gp2 volumes... a while back we tested local SSDs with a different job and didn't notice any gains, but that was likely due to an under-optimized job where the bottleneck was elsewhere.

On Wed, Apr 20, 2022, 11:08 AM Yaroslav Tkachenko <yaros...@goldsky.io> wrote:

Hey Trystan,

Based on my personal experience, good disk IO matters a lot for RocksDB. Are you using the fastest SSD storage you can get for the RocksDB folders? For example, when running on GCP, we saw a 10x throughput improvement by switching RocksDB storage to https://cloud.google.com/compute/docs/disks/local-ssd

On Wed, Apr 20, 2022 at 8:50 AM Trystan <entro...@gmail.com> wrote:

Hello,

We have a job whose main purpose is to track whether or not we've previously seen a particular event - that's it. If an event is new, we save it to an external database; if we've seen it before, we block the write. A 3-day TTL keeps the state size in check. The downstream DB can tolerate events slipping through and will reject the duplicate writes itself - we mainly use the state to reduce writes.

We're starting to see some performance issues, even after adding 50% capacity to the job. After some number of days/weeks it eventually falls into constant backpressure. I'm wondering if there's something we can do to improve efficiency.

1. According to the flame graph, 60-70% of the time is spent in RocksDB.get.
2. The state is just a ValueState<Boolean>, which I assume is the smallest/most efficient state (roughly the pattern sketched at the bottom of this mail). The keyBy is extremely high cardinality - are we better off with a lower-cardinality key and a MapState<String, Boolean>.contains() check?
3. Current configs: taskmanager.memory.process.size: 4g, taskmanager.memory.managed.fraction: 0.8 (increased from 0.6, didn't see much change).
4. Estimated num keys tops out somewhere around 9-10B. Estimated live data size is somewhere around 250 GB. Attempting to switch to heap state immediately ran into OOMs (parallelism: 120, 8 GB memory each).

And perhaps the answer is just "scale out" :) but if there are any signals that tell us we've reached the limit of the current scale, it'd be great to know what to look for!

Thanks!
Trystan
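For reference, the state access pattern is roughly the sketch below. The Event type, class name, and the TTL cleanup settings are illustrative placeholders rather than our exact code; the real job is essentially events.keyBy(...).process(...) in front of the DB writer.

import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// "Event" stands in for our actual record type.
public class SeenFilter extends KeyedProcessFunction<String, Event, Event> {

    private transient ValueState<Boolean> seen;

    @Override
    public void open(Configuration parameters) {
        // 3-day TTL so old keys eventually drop out of RocksDB.
        StateTtlConfig ttl = StateTtlConfig.newBuilder(Time.days(3))
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                .cleanupInRocksdbCompactFilter(1000)
                .build();

        ValueStateDescriptor<Boolean> desc =
                new ValueStateDescriptor<>("seen", Boolean.class);
        desc.enableTimeToLive(ttl);
        seen = getRuntimeContext().getState(desc);
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<Event> out) throws Exception {
        if (seen.value() == null) {
            // First time we see this key within the TTL window: mark it and let the write through.
            seen.update(Boolean.TRUE);
            out.collect(event);
        }
        // Otherwise the event is dropped and no downstream write is issued.
    }
}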