In tests comparing RocksDB to the fs state backend, we observe much lower
throughput, around 10x slower. While lower throughput is expected, what's
perplexing is that machine load is also very low with RocksDB, typically
falling to < 25% CPU with negligible IO wait (around 0.1%). Our test
instances are EC2 c3.xlarge, which have 4 virtual CPUs and 7.5G RAM, each
running a single TaskManager in YARN with 6.5G of memory allocated per
TaskManager. The instances also have 2x40G attached SSDs, which we have
mapped to `taskmanager.tmp.dirs`.
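
For reference, the SSD mapping is a single flink-conf.yaml entry; the
mount points below are illustrative, not our exact paths:

# flink-conf.yaml -- spread RocksDB working files across both SSDs
taskmanager.tmp.dirs: /mnt/ssd0/flink-tmp,/mnt/ssd1/flink-tmp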

With FS state and 4 slots per TM, we easily max out, with a load average
around 5 or 6, so we actually need to throttle down to 3 slots. With
RocksDB, using Flink's predefined SSD options, we see a load average of
around 1. Also, load (and actual throughput) remain more or less constant
no matter how many slots we use, and the light load is spread evenly
across all CPUs.
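
In case the backend setup matters, this is roughly what we do; a minimal
sketch with a placeholder checkpoint URI, and API details may differ
slightly across Flink versions:

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();

// Placeholder checkpoint path; ours points at our actual HDFS cluster.
RocksDBStateBackend backend =
    new RocksDBStateBackend("hdfs:///flink/checkpoints");

// These are the "predefined SSD options" referred to above.
backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);
env.setStateBackend(backend);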

Here is a sample of top output:

Cpu0  : 20.5%us,  0.0%sy,  0.0%ni, 79.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 18.5%us,  0.0%sy,  0.0%ni, 81.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 11.6%us,  0.7%sy,  0.0%ni, 87.0%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 12.5%us,  0.3%sy,  0.0%ni, 86.8%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st

Our pipeline uses tumbling windows, each with a ValueState keyed to a
3-tuple of one string and two ints. Each ValueState holds a small set of
tuples of around 5-7 fields each. The WindowFunction simply diffs against
the set and updates state if there is a diff.
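
To make the access pattern concrete, the window logic is essentially the
Java sketch below. All names here are placeholders rather than our real
code, Rec stands in for our 5-7 field tuples, and the emitted diff is
simplified to just the new set:

import java.util.HashSet;
import java.util.Set;

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.tuple.Tuple5;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.windowing.RichWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// Stand-in for our 5-7 field tuples; extending Tuple5 just inherits
// field-wise equals()/hashCode(), which the set diff relies on.
class Rec extends Tuple5<String, Integer, Integer, Integer, Integer> {}

public class DiffWindowFunction extends RichWindowFunction<
        Rec, Set<Rec>, Tuple3<String, Integer, Integer>, TimeWindow> {

    private transient ValueState<Set<Rec>> lastSeen;

    @Override
    public void open(Configuration parameters) {
        // Set<Rec> has no native Flink serializer, so this state falls
        // back to generic (Kryo) serialization.
        lastSeen = getRuntimeContext().getState(new ValueStateDescriptor<>(
                "lastSeen", TypeInformation.of(new TypeHint<Set<Rec>>() {})));
    }

    @Override
    public void apply(Tuple3<String, Integer, Integer> key, TimeWindow window,
                      Iterable<Rec> elements, Collector<Set<Rec>> out)
            throws Exception {
        Set<Rec> current = new HashSet<>();
        for (Rec r : elements) {
            current.add(r);
        }
        // One state read per window firing, one write only on change.
        Set<Rec> previous = lastSeen.value();
        if (previous == null || !previous.equals(current)) {
            lastSeen.update(current);
            out.collect(current);
        }
    }
}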

Any ideas as to what the bottleneck is here? Any suggestions welcomed!

-Cliff
