Re: RocksDB Statebackend

Aljoscha Krettek Tue, 12 Apr 2016 09:41:41 -0700

Hi,
I'm going to try and respond to each point:

1. This seems strange, could you give some background on parallelism,
number of operators with state and so on? Also, I'm assuming you are using
the partitioned state abstraction, i.e. getState(), correct?

2. your observations are pretty much correct. The reason why RocksDB is
slower is that the FsStateBackend basically stores the state in a Java
HashMap and writes the contents to HDFS when checkpointing. RocksDB stores
data in on-disk files and goes to them for every state access (of course
there are caches, but generally it is like this). I'm actually impressed
that it is still this fast in comparison.

3. see 1. (I think for now)

4. The checkpointing time is the time from the JobManager deciding to start
a checkpoint until all tasks have confirmed that checkpoint. I have seen
this before and I think it results from back pressure. The problem is that
the checkpoint messages that we sent through the topology are sitting at
the sources because they are also back pressured by the slow processing of
normal records. You should be able to see the actual checkpointing times
(both synchronous and asynchronous) in the log files of the task managers,
they should be very much lower.

I can go into details, I'm just writing this quickly before calling it a
day. :-)

Cheers,
Aljoscha

On Tue, 12 Apr 2016 at 18:21 Konstantin Knauf <konstantin.kn...@tngtech.com>
wrote:

> Hi everyone,
>
> my experience with RocksDBStatebackend have left me a little bit
> confused. Maybe you guys can confirm that my epxierence is the expected
> behaviour ;):
>
> I have run a "performancetest" twice, once with FsStateBackend and once
> RocksDBStatebackend in comparison. In this particular test the state
> saved is generally not large (in a production scenario it will be larger.)
>
> These are my observations:
>
> 1. Minimal Checkpoint Size (no records) with RocksDB was 33MB compared
> to <<1MB with the FSStatebackend.
>
> 2. Throughput dropped from 28k/s -> 18k/s on a small cluster.
>
> 3. Checkpoint sizes as reported in the Dashboard was ca. 1MB for
> FsStatebackend but >100MB for RocksDbStatebackend. I hope the difference
> gets smaller for very large state. Can you confirm?
>
> 4. Checkpointing Times as reported in the Dashboard were 26secs for
> RocksDB during the test and <1 second for FsStatebackend. Does the
> reported time correspond to the sync. + asynchronous part of the
> checkpointing in case of RocksDB? Is there any way to tell how long the
> synchronous part takes?
>
> Form these first observations RocksDB does seem to bring a large
> overhead for state < 1GB, I guess? Is this expected?
>
> Cheers,
>
> Konstantin
>

Re: RocksDB Statebackend

Reply via email to