You should be able to tune your setup to avoid the OOM problem you have run
into with RocksDB. It will grow to use all of the memory available to it,
but shouldn't leak. Perhaps something is misconfigured.

As for performance, with the FSStateBackend you can expect:

* much better throughput and average latency
* possibly worse worst-case latency, due to GC pauses

Large, multi-slot TMs can be more of a problem with the FSStateBackend
because of the increased scope for GC. You may want to run with more TMs,
each with fewer slots.

You'll also be giving up the possibility of taking advantage of quick,
local recovery [1] that comes for free when using incremental checkpoints
with RocksDB. Local recovery can still be used with the FSStateBackend, but
it comes at more of a cost.

As for migrating your state, you may be able to use the State Processor API
[2] to rewrite a savepoint taken from RocksDB so that it can be loaded by
the FSStateBackend, so long as you aren't using any windows or
ListCheckpointed state.

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#task-local-recovery
[2]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html
<https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/libs/state_processor_api.html>

Best,
David

On Fri, Jul 17, 2020 at 1:53 PM Vijay Bhaskar <bhaskar.eba...@gmail.com>
wrote:

> Hi
> While doing scale testing we observed that FSStatebackend is out
> performing RocksDB.
> When using RocksDB, off heap  memory keeps growing over a period of time
> and after a day pod got terminated with OOM.
>
> Whereas the same data pattern FSStatebackend is running for days without
> any memory spike and OOM.
>
> Based on documentation this is what i understood:
>
> When the state size is so huge that we can't keep it in memory, that case
> RocksDB is preferred. That means indirectly when FSStatebackend is
> performing poorly when state size grows, RocksDB is preferred, right?
>
> Another question, We have run production using RocksDB for quite
> some time, if we switch to FSStatebackend, then what are the consequences?
> Following is what i can think of:
> Very first time i'll lose the state
> Thereafter w.r.t Save Points and Checkpoints the behavior is same ( I know
> there is no incremental checkpoint, but its a performance purpose)
>
> Other than that, do I see any issues?
>
> Regards
> Bhaskar
>
>

Reply via email to