To clarify, my questions were all about the very original issue rather
than the WordCount job. The timers come from the window operator you
mentioned as the source of the original issue:
===========================================
bq. If I create a Flink job that has a single "heavy" operator (call it X)
that just keeps a simple state (per user), things work fast when testing how
many events / sec the job can process. However, if I add downstream of X
the simplest possible window operator, things can get slow, especially when
I increase the parallelism
===========================================

Regarding the WordCount job, as Andrey explained, it's kind of an expected
result, and you can find more guidance in the document I posted in my last
reply [1]. Let me quote some lines here for your convenience:
===========================================

To tune memory-related performance issues, the following steps may be
helpful:

- The first step to try and increase performance should be to increase the
  amount of managed memory. This usually improves the situation a lot,
  without opening up the complexity of tuning low-level RocksDB options.

  Especially with large container/process sizes, much of the total memory
  can typically go to RocksDB, unless the application logic requires a lot
  of JVM heap itself. The default managed memory fraction (0.4) is
  conservative and can often be increased when using TaskManagers with
  multi-GB process sizes.

- The number of write buffers in RocksDB depends on the number of states
  you have in your application (states across all operators in the
  pipeline). Each state corresponds to one ColumnFamily, which needs its
  own write buffers. Hence, applications with many states typically need
  more memory for the same performance.

- You can try and compare the performance of RocksDB with managed memory
  to RocksDB with per-column-family memory by setting
  state.backend.rocksdb.memory.managed: false. Especially to test against
  a baseline (assuming no or gracious container memory limits) or to test
  for regressions compared to earlier versions of Flink, this can be
  useful.

  Compared to the managed memory setup (constant memory pool), not using
  managed memory means that RocksDB allocates memory proportional to the
  number of states in the application (memory footprint changes with
  application changes). As a rule of thumb, the non-managed mode has
  (unless ColumnFamily options are applied) an upper bound of roughly
  “140MB * num-states-across-all-tasks * num-slots”. Timers count as state
  as well!

===========================================
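To make the first step above concrete, here is a minimal sketch in the
same spirit as the snippet quoted further down in this thread (the 0.6
fraction and the parallelism of 4 are only assumptions for illustration;
tune them to your TaskManager size):

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Raise the managed memory fraction from the conservative default (0.4)
// so RocksDB, which uses managed memory by default in Flink 1.10, gets a
// larger share of the TaskManager process memory.
Configuration configuration = new Configuration();
configuration.setDouble("taskmanager.memory.managed.fraction", 0.6);
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.createLocalEnvironment(4, configuration);

And to spell out the rule of thumb at the end of the quote: with, say, 5
states across all tasks and 4 slots, the non-managed mode could grow to
roughly 140MB * 5 * 4 = 2800MB, i.e. about 2.8GB.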

Best Regards,
Yu

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/large_state_tuning.html#tuning-rocksdb-memory


On Fri, 26 Jun 2020 at 21:15, Andrey Zagrebin <azagrebin.apa...@gmail.com>
wrote:

> Hi Juha,
>
> I can also submit the more complex test with the bigger operator and a
> window operator. There's just gonna be more code to read. Can I attach a
> file here or how should I submit a larger chunk of code?
>
>
> You can just attach the file with the code.
>
> 2. I'm not sure what I would / should look for.
>
> For 'taskmanager.memory.managed.fraction' I tried
>
> configuration.setDouble("taskmanager.memory.managed.fraction", 0.8);
>
>
> I think Yu meant increasing the managed memory because it might not be
> enough to host both X and the window operator.
> You can do it by increasing this option: taskmanager.memory.managed.size
> [1], [2].
> Also, if you run Flink locally from your IDE, see the notes for local
> execution [3].
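> As a rough sketch of setting that option for a local run from the IDE
> (the 2gb value and the parallelism of 4 are only examples; adjust them
> to your machine):
>
> import org.apache.flink.configuration.Configuration;
> import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
>
> // Give the local MiniCluster an explicit managed memory budget; local
> // execution falls back to a small default size rather than deriving it
> // from the process size (see the local execution notes in [3]).
> Configuration conf = new Configuration();
> conf.setString("taskmanager.memory.managed.size", "2gb");
> StreamExecutionEnvironment env =
>     StreamExecutionEnvironment.createLocalEnvironment(4, conf);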
>
> When you enable ‘state.backend.rocksdb.memory.managed’, RocksDB does not
> use more memory than the configured or default size of managed memory.
> Therefore, it starts to spill to disk and performance degrades but the
> memory usage is deterministic and you do not risk that your container gets
> killed with out-of-memory error.
>
> If you disable ‘state.backend.rocksdb.memory.managed’, RocksDB makes its
> own internal decisions about how much memory to allocate, so it can
> allocate more to be more performant and do less frequent spilling to disk.
> So maybe it gives more memory to the window operator to spill less.
>
> Therefore, it would be nice to compare the memory consumption of the Flink
> process with ‘state.backend.rocksdb.memory.managed’ set to true and to
> false.
>
> Anyway, I do not know how we could control the splitting of the configured
> managed memory among operators in a more optimal way.
>
> Best,
> Andrey
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_setup.html#managed-memory
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html#taskmanager-memory-managed-size
> [3]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html#local-execution
>
> On 26 Jun 2020, at 08:45, Juha Mynttinen <juha.myntti...@king.com> wrote:
>
> Andrey,
>
> A small clarification. The tweaked WordCount I posted earlier doesn't
> illustrate the issue I originally explained, i.e. the one where there's a
> bigger operator and the smallest possible window operator. Instead, the
> modified WordCount illustrates the degraded performance of a very simple
> Flink application when using managed memory and increasing parallelism over
> a certain threshold. The performance is not degraded if managed memory is
> not used (and parallelism is increased).
>
> I was hoping this kind of simple program would be easier to debug /
> profile.
>
> I can also submit the more complex test with the bigger operator and a
> window operator. There's just gonna be more code to read. Can I attach a
> file here or how should I submit a larger chunk of code?
>
> Regards,
> Juha
>
>
>
