Hi all!

Yesterday, some of the people involved in FLIP-49 had a long discussion
about managed memory in Flink.
In particular, we discussed the fact that we have managed memory either on
heap or off heap, and that FLIP-49 introduced having both of these types of
memory at the same time.

==> What we want to suggest is a simplification: to have only off-heap
managed memory.

The rationale is the following:
  - Integrating state backends with managed memory means we need to support
"reserving" memory on top of creating MemorySegments. Reserving memory is
not really possible on the Java heap, but works well off-heap (see the
sketch after this list).

  - All components that will use managed memory will work with off-heap
managed memory: MemorySegment-based structures, RocksDB, possibly external
processes in the future.

  - A setup where state backends integrate with managed memory, but managed
memory is by default all on-heap, breaks the out-of-the-box experience for
the RocksDB backend.

  - The only state backend that does not use managed memory is the
HeapKeyedStateBackend (used in the MemoryStateBackend and FileStateBackend).
That means the HeapKeyedStateBackend keeps working on the JVM heap as
before, also when all managed memory is off-heap.

  - Heavier use of the HeapKeyedStateBackend needs a larger JVM heap. The
current FLIP-49 way to get this is to "configure managed memory to on-heap,
but the managed memory will not be used; it just helps to implicitly grow
the heap through the way the heap size is computed". That is a pretty
confusing story, especially when we start thinking about scenarios where
Flink runs as a library in a pre-existing JVM, about the mini-cluster, etc.
It is simpler (and more accurate) to just say that the HeapKeyedStateBackend
does not participate in managed memory, and extensive use of it requires the
user to reserve heap memory (in FLIP-49 there is a new TaskHeapMemory option
to request that a larger heap should be created).
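
To make the first point above a bit more concrete, here is a minimal sketch
(the class and method names are hypothetical, not the actual FLIP-49
interfaces). It only illustrates that handing out MemorySegments and
"reserving" memory are both deductions from the same off-heap budget, while
there is no comparable way to set aside a slice of the Java heap for a
native consumer such as RocksDB:

    // Hypothetical sketch, not the FLIP-49 API: one off-heap budget from which
    // both explicit MemorySegments and opaque "reservations" are deducted.
    import java.nio.ByteBuffer;

    final class OffHeapMemoryBudget {

        private final long totalBytes;
        private long availableBytes;

        OffHeapMemoryBudget(long totalBytes) {
            this.totalBytes = totalBytes;
            this.availableBytes = totalBytes;
        }

        // Explicit off-heap segment for Flink's own algorithms (sort, hash, ...).
        synchronized ByteBuffer allocateSegment(int sizeBytes) {
            deduct(sizeBytes);
            return ByteBuffer.allocateDirect(sizeBytes);
        }

        // Pure bookkeeping: set aside part of the budget for a consumer that
        // allocates natively itself (RocksDB, external processes). There is no
        // equivalent way to set aside a slice of the Java heap for such a consumer.
        synchronized void reserveMemory(long sizeBytes) {
            deduct(sizeBytes);
        }

        synchronized void releaseMemory(long sizeBytes) {
            availableBytes = Math.min(totalBytes, availableBytes + sizeBytes);
        }

        private void deduct(long sizeBytes) {
            if (sizeBytes > availableBytes) {
                throw new IllegalStateException("Not enough managed memory left");
            }
            availableBytes -= sizeBytes;
        }
    }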

==> This seems to support all scenarios in a nice way out of the box.

==> This seems easier to understand for users.

==> This simplifies the implementation of resource profiles, configuration,
and computation of memory pools.


Does anybody have a concern about this? In particular, would any users be
impacted if MemorySegment-based jobs (batch) now always ran with off-heap
memory?

If no one raises an objection, we would update the FLIP-49 proposal so that
the default setup divides the Flink memory into 50% JVM heap and 50% managed
memory (or even 60%/40%). All state backends and batch jobs will have a good
out-of-the-box experience that way.
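
For illustration, such a default could look roughly as follows in
flink-conf.yaml (option names as drafted in FLIP-49, so they may still
change):

    # Total memory available to Flink (JVM heap, off-heap managed memory,
    # network buffers, ...)
    taskmanager.memory.flink.size: 2048m

    # Fraction of that memory to use as off-heap managed memory
    # (the proposed 50% default)
    taskmanager.memory.managed.fraction: 0.5

    # Users relying heavily on the HeapKeyedStateBackend would instead request
    # a larger JVM heap explicitly, e.g.:
    # taskmanager.memory.task.heap.size: 1024m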

Best,
Stephan
