Sorry, I just realized that I sent my feedback to Jingsong's email address instead of the dev / user mailing list.
Please find my comments below.

Thank you~

Xintong Song

On Wed, Nov 27, 2019 at 4:32 PM Xintong Song <tonysong...@gmail.com> wrote:

> As a participant in yesterday's discussion, I'm +1 for the proposal of
> removing on-heap managed memory.
>
> And there's one thing I want to add. In order to support "reserving" memory
> (where memory consumers do not allocate MemorySegments from MemoryManager
> but allocate the reserved memory themselves), we no longer support
> pre-allocation of memory segments in FLIP-49. That means even if we do not
> remove on-heap managed memory, a MemorySegment will not be allocated
> unless requested by the consumer, and will be deallocated immediately when
> released by the consumer. Thus, it is likely that the memory segments will
> not always stay in the JVM old generation, and will be affected by GC /
> swapping just like other Java objects.
>
> @Jingsong, I'm not sure whether this is related to the performance
> issue that you mentioned.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Nov 27, 2019 at 12:10 PM Jingsong Li <jingsongl...@gmail.com>
> wrote:
>
>> Hi Stephan,
>>
>> +1 to having off-heap managed memory by default.
>>
>> From the perspective of batch, in our long-term performance tests and
>> online practice:
>> - There is no significant difference in performance between heap and
>> off-heap memory. If it is a heap object, the JVM has many opportunities to
>> optimize in JIT, so generally speaking, the heap object will be faster. But
>> at present, the managed memory we use in Flink is used in binary form. In
>> this case, we use the Unsafe API to operate on it, so there is no obvious
>> performance gap.
>> - On the contrary, too much memory in the heap will affect the
>> performance and latency of GC.
>>
>> But I'm not sure if we should only have off-heap managed memory.
>> According to previous experience, array and object operations in the JVM
>> will be more beneficial. As mentioned earlier, the JVM/JIT will do a lot of
>> optimization.
>> - For vectorization, arrays are obviously more conducive to
>> calculation. The JVM can apply many optimizations in array loops.
>> - We can consider using some deep code generation to generate some
>> dynamic Java objects to further speed up the operators. The snappydata[1]
>> project has done some work in this area.
>>
>> So I am +0 to only having off-heap managed memory, because we don't rely
>> on heap memory right now; these are only a few ideas for the future.
>>
>> [1] https://github.com/SnappyDataInc/snappydata
>>
>> Best,
>> Jingsong Lee
>>
>> On Wed, Nov 27, 2019 at 10:14 AM Stephan Ewen <se...@apache.org> wrote:
>>
>>> Hi all!
>>>
>>> Yesterday, some of the people involved in FLIP-49 had a long discussion
>>> about managed memory in Flink.
>>> Particularly, the fact that we have managed memory either on heap or off
>>> heap and that FLIP-49 introduced having both of these types of memory at
>>> the same time.
>>>
>>> ==> What we want to suggest is a simplification to only have off-heap
>>> managed memory.
>>>
>>> The rationale is the following:
>>> - Integrating state backends with managed memory means we need to
>>> support "reserving" memory on top of creating MemorySegments.
>>> Reserving memory isn't really possible on the Java heap, but works
>>> well off-heap.
>>>
>>> - All components that will use managed memory will work with off-heap
>>> managed memory: MemorySegment-based structures, RocksDB, possibly external
>>> processes in the future.
>>>
>>> - A setup where state backends integrate with managed memory, but
>>> managed memory is by default all on-heap, breaks the RocksDB backend's
>>> out-of-the-box experience.
>>>
>>> - The only state backend to not use managed memory is the
>>> HeapKeyedStateBackend (used in MemoryStateBackend and FileStateBackend). It
>>> means that the HeapKeyedStateBackend always, also when all managed memory
>>> is off-heap.
>>>
>>> - The larger use of the HeapKeyedStateBackend needs a larger JVM heap.
>>> The current FLIP-49 way to get this is to "configure managed memory to
>>> on-heap", but the managed memory will not be used; it just helps to
>>> implicitly grow the heap through the way the heap size is computed. That is
>>> a pretty confusing story, especially when we start thinking about scenarios
>>> where Flink runs as a library in a pre-existing JVM, about the mini-cluster,
>>> etc. It is simpler (and more accurate) to just say that the
>>> HeapKeyedStateBackend does not participate in managed memory, and extensive
>>> use of it requires the user to reserve heap memory (in FLIP-49 you have a
>>> new TaskHeapMemory option to request that a larger heap should be created).
>>>
>>> ==> This seems to support all scenarios in a nice way out of the box.
>>>
>>> ==> This seems easier to understand for users.
>>>
>>> ==> This simplifies the implementation of resource profiles,
>>> configuration, and computation of memory pools.
>>>
>>>
>>> Does anybody have a concern about this? In particular, would any users be
>>> impacted if MemorySegment-based jobs (batch) would now always run with
>>> off-heap memory?
>>>
>>> If no one raises an objection, we would update the FLIP-49 proposal to
>>> have a default setup of dividing the Flink memory into 50% JVM
>>> heap and 50% managed memory (or even 60%/40%). All state backends and batch
>>> jobs will have a good out-of-the-box experience that way.
>>>
>>> Best,
>>> Stephan
>>>
>>
>>
>> --
>> Best, Jingsong Lee
>>
>
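
For readers following the thread, the "reserving" model discussed above (a consumer reserves part of the managed memory budget and then allocates the off-heap memory itself, with no pre-allocation) can be pictured roughly as below. This is only a minimal sketch: ManagedMemoryBudget, managedBytes, tryReserve and releaseAll are hypothetical names made up for illustration and are not Flink's MemoryManager API. The 0.5 fraction mirrors the 50%/50% split proposed in the thread, and the 1 GiB total is an arbitrary assumption.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical budget tracker for illustration only; NOT Flink's MemoryManager. */
final class ManagedMemoryBudget {
    private final long totalBytes;
    private long reservedBytes;
    private final Map<Object, Long> reservedByOwner = new HashMap<>();

    ManagedMemoryBudget(long totalBytes) {
        if (totalBytes < 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0");
        }
        this.totalBytes = totalBytes;
    }

    /** Illustrative split: the share of total Flink memory given to managed memory. */
    static long managedBytes(long flinkMemoryBytes, double managedFraction) {
        return (long) (flinkMemoryBytes * managedFraction);
    }

    /** Accounts for the bytes only; the owner allocates the memory itself afterwards. */
    synchronized boolean tryReserve(Object owner, long bytes) {
        if (bytes < 0 || bytes > totalBytes - reservedBytes) {
            return false;
        }
        reservedBytes += bytes;
        reservedByOwner.merge(owner, bytes, Long::sum);
        return true;
    }

    /** Returns everything the owner reserved to the budget. */
    synchronized void releaseAll(Object owner) {
        Long held = reservedByOwner.remove(owner);
        if (held != null) {
            reservedBytes -= held;
        }
    }

    synchronized long availableBytes() {
        return totalBytes - reservedBytes;
    }

    public static void main(String[] args) {
        long flinkMemory = 1024L * 1024 * 1024; // assumed total of 1 GiB
        ManagedMemoryBudget budget =
                new ManagedMemoryBudget(managedBytes(flinkMemory, 0.5));

        Object consumer = "state-backend"; // hypothetical consumer identity
        int blockSize = 1024 * 1024;
        if (budget.tryReserve(consumer, blockSize)) {
            // The consumer allocates off-heap memory on its own, on demand.
            ByteBuffer block = ByteBuffer.allocateDirect(blockSize);
            block.putLong(0, 42L);
            System.out.println("available after reserve: " + budget.availableBytes());
            budget.releaseAll(consumer);
        }
        System.out.println("available after release: " + budget.availableBytes());
    }
}
```

The budget only does bookkeeping, which is why the same accounting could cover consumers that never touch a MemorySegment, such as a native library. The HeapKeyedStateBackend would simply not take part in it.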