Re: [DISCUSS] FLIP-49: Unified Memory Configuration for TaskExecutors

Xintong Song Thu, 22 Aug 2019 05:51:50 -0700

Hi everyone,

I just updated the FLIP document on wiki [1], with the following changes.


   - Removed open question regarding MemorySegment allocation. As
   discussed, we exclude this topic from the scope of this FLIP.
   - Updated content about JVM direct memory parameter according to recent
   discussions, and moved the other options to "Rejected Alternatives" for the
   moment.
   - Added implementation steps.


Thank you~

Xintong Song


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors

On Mon, Aug 19, 2019 at 7:16 PM Stephan Ewen <[email protected]> wrote:

> @Xintong: Concerning "wait for memory users before task dispose and memory
> release": I agree, that's how it should be. Let's try it out.
>
> @Xintong @Jingsong: Concerning " JVM does not wait for GC when allocating
> direct memory buffer": There seems to be pretty elaborate logic to free
> buffers when allocating new ones. See
>
> http://hg.openjdk.java.net/jdk8u/jdk8u-dev/jdk/file/tip/src/share/classes/java/nio/Bits.java#l643
>
> @Till: Maybe. If we assume that the JVM default works (like going with
> option 2 and not setting "-XX:MaxDirectMemorySize" at all), then I think it
> should be okay to set "-XX:MaxDirectMemorySize" to
> "off_heap_managed_memory + direct_memory" even if we use RocksDB. That is a
> big if, though, I honestly have no idea :D Would be good to understand
> this, though, because this would affect option (2) and option (1.2).
>
> On Mon, Aug 19, 2019 at 4:44 PM Xintong Song <[email protected]>
> wrote:
>
> > Thanks for the inputs, Jingsong.
> >
> > Let me try to summarize your points. Please correct me if I'm wrong.
> >
> >    - Memory consumers should always avoid returning memory segments to
> >    memory manager while there are still un-cleaned structures / threads
> > that
> >    may use the memory. Otherwise, it would cause serious problems by
> having
> >    multiple consumers trying to use the same memory segment.
> >    - JVM does not wait for GC when allocating direct memory buffer.
> >    Therefore even we set proper max direct memory size limit, we may
> still
> >    encounter direct memory oom if the GC cleaning memory slower than the
> >    direct memory allocation.
> >
> > Am I understanding this correctly?
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Mon, Aug 19, 2019 at 4:21 PM JingsongLee <[email protected]
> > .invalid>
> > wrote:
> >
> > > Hi stephan:
> > >
> > > About option 2:
> > >
> > > if additional threads not cleanly shut down before we can exit the
> task:
> > > In the current case of memory reuse, it has freed up the memory it
> > >  uses. If this memory is used by other tasks and asynchronous threads
> > >  of exited task may still be writing, there will be concurrent security
> > >  problems, and even lead to errors in user computing results.
> > >
> > > So I think this is a serious and intolerable bug, No matter what the
> > >  option is, it should be avoided.
> > >
> > > About direct memory cleaned by GC:
> > > I don't think it is a good idea, I've encountered so many situations
> > >  that it's too late for GC to cause DirectMemory OOM. Release and
> > >  allocate DirectMemory depend on the type of user job, which is
> > >  often beyond our control.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > >
> > > ------------------------------------------------------------------
> > > From:Stephan Ewen <[email protected]>
> > > Send Time:2019年8月19日(星期一) 15:56
> > > To:dev <[email protected]>
> > > Subject:Re: [DISCUSS] FLIP-49: Unified Memory Configuration for
> > > TaskExecutors
> > >
> > > My main concern with option 2 (manually release memory) is that
> segfaults
> > > in the JVM send off all sorts of alarms on user ends. So we need to
> > > guarantee that this never happens.
> > >
> > > The trickyness is in tasks that uses data structures / algorithms with
> > > additional threads, like hash table spill/read and sorting threads. We
> > need
> > > to ensure that these cleanly shut down before we can exit the task.
> > > I am not sure that we have that guaranteed already, that's why option
> 1.1
> > > seemed simpler to me.
> > >
> > > On Mon, Aug 19, 2019 at 3:42 PM Xintong Song <[email protected]>
> > > wrote:
> > >
> > > > Thanks for the comments, Stephan. Summarized in this way really makes
> > > > things easier to understand.
> > > >
> > > > I'm in favor of option 2, at least for the moment. I think it is not
> > that
> > > > difficult to keep it segfault safe for memory manager, as long as we
> > > always
> > > > de-allocate the memory segment when it is released from the memory
> > > > consumers. Only if the memory consumer continue using the buffer of
> > > memory
> > > > segment after releasing it, in which case we do want the job to fail
> so
> > > we
> > > > detect the memory leak early.
> > > >
> > > > For option 1.2, I don't think this is a good idea. Not only because
> the
> > > > assumption (regular GC is enough to clean direct buffers) may not
> > always
> > > be
> > > > true, but also it makes harder for finding problems in cases of
> memory
> > > > overuse. E.g., user configured some direct memory for the user
> > libraries.
> > > > If the library actually use more direct memory then configured, which
> > > > cannot be cleaned by GC because they are still in use, may lead to
> > > overuse
> > > > of the total container memory. In that case, if it didn't touch the
> JVM
> > > > default max direct memory limit, we cannot get a direct memory OOM
> and
> > it
> > > > will become super hard to understand which part of the configuration
> > need
> > > > to be updated.
> > > >
> > > > For option 1.1, it has the similar problem as 1.2, if the exceeded
> > direct
> > > > memory does not reach the max direct memory limit specified by the
> > > > dedicated parameter. I think it is slightly better than 1.2, only
> > because
> > > > we can tune the parameter.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Mon, Aug 19, 2019 at 2:53 PM Stephan Ewen <[email protected]>
> wrote:
> > > >
> > > > > About the "-XX:MaxDirectMemorySize" discussion, maybe let me
> > summarize
> > > > it a
> > > > > bit differently:
> > > > >
> > > > > We have the following two options:
> > > > >
> > > > > (1) We let MemorySegments be de-allocated by the GC. That makes it
> > > > segfault
> > > > > safe. But then we need a way to trigger GC in case de-allocation
> and
> > > > > re-allocation of a bunch of segments happens quickly, which is
> often
> > > the
> > > > > case during batch scheduling or task restart.
> > > > >   - The "-XX:MaxDirectMemorySize" (option 1.1) is one way to do
> this
> > > > >   - Another way could be to have a dedicated bookkeeping in the
> > > > > MemoryManager (option 1.2), so that this is a number independent of
> > the
> > > > > "-XX:MaxDirectMemorySize" parameter.
> > > > >
> > > > > (2) We manually allocate and de-allocate the memory for the
> > > > MemorySegments
> > > > > (option 2). That way we need not worry about triggering GC by some
> > > > > threshold or bookkeeping, but it is harder to prevent segfaults. We
> > > need
> > > > to
> > > > > be very careful about when we release the memory segments (only in
> > the
> > > > > cleanup phase of the main thread).
> > > > >
> > > > > If we go with option 1.1, we probably need to set
> > > > > "-XX:MaxDirectMemorySize" to "off_heap_managed_memory +
> > direct_memory"
> > > > and
> > > > > have "direct_memory" as a separate reserved memory pool. Because if
> > we
> > > > just
> > > > > set "-XX:MaxDirectMemorySize" to "off_heap_managed_memory +
> > > > jvm_overhead",
> > > > > then there will be times when that entire memory is allocated by
> > direct
> > > > > buffers and we have nothing left for the JVM overhead. So we either
> > > need
> > > > a
> > > > > way to compensate for that (again some safety margin cutoff value)
> or
> > > we
> > > > > will exceed container memory.
> > > > >
> > > > > If we go with option 1.2, we need to be aware that it takes
> elaborate
> > > > logic
> > > > > to push recycling of direct buffers without always triggering a
> full
> > > GC.
> > > > >
> > > > >
> > > > > My first guess is that the options will be easiest to do in the
> > > following
> > > > > order:
> > > > >
> > > > >   - Option 1.1 with a dedicated direct_memory parameter, as
> discussed
> > > > > above. We would need to find a way to set the direct_memory
> parameter
> > > by
> > > > > default. We could start with 64 MB and see how it goes in practice.
> > One
> > > > > danger I see is that setting this loo low can cause a bunch of
> > > additional
> > > > > GCs compared to before (we need to watch this carefully).
> > > > >
> > > > >   - Option 2. It is actually quite simple to implement, we could
> try
> > > how
> > > > > segfault safe we are at the moment.
> > > > >
> > > > >   - Option 1.2: We would not touch the "-XX:MaxDirectMemorySize"
> > > > parameter
> > > > > at all and assume that all the direct memory allocations that the
> JVM
> > > and
> > > > > Netty do are infrequent enough to be cleaned up fast enough through
> > > > regular
> > > > > GC. I am not sure if that is a valid assumption, though.
> > > > >
> > > > > Best,
> > > > > Stephan
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Aug 16, 2019 at 2:16 PM Xintong Song <
> [email protected]>
> > > > > wrote:
> > > > >
> > > > > > Thanks for sharing your opinion Till.
> > > > > >
> > > > > > I'm also in favor of alternative 2. I was wondering whether we
> can
> > > > avoid
> > > > > > using Unsafe.allocate() for off-heap managed memory and network
> > > memory
> > > > > with
> > > > > > alternative 3. But after giving it a second thought, I think even
> > for
> > > > > > alternative 3 using direct memory for off-heap managed memory
> could
> > > > cause
> > > > > > problems.
> > > > > >
> > > > > > Hi Yang,
> > > > > >
> > > > > > Regarding your concern, I think what proposed in this FLIP it to
> > have
> > > > > both
> > > > > > off-heap managed memory and network memory allocated through
> > > > > > Unsafe.allocate(), which means they are practically native memory
> > and
> > > > not
> > > > > > limited by JVM max direct memory. The only parts of memory
> limited
> > by
> > > > JVM
> > > > > > max direct memory are task off-heap memory and JVM overhead,
> which
> > > are
> > > > > > exactly alternative 2 suggests to set the JVM max direct memory
> to.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Aug 16, 2019 at 1:48 PM Till Rohrmann <
> > [email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for the clarification Xintong. I understand the two
> > > > alternatives
> > > > > > > now.
> > > > > > >
> > > > > > > I would be in favour of option 2 because it makes things
> > explicit.
> > > If
> > > > > we
> > > > > > > don't limit the direct memory, I fear that we might end up in a
> > > > similar
> > > > > > > situation as we are currently in: The user might see that her
> > > process
> > > > > > gets
> > > > > > > killed by the OS and does not know why this is the case.
> > > > Consequently,
> > > > > > she
> > > > > > > tries to decrease the process memory size (similar to
> increasing
> > > the
> > > > > > cutoff
> > > > > > > ratio) in order to accommodate for the extra direct memory.
> Even
> > > > worse,
> > > > > > she
> > > > > > > tries to decrease memory budgets which are not fully used and
> > hence
> > > > > won't
> > > > > > > change the overall memory consumption.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Fri, Aug 16, 2019 at 11:01 AM Xintong Song <
> > > [email protected]
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Let me explain this with a concrete example Till.
> > > > > > > >
> > > > > > > > Let's say we have the following scenario.
> > > > > > > >
> > > > > > > > Total Process Memory: 1GB
> > > > > > > > JVM Direct Memory (Task Off-Heap Memory + JVM Overhead):
> 200MB
> > > > > > > > Other Memory (JVM Heap Memory, JVM Metaspace, Off-Heap
> Managed
> > > > Memory
> > > > > > and
> > > > > > > > Network Memory): 800MB
> > > > > > > >
> > > > > > > >
> > > > > > > > For alternative 2, we set -XX:MaxDirectMemorySize to 200MB.
> > > > > > > > For alternative 3, we set -XX:MaxDirectMemorySize to a very
> > large
> > > > > > value,
> > > > > > > > let's say 1TB.
> > > > > > > >
> > > > > > > > If the actual direct memory usage of Task Off-Heap Memory and
> > JVM
> > > > > > > Overhead
> > > > > > > > do not exceed 200MB, then alternative 2 and alternative 3
> > should
> > > > have
> > > > > > the
> > > > > > > > same utility. Setting larger -XX:MaxDirectMemorySize will not
> > > > reduce
> > > > > > the
> > > > > > > > sizes of the other memory pools.
> > > > > > > >
> > > > > > > > If the actual direct memory usage of Task Off-Heap Memory and
> > JVM
> > > > > > > > Overhead potentially exceed 200MB, then
> > > > > > > >
> > > > > > > >    - Alternative 2 suffers from frequent OOM. To avoid that,
> > the
> > > > only
> > > > > > > thing
> > > > > > > >    user can do is to modify the configuration and increase
> JVM
> > > > Direct
> > > > > > > > Memory
> > > > > > > >    (Task Off-Heap Memory + JVM Overhead). Let's say that user
> > > > > increases
> > > > > > > JVM
> > > > > > > >    Direct Memory to 250MB, this will reduce the total size of
> > > other
> > > > > > > memory
> > > > > > > >    pools to 750MB, given the total process memory remains
> 1GB.
> > > > > > > >    - For alternative 3, there is no chance of direct OOM.
> There
> > > are
> > > > > > > chances
> > > > > > > >    of exceeding the total process memory limit, but given
> that
> > > the
> > > > > > > process
> > > > > > > > may
> > > > > > > >    not use up all the reserved native memory (Off-Heap
> Managed
> > > > > Memory,
> > > > > > > > Network
> > > > > > > >    Memory, JVM Metaspace), if the actual direct memory usage
> is
> > > > > > slightly
> > > > > > > > above
> > > > > > > >    yet very close to 200MB, user probably do not need to
> change
> > > the
> > > > > > > >    configurations.
> > > > > > > >
> > > > > > > > Therefore, I think from the user's perspective, a feasible
> > > > > > configuration
> > > > > > > > for alternative 2 may lead to lower resource utilization
> > compared
> > > > to
> > > > > > > > alternative 3.
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Aug 16, 2019 at 10:28 AM Till Rohrmann <
> > > > [email protected]
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I guess you have to help me understand the difference
> between
> > > > > > > > alternative 2
> > > > > > > > > and 3 wrt to memory under utilization Xintong.
> > > > > > > > >
> > > > > > > > > - Alternative 2: set XX:MaxDirectMemorySize to Task
> Off-Heap
> > > > Memory
> > > > > > and
> > > > > > > > JVM
> > > > > > > > > Overhead. Then there is the risk that this size is too low
> > > > > resulting
> > > > > > > in a
> > > > > > > > > lot of garbage collection and potentially an OOM.
> > > > > > > > > - Alternative 3: set XX:MaxDirectMemorySize to something
> > larger
> > > > > than
> > > > > > > > > alternative 2. This would of course reduce the sizes of the
> > > other
> > > > > > > memory
> > > > > > > > > types.
> > > > > > > > >
> > > > > > > > > How would alternative 2 now result in an under utilization
> of
> > > > > memory
> > > > > > > > > compared to alternative 3? If alternative 3 strictly sets a
> > > > higher
> > > > > > max
> > > > > > > > > direct memory size and we use only little, then I would
> > expect
> > > > that
> > > > > > > > > alternative 3 results in memory under utilization.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Tue, Aug 13, 2019 at 4:19 PM Yang Wang <
> > > [email protected]
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi xintong,till
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Native and Direct Memory
> > > > > > > > > >
> > > > > > > > > > My point is setting a very large max direct memory size
> > when
> > > we
> > > > > do
> > > > > > > not
> > > > > > > > > > differentiate direct and native memory. If the direct
> > > > > > > memory,including
> > > > > > > > > user
> > > > > > > > > > direct memory and framework direct memory,could be
> > calculated
> > > > > > > > > > correctly,then
> > > > > > > > > > i am in favor of setting direct memory with fixed value.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Memory Calculation
> > > > > > > > > >
> > > > > > > > > > I agree with xintong. For Yarn and k8s,we need to check
> the
> > > > > memory
> > > > > > > > > > configurations in client to avoid submitting successfully
> > and
> > > > > > failing
> > > > > > > > in
> > > > > > > > > > the flink master.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > >
> > > > > > > > > > Yang
> > > > > > > > > >
> > > > > > > > > > Xintong Song <[email protected]>于2019年8月13日
> 周二22:07写道：
> > > > > > > > > >
> > > > > > > > > > > Thanks for replying, Till.
> > > > > > > > > > >
> > > > > > > > > > > About MemorySegment, I think you are right that we
> should
> > > not
> > > > > > > include
> > > > > > > > > > this
> > > > > > > > > > > issue in the scope of this FLIP. This FLIP should
> > > concentrate
> > > > > on
> > > > > > > how
> > > > > > > > to
> > > > > > > > > > > configure memory pools for TaskExecutors, with minimum
> > > > > > involvement
> > > > > > > on
> > > > > > > > > how
> > > > > > > > > > > memory consumers use it.
> > > > > > > > > > >
> > > > > > > > > > > About direct memory, I think alternative 3 may not
> having
> > > the
> > > > > > same
> > > > > > > > over
> > > > > > > > > > > reservation issue that alternative 2 does, but at the
> > cost
> > > of
> > > > > > risk
> > > > > > > of
> > > > > > > > > > over
> > > > > > > > > > > using memory at the container level, which is not good.
> > My
> > > > > point
> > > > > > is
> > > > > > > > > that
> > > > > > > > > > > both "Task Off-Heap Memory" and "JVM Overhead" are not
> > easy
> > > > to
> > > > > > > > config.
> > > > > > > > > > For
> > > > > > > > > > > alternative 2, users might configure them higher than
> > what
> > > > > > actually
> > > > > > > > > > needed,
> > > > > > > > > > > just to avoid getting a direct OOM. For alternative 3,
> > > users
> > > > do
> > > > > > not
> > > > > > > > get
> > > > > > > > > > > direct OOM, so they may not config the two options
> > > > aggressively
> > > > > > > high.
> > > > > > > > > But
> > > > > > > > > > > the consequences are risks of overall container memory
> > > usage
> > > > > > > exceeds
> > > > > > > > > the
> > > > > > > > > > > budget.
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Aug 13, 2019 at 9:39 AM Till Rohrmann <
> > > > > > > [email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks for proposing this FLIP Xintong.
> > > > > > > > > > > >
> > > > > > > > > > > > All in all I think it already looks quite good.
> > > Concerning
> > > > > the
> > > > > > > > first
> > > > > > > > > > open
> > > > > > > > > > > > question about allocating memory segments, I was
> > > wondering
> > > > > > > whether
> > > > > > > > > this
> > > > > > > > > > > is
> > > > > > > > > > > > strictly necessary to do in the context of this FLIP
> or
> > > > > whether
> > > > > > > > this
> > > > > > > > > > > could
> > > > > > > > > > > > be done as a follow up? Without knowing all details,
> I
> > > > would
> > > > > be
> > > > > > > > > > concerned
> > > > > > > > > > > > that we would widen the scope of this FLIP too much
> > > because
> > > > > we
> > > > > > > > would
> > > > > > > > > > have
> > > > > > > > > > > > to touch all the existing call sites of the
> > MemoryManager
> > > > > where
> > > > > > > we
> > > > > > > > > > > allocate
> > > > > > > > > > > > memory segments (this should mainly be batch
> > operators).
> > > > The
> > > > > > > > addition
> > > > > > > > > > of
> > > > > > > > > > > > the memory reservation call to the MemoryManager
> should
> > > not
> > > > > be
> > > > > > > > > affected
> > > > > > > > > > > by
> > > > > > > > > > > > this and I would hope that this is the only point of
> > > > > > interaction
> > > > > > > a
> > > > > > > > > > > > streaming job would have with the MemoryManager.
> > > > > > > > > > > >
> > > > > > > > > > > > Concerning the second open question about setting or
> > not
> > > > > > setting
> > > > > > > a
> > > > > > > > > max
> > > > > > > > > > > > direct memory limit, I would also be interested why
> > Yang
> > > > Wang
> > > > > > > > thinks
> > > > > > > > > > > > leaving it open would be best. My concern about this
> > > would
> > > > be
> > > > > > > that
> > > > > > > > we
> > > > > > > > > > > would
> > > > > > > > > > > > be in a similar situation as we are now with the
> > > > > > > > RocksDBStateBackend.
> > > > > > > > > > If
> > > > > > > > > > > > the different memory pools are not clearly separated
> > and
> > > > can
> > > > > > > spill
> > > > > > > > > over
> > > > > > > > > > > to
> > > > > > > > > > > > a different pool, then it is quite hard to understand
> > > what
> > > > > > > exactly
> > > > > > > > > > > causes a
> > > > > > > > > > > > process to get killed for using too much memory. This
> > > could
> > > > > > then
> > > > > > > > > easily
> > > > > > > > > > > > lead to a similar situation what we have with the
> > > > > cutoff-ratio.
> > > > > > > So
> > > > > > > > > why
> > > > > > > > > > > not
> > > > > > > > > > > > setting a sane default value for max direct memory
> and
> > > > giving
> > > > > > the
> > > > > > > > > user
> > > > > > > > > > an
> > > > > > > > > > > > option to increase it if he runs into an OOM.
> > > > > > > > > > > >
> > > > > > > > > > > > @Xintong, how would alternative 2 lead to lower
> memory
> > > > > > > utilization
> > > > > > > > > than
> > > > > > > > > > > > alternative 3 where we set the direct memory to a
> > higher
> > > > > value?
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Till
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Aug 9, 2019 at 9:12 AM Xintong Song <
> > > > > > > [email protected]
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the feedback, Yang.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regarding your comments:
> > > > > > > > > > > > >
> > > > > > > > > > > > > *Native and Direct Memory*
> > > > > > > > > > > > > I think setting a very large max direct memory size
> > > > > > definitely
> > > > > > > > has
> > > > > > > > > > some
> > > > > > > > > > > > > good sides. E.g., we do not worry about direct OOM,
> > and
> > > > we
> > > > > > > don't
> > > > > > > > > even
> > > > > > > > > > > > need
> > > > > > > > > > > > > to allocate managed / network memory with
> > > > > Unsafe.allocate() .
> > > > > > > > > > > > > However, there are also some down sides of doing
> > this.
> > > > > > > > > > > > >
> > > > > > > > > > > > >    - One thing I can think of is that if a task
> > > executor
> > > > > > > > container
> > > > > > > > > is
> > > > > > > > > > > > >    killed due to overusing memory, it could be hard
> > for
> > > > use
> > > > > > to
> > > > > > > > know
> > > > > > > > > > > which
> > > > > > > > > > > > > part
> > > > > > > > > > > > >    of the memory is overused.
> > > > > > > > > > > > >    - Another down side is that the JVM never
> trigger
> > GC
> > > > due
> > > > > > to
> > > > > > > > > > reaching
> > > > > > > > > > > > max
> > > > > > > > > > > > >    direct memory limit, because the limit is too
> high
> > > to
> > > > be
> > > > > > > > > reached.
> > > > > > > > > > > That
> > > > > > > > > > > > >    means we kind of relay on heap memory to trigger
> > GC
> > > > and
> > > > > > > > release
> > > > > > > > > > > direct
> > > > > > > > > > > > >    memory. That could be a problem in cases where
> we
> > > have
> > > > > > more
> > > > > > > > > direct
> > > > > > > > > > > > > memory
> > > > > > > > > > > > >    usage but not enough heap activity to trigger
> the
> > > GC.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Maybe you can share your reasons for preferring
> > > setting a
> > > > > > very
> > > > > > > > > large
> > > > > > > > > > > > value,
> > > > > > > > > > > > > if there are anything else I overlooked.
> > > > > > > > > > > > >
> > > > > > > > > > > > > *Memory Calculation*
> > > > > > > > > > > > > If there is any conflict between multiple
> > configuration
> > > > > that
> > > > > > > user
> > > > > > > > > > > > > explicitly specified, I think we should throw an
> > error.
> > > > > > > > > > > > > I think doing checking on the client side is a good
> > > idea,
> > > > > so
> > > > > > > that
> > > > > > > > > on
> > > > > > > > > > > > Yarn /
> > > > > > > > > > > > > K8s we can discover the problem before submitting
> the
> > > > Flink
> > > > > > > > > cluster,
> > > > > > > > > > > > which
> > > > > > > > > > > > > is always a good thing.
> > > > > > > > > > > > > But we can not only rely on the client side
> checking,
> > > > > because
> > > > > > > for
> > > > > > > > > > > > > standalone cluster TaskManagers on different
> machines
> > > may
> > > > > > have
> > > > > > > > > > > different
> > > > > > > > > > > > > configurations and the client does see that.
> > > > > > > > > > > > > What do you think?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Aug 8, 2019 at 5:09 PM Yang Wang <
> > > > > > > [email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi xintong,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for your detailed proposal. After all the
> > > memory
> > > > > > > > > > configuration
> > > > > > > > > > > > are
> > > > > > > > > > > > > > introduced, it will be more powerful to control
> the
> > > > flink
> > > > > > > > memory
> > > > > > > > > > > > usage. I
> > > > > > > > > > > > > > just have few questions about it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >    - Native and Direct Memory
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We do not differentiate user direct memory and
> > native
> > > > > > memory.
> > > > > > > > > They
> > > > > > > > > > > are
> > > > > > > > > > > > > all
> > > > > > > > > > > > > > included in task off-heap memory. Right? So i
> don’t
> > > > think
> > > > > > we
> > > > > > > > > could
> > > > > > > > > > > not
> > > > > > > > > > > > > set
> > > > > > > > > > > > > > the -XX:MaxDirectMemorySize properly. I prefer
> > > leaving
> > > > > it a
> > > > > > > > very
> > > > > > > > > > > large
> > > > > > > > > > > > > > value.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >    - Memory Calculation
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If the sum of and fine-grained memory(network
> > memory,
> > > > > > managed
> > > > > > > > > > memory,
> > > > > > > > > > > > > etc.)
> > > > > > > > > > > > > > is larger than total process memory, how do we
> deal
> > > > with
> > > > > > this
> > > > > > > > > > > > situation?
> > > > > > > > > > > > > Do
> > > > > > > > > > > > > > we need to check the memory configuration in
> > client?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Xintong Song <[email protected]>
> 于2019年8月7日周三
> > > > > > 下午10:14写道：
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We would like to start a discussion thread on
> > > > "FLIP-49:
> > > > > > > > Unified
> > > > > > > > > > > > Memory
> > > > > > > > > > > > > > > Configuration for TaskExecutors"[1], where we
> > > > describe
> > > > > > how
> > > > > > > to
> > > > > > > > > > > improve
> > > > > > > > > > > > > > > TaskExecutor memory configurations. The FLIP
> > > document
> > > > > is
> > > > > > > > mostly
> > > > > > > > > > > based
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > early design "Memory Management and
> Configuration
> > > > > > > > Reloaded"[2]
> > > > > > > > > by
> > > > > > > > > > > > > > Stephan,
> > > > > > > > > > > > > > > with updates from follow-up discussions both
> > online
> > > > and
> > > > > > > > > offline.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This FLIP addresses several shortcomings of
> > current
> > > > > > (Flink
> > > > > > > > 1.9)
> > > > > > > > > > > > > > > TaskExecutor memory configuration.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >    - Different configuration for Streaming and
> > > Batch.
> > > > > > > > > > > > > > >    - Complex and difficult configuration of
> > RocksDB
> > > > in
> > > > > > > > > Streaming.
> > > > > > > > > > > > > > >    - Complicated, uncertain and hard to
> > understand.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Key changes to solve the problems can be
> > summarized
> > > > as
> > > > > > > > follows.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >    - Extend memory manager to also account for
> > > memory
> > > > > > usage
> > > > > > > > by
> > > > > > > > > > > state
> > > > > > > > > > > > > > >    backends.
> > > > > > > > > > > > > > >    - Modify how TaskExecutor memory is
> > partitioned
> > > > > > > accounted
> > > > > > > > > > > > individual
> > > > > > > > > > > > > > >    memory reservations and pools.
> > > > > > > > > > > > > > >    - Simplify memory configuration options and
> > > > > > calculations
> > > > > > > > > > logics.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Please find more details in the FLIP wiki
> > document
> > > > [1].
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Please note that the early design doc [2] is
> out
> > > of
> > > > > > sync,
> > > > > > > > and
> > > > > > > > > it
> > > > > > > > > > > is
> > > > > > > > > > > > > > > appreciated to have the discussion in this
> > mailing
> > > > list
> > > > > > > > > thread.)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Looking forward to your feedbacks.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [2]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit?usp=sharing
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > > On Fri, Aug 16, 2019 at 2:16 PM Xintong Song <
> [email protected]>
> > > > > wrote:
> > > > >
> > > > > > Thanks for sharing your opinion Till.
> > > > > >
> > > > > > I'm also in favor of alternative 2. I was wondering whether we
> can
> > > > avoid
> > > > > > using Unsafe.allocate() for off-heap managed memory and network
> > > memory
> > > > > with
> > > > > > alternative 3. But after giving it a second thought, I think even
> > for
> > > > > > alternative 3 using direct memory for off-heap managed memory
> could
> > > > cause
> > > > > > problems.
> > > > > >
> > > > > > Hi Yang,
> > > > > >
> > > > > > Regarding your concern, I think what proposed in this FLIP it to
> > have
> > > > > both
> > > > > > off-heap managed memory and network memory allocated through
> > > > > > Unsafe.allocate(), which means they are practically native memory
> > and
> > > > not
> > > > > > limited by JVM max direct memory. The only parts of memory
> limited
> > by
> > > > JVM
> > > > > > max direct memory are task off-heap memory and JVM overhead,
> which
> > > are
> > > > > > exactly alternative 2 suggests to set the JVM max direct memory
> to.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Aug 16, 2019 at 1:48 PM Till Rohrmann <
> > [email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for the clarification Xintong. I understand the two
> > > > alternatives
> > > > > > > now.
> > > > > > >
> > > > > > > I would be in favour of option 2 because it makes things
> > explicit.
> > > If
> > > > > we
> > > > > > > don't limit the direct memory, I fear that we might end up in a
> > > > similar
> > > > > > > situation as we are currently in: The user might see that her
> > > process
> > > > > > gets
> > > > > > > killed by the OS and does not know why this is the case.
> > > > Consequently,
> > > > > > she
> > > > > > > tries to decrease the process memory size (similar to
> increasing
> > > the
> > > > > > cutoff
> > > > > > > ratio) in order to accommodate for the extra direct memory.
> Even
> > > > worse,
> > > > > > she
> > > > > > > tries to decrease memory budgets which are not fully used and
> > hence
> > > > > won't
> > > > > > > change the overall memory consumption.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Fri, Aug 16, 2019 at 11:01 AM Xintong Song <
> > > [email protected]
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Let me explain this with a concrete example Till.
> > > > > > > >
> > > > > > > > Let's say we have the following scenario.
> > > > > > > >
> > > > > > > > Total Process Memory: 1GB
> > > > > > > > JVM Direct Memory (Task Off-Heap Memory + JVM Overhead):
> 200MB
> > > > > > > > Other Memory (JVM Heap Memory, JVM Metaspace, Off-Heap
> Managed
> > > > Memory
> > > > > > and
> > > > > > > > Network Memory): 800MB
> > > > > > > >
> > > > > > > >
> > > > > > > > For alternative 2, we set -XX:MaxDirectMemorySize to 200MB.
> > > > > > > > For alternative 3, we set -XX:MaxDirectMemorySize to a very
> > large
> > > > > > value,
> > > > > > > > let's say 1TB.
> > > > > > > >
> > > > > > > > If the actual direct memory usage of Task Off-Heap Memory and
> > JVM
> > > > > > > Overhead
> > > > > > > > do not exceed 200MB, then alternative 2 and alternative 3
> > should
> > > > have
> > > > > > the
> > > > > > > > same utility. Setting larger -XX:MaxDirectMemorySize will not
> > > > reduce
> > > > > > the
> > > > > > > > sizes of the other memory pools.
> > > > > > > >
> > > > > > > > If the actual direct memory usage of Task Off-Heap Memory and
> > JVM
> > > > > > > > Overhead potentially exceed 200MB, then
> > > > > > > >
> > > > > > > >    - Alternative 2 suffers from frequent OOM. To avoid that,
> > the
> > > > only
> > > > > > > thing
> > > > > > > >    user can do is to modify the configuration and increase
> JVM
> > > > Direct
> > > > > > > > Memory
> > > > > > > >    (Task Off-Heap Memory + JVM Overhead). Let's say that user
> > > > > increases
> > > > > > > JVM
> > > > > > > >    Direct Memory to 250MB, this will reduce the total size of
> > > other
> > > > > > > memory
> > > > > > > >    pools to 750MB, given the total process memory remains
> 1GB.
> > > > > > > >    - For alternative 3, there is no chance of direct OOM.
> There
> > > are
> > > > > > > chances
> > > > > > > >    of exceeding the total process memory limit, but given
> that
> > > the
> > > > > > > process
> > > > > > > > may
> > > > > > > >    not use up all the reserved native memory (Off-Heap
> Managed
> > > > > Memory,
> > > > > > > > Network
> > > > > > > >    Memory, JVM Metaspace), if the actual direct memory usage
> is
> > > > > > slightly
> > > > > > > > above
> > > > > > > >    yet very close to 200MB, user probably do not need to
> change
> > > the
> > > > > > > >    configurations.
> > > > > > > >
> > > > > > > > Therefore, I think from the user's perspective, a feasible
> > > > > > configuration
> > > > > > > > for alternative 2 may lead to lower resource utilization
> > compared
> > > > to
> > > > > > > > alternative 3.
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Aug 16, 2019 at 10:28 AM Till Rohrmann <
> > > > [email protected]
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I guess you have to help me understand the difference
> between
> > > > > > > > alternative 2
> > > > > > > > > and 3 wrt to memory under utilization Xintong.
> > > > > > > > >
> > > > > > > > > - Alternative 2: set XX:MaxDirectMemorySize to Task
> Off-Heap
> > > > Memory
> > > > > > and
> > > > > > > > JVM
> > > > > > > > > Overhead. Then there is the risk that this size is too low
> > > > > resulting
> > > > > > > in a
> > > > > > > > > lot of garbage collection and potentially an OOM.
> > > > > > > > > - Alternative 3: set XX:MaxDirectMemorySize to something
> > larger
> > > > > than
> > > > > > > > > alternative 2. This would of course reduce the sizes of the
> > > other
> > > > > > > memory
> > > > > > > > > types.
> > > > > > > > >
> > > > > > > > > How would alternative 2 now result in an under utilization
> of
> > > > > memory
> > > > > > > > > compared to alternative 3? If alternative 3 strictly sets a
> > > > higher
> > > > > > max
> > > > > > > > > direct memory size and we use only little, then I would
> > expect
> > > > that
> > > > > > > > > alternative 3 results in memory under utilization.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Tue, Aug 13, 2019 at 4:19 PM Yang Wang <
> > > [email protected]
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi xintong,till
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Native and Direct Memory
> > > > > > > > > >
> > > > > > > > > > My point is setting a very large max direct memory size
> > when
> > > we
> > > > > do
> > > > > > > not
> > > > > > > > > > differentiate direct and native memory. If the direct
> > > > > > > memory,including
> > > > > > > > > user
> > > > > > > > > > direct memory and framework direct memory,could be
> > calculated
> > > > > > > > > > correctly,then
> > > > > > > > > > i am in favor of setting direct memory with fixed value.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Memory Calculation
> > > > > > > > > >
> > > > > > > > > > I agree with xintong. For Yarn and k8s,we need to check
> the
> > > > > memory
> > > > > > > > > > configurations in client to avoid submitting successfully
> > and
> > > > > > failing
> > > > > > > > in
> > > > > > > > > > the flink master.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > >
> > > > > > > > > > Yang
> > > > > > > > > >
> > > > > > > > > > Xintong Song <[email protected]>于2019年8月13日
> 周二22:07写道：
> > > > > > > > > >
> > > > > > > > > > > Thanks for replying, Till.
> > > > > > > > > > >
> > > > > > > > > > > About MemorySegment, I think you are right that we
> should
> > > not
> > > > > > > include
> > > > > > > > > > this
> > > > > > > > > > > issue in the scope of this FLIP. This FLIP should
> > > concentrate
> > > > > on
> > > > > > > how
> > > > > > > > to
> > > > > > > > > > > configure memory pools for TaskExecutors, with minimum
> > > > > > involvement
> > > > > > > on
> > > > > > > > > how
> > > > > > > > > > > memory consumers use it.
> > > > > > > > > > >
> > > > > > > > > > > About direct memory, I think alternative 3 may not
> having
> > > the
> > > > > > same
> > > > > > > > over
> > > > > > > > > > > reservation issue that alternative 2 does, but at the
> > cost
> > > of
> > > > > > risk
> > > > > > > of
> > > > > > > > > > over
> > > > > > > > > > > using memory at the container level, which is not good.
> > My
> > > > > point
> > > > > > is
> > > > > > > > > that
> > > > > > > > > > > both "Task Off-Heap Memory" and "JVM Overhead" are not
> > easy
> > > > to
> > > > > > > > config.
> > > > > > > > > > For
> > > > > > > > > > > alternative 2, users might configure them higher than
> > what
> > > > > > actually
> > > > > > > > > > needed,
> > > > > > > > > > > just to avoid getting a direct OOM. For alternative 3,
> > > users
> > > > do
> > > > > > not
> > > > > > > > get
> > > > > > > > > > > direct OOM, so they may not config the two options
> > > > aggressively
> > > > > > > high.
> > > > > > > > > But
> > > > > > > > > > > the consequences are risks of overall container memory
> > > usage
> > > > > > > exceeds
> > > > > > > > > the
> > > > > > > > > > > budget.
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Aug 13, 2019 at 9:39 AM Till Rohrmann <
> > > > > > > [email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks for proposing this FLIP Xintong.
> > > > > > > > > > > >
> > > > > > > > > > > > All in all I think it already looks quite good.
> > > Concerning
> > > > > the
> > > > > > > > first
> > > > > > > > > > open
> > > > > > > > > > > > question about allocating memory segments, I was
> > > wondering
> > > > > > > whether
> > > > > > > > > this
> > > > > > > > > > > is
> > > > > > > > > > > > strictly necessary to do in the context of this FLIP
> or
> > > > > whether
> > > > > > > > this
> > > > > > > > > > > could
> > > > > > > > > > > > be done as a follow up? Without knowing all details,
> I
> > > > would
> > > > > be
> > > > > > > > > > concerned
> > > > > > > > > > > > that we would widen the scope of this FLIP too much
> > > because
> > > > > we
> > > > > > > > would
> > > > > > > > > > have
> > > > > > > > > > > > to touch all the existing call sites of the
> > MemoryManager
> > > > > where
> > > > > > > we
> > > > > > > > > > > allocate
> > > > > > > > > > > > memory segments (this should mainly be batch
> > operators).
> > > > The
> > > > > > > > addition
> > > > > > > > > > of
> > > > > > > > > > > > the memory reservation call to the MemoryManager
> should
> > > not
> > > > > be
> > > > > > > > > affected
> > > > > > > > > > > by
> > > > > > > > > > > > this and I would hope that this is the only point of
> > > > > > interaction
> > > > > > > a
> > > > > > > > > > > > streaming job would have with the MemoryManager.
> > > > > > > > > > > >
> > > > > > > > > > > > Concerning the second open question about setting or
> > not
> > > > > > setting
> > > > > > > a
> > > > > > > > > max
> > > > > > > > > > > > direct memory limit, I would also be interested why
> > Yang
> > > > Wang
> > > > > > > > thinks
> > > > > > > > > > > > leaving it open would be best. My concern about this
> > > would
> > > > be
> > > > > > > that
> > > > > > > > we
> > > > > > > > > > > would
> > > > > > > > > > > > be in a similar situation as we are now with the
> > > > > > > > RocksDBStateBackend.
> > > > > > > > > > If
> > > > > > > > > > > > the different memory pools are not clearly separated
> > and
> > > > can
> > > > > > > spill
> > > > > > > > > over
> > > > > > > > > > > to
> > > > > > > > > > > > a different pool, then it is quite hard to understand
> > > what
> > > > > > > exactly
> > > > > > > > > > > causes a
> > > > > > > > > > > > process to get killed for using too much memory. This
> > > could
> > > > > > then
> > > > > > > > > easily
> > > > > > > > > > > > lead to a similar situation what we have with the
> > > > > cutoff-ratio.
> > > > > > > So
> > > > > > > > > why
> > > > > > > > > > > not
> > > > > > > > > > > > setting a sane default value for max direct memory
> and
> > > > giving
> > > > > > the
> > > > > > > > > user
> > > > > > > > > > an
> > > > > > > > > > > > option to increase it if he runs into an OOM.
> > > > > > > > > > > >
> > > > > > > > > > > > @Xintong, how would alternative 2 lead to lower
> memory
> > > > > > > utilization
> > > > > > > > > than
> > > > > > > > > > > > alternative 3 where we set the direct memory to a
> > higher
> > > > > value?
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Till
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Aug 9, 2019 at 9:12 AM Xintong Song <
> > > > > > > [email protected]
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the feedback, Yang.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regarding your comments:
> > > > > > > > > > > > >
> > > > > > > > > > > > > *Native and Direct Memory*
> > > > > > > > > > > > > I think setting a very large max direct memory size
> > > > > > definitely
> > > > > > > > has
> > > > > > > > > > some
> > > > > > > > > > > > > good sides. E.g., we do not worry about direct OOM,
> > and
> > > > we
> > > > > > > don't
> > > > > > > > > even
> > > > > > > > > > > > need
> > > > > > > > > > > > > to allocate managed / network memory with
> > > > > Unsafe.allocate() .
> > > > > > > > > > > > > However, there are also some down sides of doing
> > this.
> > > > > > > > > > > > >
> > > > > > > > > > > > >    - One thing I can think of is that if a task
> > > executor
> > > > > > > > container
> > > > > > > > > is
> > > > > > > > > > > > >    killed due to overusing memory, it could be hard
> > for
> > > > use
> > > > > > to
> > > > > > > > know
> > > > > > > > > > > which
> > > > > > > > > > > > > part
> > > > > > > > > > > > >    of the memory is overused.
> > > > > > > > > > > > >    - Another down side is that the JVM never
> trigger
> > GC
> > > > due
> > > > > > to
> > > > > > > > > > reaching
> > > > > > > > > > > > max
> > > > > > > > > > > > >    direct memory limit, because the limit is too
> high
> > > to
> > > > be
> > > > > > > > > reached.
> > > > > > > > > > > That
> > > > > > > > > > > > >    means we kind of relay on heap memory to trigger
> > GC
> > > > and
> > > > > > > > release
> > > > > > > > > > > direct
> > > > > > > > > > > > >    memory. That could be a problem in cases where
> we
> > > have
> > > > > > more
> > > > > > > > > direct
> > > > > > > > > > > > > memory
> > > > > > > > > > > > >    usage but not enough heap activity to trigger
> the
> > > GC.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Maybe you can share your reasons for preferring
> > > setting a
> > > > > > very
> > > > > > > > > large
> > > > > > > > > > > > value,
> > > > > > > > > > > > > if there are anything else I overlooked.
> > > > > > > > > > > > >
> > > > > > > > > > > > > *Memory Calculation*
> > > > > > > > > > > > > If there is any conflict between multiple
> > configuration
> > > > > that
> > > > > > > user
> > > > > > > > > > > > > explicitly specified, I think we should throw an
> > error.
> > > > > > > > > > > > > I think doing checking on the client side is a good
> > > idea,
> > > > > so
> > > > > > > that
> > > > > > > > > on
> > > > > > > > > > > > Yarn /
> > > > > > > > > > > > > K8s we can discover the problem before submitting
> the
> > > > Flink
> > > > > > > > > cluster,
> > > > > > > > > > > > which
> > > > > > > > > > > > > is always a good thing.
> > > > > > > > > > > > > But we can not only rely on the client side
> checking,
> > > > > because
> > > > > > > for
> > > > > > > > > > > > > standalone cluster TaskManagers on different
> machines
> > > may
> > > > > > have
> > > > > > > > > > > different
> > > > > > > > > > > > > configurations and the client does see that.
> > > > > > > > > > > > > What do you think?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Aug 8, 2019 at 5:09 PM Yang Wang <
> > > > > > > [email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi xintong,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for your detailed proposal. After all the
> > > memory
> > > > > > > > > > configuration
> > > > > > > > > > > > are
> > > > > > > > > > > > > > introduced, it will be more powerful to control
> the
> > > > flink
> > > > > > > > memory
> > > > > > > > > > > > usage. I
> > > > > > > > > > > > > > just have few questions about it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >    - Native and Direct Memory
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We do not differentiate user direct memory and
> > native
> > > > > > memory.
> > > > > > > > > They
> > > > > > > > > > > are
> > > > > > > > > > > > > all
> > > > > > > > > > > > > > included in task off-heap memory. Right? So i
> don’t
> > > > think
> > > > > > we
> > > > > > > > > could
> > > > > > > > > > > not
> > > > > > > > > > > > > set
> > > > > > > > > > > > > > the -XX:MaxDirectMemorySize properly. I prefer
> > > leaving
> > > > > it a
> > > > > > > > very
> > > > > > > > > > > large
> > > > > > > > > > > > > > value.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >    - Memory Calculation
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If the sum of and fine-grained memory(network
> > memory,
> > > > > > managed
> > > > > > > > > > memory,
> > > > > > > > > > > > > etc.)
> > > > > > > > > > > > > > is larger than total process memory, how do we
> deal
> > > > with
> > > > > > this
> > > > > > > > > > > > situation?
> > > > > > > > > > > > > Do
> > > > > > > > > > > > > > we need to check the memory configuration in
> > client?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Xintong Song <[email protected]>
> 于2019年8月7日周三
> > > > > > 下午10:14写道：
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We would like to start a discussion thread on
> > > > "FLIP-49:
> > > > > > > > Unified
> > > > > > > > > > > > Memory
> > > > > > > > > > > > > > > Configuration for TaskExecutors"[1], where we
> > > > describe
> > > > > > how
> > > > > > > to
> > > > > > > > > > > improve
> > > > > > > > > > > > > > > TaskExecutor memory configurations. The FLIP
> > > document
> > > > > is
> > > > > > > > mostly
> > > > > > > > > > > based
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > early design "Memory Management and
> Configuration
> > > > > > > > Reloaded"[2]
> > > > > > > > > by
> > > > > > > > > > > > > > Stephan,
> > > > > > > > > > > > > > > with updates from follow-up discussions both
> > online
> > > > and
> > > > > > > > > offline.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This FLIP addresses several shortcomings of
> > current
> > > > > > (Flink
> > > > > > > > 1.9)
> > > > > > > > > > > > > > > TaskExecutor memory configuration.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >    - Different configuration for Streaming and
> > > Batch.
> > > > > > > > > > > > > > >    - Complex and difficult configuration of
> > RocksDB
> > > > in
> > > > > > > > > Streaming.
> > > > > > > > > > > > > > >    - Complicated, uncertain and hard to
> > understand.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Key changes to solve the problems can be
> > summarized
> > > > as
> > > > > > > > follows.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >    - Extend memory manager to also account for
> > > memory
> > > > > > usage
> > > > > > > > by
> > > > > > > > > > > state
> > > > > > > > > > > > > > >    backends.
> > > > > > > > > > > > > > >    - Modify how TaskExecutor memory is
> > partitioned
> > > > > > > accounted
> > > > > > > > > > > > individual
> > > > > > > > > > > > > > >    memory reservations and pools.
> > > > > > > > > > > > > > >    - Simplify memory configuration options and
> > > > > > calculations
> > > > > > > > > > logics.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Please find more details in the FLIP wiki
> > document
> > > > [1].
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Please note that the early design doc [2] is
> out
> > > of
> > > > > > sync,
> > > > > > > > and
> > > > > > > > > it
> > > > > > > > > > > is
> > > > > > > > > > > > > > > appreciated to have the discussion in this
> > mailing
> > > > list
> > > > > > > > > thread.)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Looking forward to your feedbacks.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [2]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit?usp=sharing
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>

Re: [DISCUSS] FLIP-49: Unified Memory Configuration for TaskExecutors

Reply via email to