Thanks for the clarification. I have some more comments:

- I would actually split the logic for computing the process memory
requirements and the logic for storing the computed values into two
separate classes. E.g., one could name the former TaskExecutorProcessUtility
and the latter TaskExecutorProcessMemory. But we can discuss this on the PR
since it's just a naming detail.

- Generally, I'm not opposed to making configuration values overridable by
ENV variables. I think this is a very good idea and makes the
configurability of Flink processes easier. However, I think that adding
this functionality should not be part of this FLIP because it would simply
widen the scope unnecessarily.

The reasons why I believe it is unnecessary are the following: For Yarn we
already write a flink-conf.yaml, which could be populated with the
memory settings. For the other processes it should not make a difference
whether the loaded Configuration is populated with the memory settings from
ENV variables or by using TaskExecutorProcessUtility to compute the missing
values from the loaded configuration. If the latter were not possible
(wrong or missing configuration values), then we should not have been able
to start the process in the first place.
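
To illustrate the kind of ENV-to-config mapping discussed here (and in
Stephan's mail below), here is a minimal, hypothetical sketch in Java. The
"flink_" prefix and the class and method names are assumptions for
illustration only, not the actual Flink API; note also that a naive
underscore-to-dot mapping is ambiguous for config keys that contain literal
underscores.

```java
import java.util.HashMap;
import java.util.Map;

public class EnvConfigOverride {

    // Hypothetical prefix; the actual naming convention is not decided here.
    private static final String PREFIX = "flink_";

    // Maps e.g. "flink_taskmanager_memory_size" to "taskmanager.memory.size".
    static String toConfigKey(String envName) {
        return envName.substring(PREFIX.length()).replace('_', '.');
    }

    // Applies every prefixed ENV variable on top of the loaded configuration,
    // so downstream code sees the values as if they came from flink-conf.yaml.
    static Map<String, String> applyOverrides(
            Map<String, String> config, Map<String, String> env) {
        Map<String, String> merged = new HashMap<>(config);
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                merged.put(toConfigKey(e.getKey()), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("taskmanager.memory.size", "1g");

        Map<String, String> env = new HashMap<>();
        env.put("flink_taskmanager_memory_size", "2g");

        // The ENV value overrides the value loaded from the config file.
        System.out.println(
            applyOverrides(config, env).get("taskmanager.memory.size"));
    }
}
```

Running this prints "2g": the ENV value shadows the file value, and all
downstream code simply reads the merged Configuration.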

- Concerning the memory reservation: I agree with you that we need the
memory reservation functionality to make streaming jobs work with "managed"
memory. However, even without this functionality the FLIP would already
bring a good amount of improvements to our users running batch jobs.
Moreover, by keeping the scope smaller we can complete the FLIP faster.
Hence, I would propose to address the memory reservation functionality in a
follow-up FLIP (which Yu is working on, if I'm not mistaken).

Cheers,
Till

On Wed, Aug 28, 2019 at 11:43 AM Yang Wang <danrtsey...@gmail.com> wrote:

> Just add my 2 cents.
>
> Using environment variables to override the configuration for different
> taskmanagers is better.
> We do not need to generate dedicated flink-conf.yaml for all taskmanagers.
> A common flink-conf.yaml and different environment variables are enough.
> By reducing the distributed cached files, it could make launching a
> taskmanager faster.
>
> Stephan gives a good suggestion that we could move the logic into
> "GlobalConfiguration.loadConfig()" method.
> Maybe the client could also benefit from this. Different users would not
> have to export FLINK_CONF_DIR just to update a few config options.
>
>
> Best,
> Yang
>
> Stephan Ewen <se...@apache.org> 于2019年8月28日周三 上午1:21写道:
>
> > One note on the Environment Variables and Configuration discussion.
> >
> > My understanding is that passed ENV variables are added to the
> > configuration in the "GlobalConfiguration.loadConfig()" method (or
> > similar).
> > For all the code inside Flink, it looks like the data was in the config
> to
> > start with, just that the scripts that compute the variables can pass the
> > values to the process without actually needing to write a file.
> >
> > For example the "GlobalConfiguration.loadConfig()" method would take any
> > ENV variable prefixed with "flink" and add it as a config key.
> > "flink_taskmanager_memory_size=2g" would become "taskmanager.memory.size:
> > 2g".
> >
> >
> > On Tue, Aug 27, 2019 at 4:05 PM Xintong Song <tonysong...@gmail.com>
> > wrote:
> >
> > > Thanks for the comments, Till.
> > >
> > > I've also seen your comments on the wiki page, but let's keep the
> > > discussion here.
> > >
> > > - Regarding 'TaskExecutorSpecifics', what do you think about naming it
> > > 'TaskExecutorResourceSpecifics'?
> > > - Regarding passing memory configurations into task executors, I'm in
> > > favor of doing it via environment variables rather than configurations,
> > > for the following two reasons.
> > >   - It is easier to keep the memory options, once calculated, unchanged
> > > when passed via environment variables rather than configurations.
> > >   - I'm not sure whether we should write the configuration in startup
> > > scripts. Writing changes into the configuration files when running the
> > > startup scripts does not sound right to me. Alternatively, we could make
> > > a copy of the configuration files per Flink cluster, make the task
> > > executor load from the copy, and clean up the copy after the cluster is
> > > shut down, which is complicated. (I think this is also what Stephan
> > > means in his comment on the wiki page?)
> > > - Regarding reserving memory, I think this change should be included in
> > > this FLIP. A big part of the motivation for this FLIP is to unify
> > > memory configuration for streaming / batch and make it easy to configure
> > > RocksDB memory. If we don't support memory reservation, then streaming
> > > jobs cannot use managed memory (neither on-heap nor off-heap), which
> > > makes this FLIP incomplete.
> > > - Regarding network memory, I think you are right. We probably don't
> > > need to change the network stack from using direct memory to using
> > > unsafe native memory. The network memory size is deterministic, cannot
> > > be reserved the way managed memory is, and cannot be overused. I think
> > > it also works if we simply keep using direct memory for the network
> > > stack and include it in the JVM max direct memory size.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Tue, Aug 27, 2019 at 8:12 PM Till Rohrmann <trohrm...@apache.org>
> > > wrote:
> > >
> > > > Hi Xintong,
> > > >
> > > > thanks for addressing the comments and adding a more detailed
> > > > implementation plan. I have a couple of comments concerning the
> > > > implementation plan:
> > > >
> > > > - The name `TaskExecutorSpecifics` is not really descriptive.
> Choosing
> > a
> > > > different name could help here.
> > > > - I'm not sure whether I would pass the memory configuration to the
> > > > TaskExecutor via environment variables. I think it would be better to
> > > write
> > > > it into the configuration one uses to start the TM process.
> > > > - If possible, I would exclude the memory reservation from this FLIP
> > and
> > > > add this as part of a dedicated FLIP.
> > > > - If possible, then I would exclude changes to the network stack from
> > > this
> > > > FLIP. Maybe we can simply say that the direct memory needed by the
> > > network
> > > > stack is the framework direct memory requirement. Changing how the
> > memory
> > > > is allocated can happen in a second step. This would keep the scope
> of
> > > this
> > > > FLIP smaller.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Aug 22, 2019 at 2:51 PM Xintong Song <tonysong...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I just updated the FLIP document on wiki [1], with the following
> > > changes.
> > > > >
> > > > >    - Removed open question regarding MemorySegment allocation. As
> > > > >    discussed, we exclude this topic from the scope of this FLIP.
> > > > >    - Updated content about JVM direct memory parameter according to
> > > > recent
> > > > >    discussions, and moved the other options to "Rejected
> > Alternatives"
> > > > for
> > > > > the
> > > > >    moment.
> > > > >    - Added implementation steps.
> > > > >
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > >
> > > > > On Mon, Aug 19, 2019 at 7:16 PM Stephan Ewen <se...@apache.org>
> > wrote:
> > > > >
> > > > > > @Xintong: Concerning "wait for memory users before task dispose
> and
> > > > > memory
> > > > > > release": I agree, that's how it should be. Let's try it out.
> > > > > >
> > > > > > @Xintong @Jingsong: Concerning " JVM does not wait for GC when
> > > > allocating
> > > > > > direct memory buffer": There seems to be pretty elaborate logic
> to
> > > free
> > > > > > buffers when allocating new ones. See
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://hg.openjdk.java.net/jdk8u/jdk8u-dev/jdk/file/tip/src/share/classes/java/nio/Bits.java#l643
> > > > > >
> > > > > > @Till: Maybe. If we assume that the JVM default works (like going
> > > with
> > > > > > option 2 and not setting "-XX:MaxDirectMemorySize" at all), then
> I
> > > > think
> > > > > it
> > > > > > should be okay to set "-XX:MaxDirectMemorySize" to
> > > > > > "off_heap_managed_memory + direct_memory" even if we use RocksDB.
> > > That
> > > > > is a
> > > > > > big if, though, I honestly have no idea :D Would be good to
> > > understand
> > > > > > this, though, because this would affect option (2) and option
> > (1.2).
> > > > > >
> > > > > > On Mon, Aug 19, 2019 at 4:44 PM Xintong Song <
> > tonysong...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for the inputs, Jingsong.
> > > > > > >
> > > > > > > Let me try to summarize your points. Please correct me if I'm
> > > wrong.
> > > > > > >
> > > > > > >    - Memory consumers should always avoid returning memory
> > > > > > >    segments to the memory manager while there are still
> > > > > > >    un-cleaned structures / threads that may use the memory.
> > > > > > >    Otherwise, it would cause serious problems by having multiple
> > > > > > >    consumers trying to use the same memory segment.
> > > > > > >    - JVM does not wait for GC when allocating direct memory
> > > > > > >    buffers. Therefore, even if we set a proper max direct memory
> > > > > > >    size limit, we may still encounter a direct memory OOM if GC
> > > > > > >    cleans memory more slowly than direct memory is allocated.
> > > > > > >
> > > > > > > Am I understanding this correctly?
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Aug 19, 2019 at 4:21 PM JingsongLee <
> > > lzljs3620...@aliyun.com
> > > > > > > .invalid>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi stephan:
> > > > > > > >
> > > > > > > > About option 2:
> > > > > > > >
> > > > > > > > If additional threads are not cleanly shut down before we exit
> > > > > > > > the task: in the current case of memory reuse, the task has
> > > > > > > > already freed up the memory it uses. If this memory is used by
> > > > > > > > other tasks while asynchronous threads of the exited task may
> > > > > > > > still be writing, there will be concurrency safety problems,
> > > > > > > > and even errors in user computing results.
> > > > > > > >
> > > > > > > > So I think this is a serious and intolerable bug; no matter
> > > > > > > > what the option is, it should be avoided.
> > > > > > > >
> > > > > > > > About direct memory cleaned by GC:
> > > > > > > > I don't think it is a good idea; I've encountered many
> > > > > > > > situations where GC was too late, causing a DirectMemory OOM.
> > > > > > > > When DirectMemory is released and allocated depends on the type
> > > > > > > > of user job, which is often beyond our control.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Jingsong Lee
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > ------------------------------------------------------------------
> > > > > > > > From:Stephan Ewen <se...@apache.org>
> > > > > > > > Send Time:2019年8月19日(星期一) 15:56
> > > > > > > > To:dev <dev@flink.apache.org>
> > > > > > > > Subject:Re: [DISCUSS] FLIP-49: Unified Memory Configuration
> for
> > > > > > > > TaskExecutors
> > > > > > > >
> > > > > > > > My main concern with option 2 (manually release memory) is
> > > > > > > > that segfaults in the JVM set off all sorts of alarms on user
> > > > > > > > ends. So we need to guarantee that this never happens.
> > > > > > > >
> > > > > > > > The trickiness is in tasks that use data structures /
> > > > > > > > algorithms with additional threads, like hash table spill/read
> > > > > > > > and sorting threads. We need to ensure that these cleanly shut
> > > > > > > > down before we can exit the task. I am not sure that we have
> > > > > > > > that guaranteed already; that's why option 1.1 seemed simpler
> > > > > > > > to me.
> > > > > > > >
> > > > > > > > On Mon, Aug 19, 2019 at 3:42 PM Xintong Song <
> > > > tonysong...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for the comments, Stephan. Summarized in this way
> > really
> > > > > makes
> > > > > > > > > things easier to understand.
> > > > > > > > >
> > > > > > > > > I'm in favor of option 2, at least for the moment. I think
> > > > > > > > > it is not that difficult to keep the memory manager segfault
> > > > > > > > > safe, as long as we always de-allocate the memory segment
> > > > > > > > > when it is released by the memory consumers. Only if a memory
> > > > > > > > > consumer continues using the buffer of a memory segment after
> > > > > > > > > releasing it do we have a problem, in which case we do want
> > > > > > > > > the job to fail so we detect the memory leak early.
> > > > > > > > >
> > > > > > > > > For option 1.2, I don't think this is a good idea. Not only
> > > > > > > > > may the assumption (regular GC is enough to clean direct
> > > > > > > > > buffers) not always hold, it also makes it harder to find
> > > > > > > > > problems in cases of memory overuse. E.g., a user configures
> > > > > > > > > some direct memory for the user libraries. If a library
> > > > > > > > > actually uses more direct memory than configured, which
> > > > > > > > > cannot be cleaned by GC because it is still in use, this may
> > > > > > > > > lead to overuse of the total container memory. In that case,
> > > > > > > > > if we didn't touch the JVM default max direct memory limit,
> > > > > > > > > we cannot get a direct memory OOM, and it becomes super hard
> > > > > > > > > to understand which part of the configuration needs to be
> > > > > > > > > updated.
> > > > > > > > >
> > > > > > > > > For option 1.1, it has a similar problem as 1.2 if the
> > > > > > > > > exceeded direct memory does not reach the max direct memory
> > > > > > > > > limit specified by the dedicated parameter. I think it is
> > > > > > > > > slightly better than 1.2 only because we can tune the
> > > > > > > > > parameter.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Aug 19, 2019 at 2:53 PM Stephan Ewen <
> > se...@apache.org
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > About the "-XX:MaxDirectMemorySize" discussion, maybe let
> > me
> > > > > > > summarize
> > > > > > > > > it a
> > > > > > > > > > bit differently:
> > > > > > > > > >
> > > > > > > > > > We have the following two options:
> > > > > > > > > >
> > > > > > > > > > (1) We let MemorySegments be de-allocated by the GC. That
> > > makes
> > > > > it
> > > > > > > > > segfault
> > > > > > > > > > safe. But then we need a way to trigger GC in case
> > > > de-allocation
> > > > > > and
> > > > > > > > > > re-allocation of a bunch of segments happens quickly,
> which
> > > is
> > > > > > often
> > > > > > > > the
> > > > > > > > > > case during batch scheduling or task restart.
> > > > > > > > > >   - The "-XX:MaxDirectMemorySize" (option 1.1) is one way
> > to
> > > do
> > > > > > this
> > > > > > > > > >   - Another way could be to have a dedicated bookkeeping
> in
> > > the
> > > > > > > > > > MemoryManager (option 1.2), so that this is a number
> > > > independent
> > > > > of
> > > > > > > the
> > > > > > > > > > "-XX:MaxDirectMemorySize" parameter.
> > > > > > > > > >
> > > > > > > > > > (2) We manually allocate and de-allocate the memory for
> the
> > > > > > > > > MemorySegments
> > > > > > > > > > (option 2). That way we need not worry about triggering
> GC
> > by
> > > > > some
> > > > > > > > > > threshold or bookkeeping, but it is harder to prevent
> > > > segfaults.
> > > > > We
> > > > > > > > need
> > > > > > > > > to
> > > > > > > > > > be very careful about when we release the memory segments
> > > (only
> > > > > in
> > > > > > > the
> > > > > > > > > > cleanup phase of the main thread).
> > > > > > > > > >
> > > > > > > > > > If we go with option 1.1, we probably need to set
> > > > > > > > > > "-XX:MaxDirectMemorySize" to "off_heap_managed_memory +
> > > > > > > direct_memory"
> > > > > > > > > and
> > > > > > > > > > have "direct_memory" as a separate reserved memory pool.
> > > > Because
> > > > > if
> > > > > > > we
> > > > > > > > > just
> > > > > > > > > > set "-XX:MaxDirectMemorySize" to
> "off_heap_managed_memory +
> > > > > > > > > jvm_overhead",
> > > > > > > > > > then there will be times when that entire memory is
> > allocated
> > > > by
> > > > > > > direct
> > > > > > > > > > buffers and we have nothing left for the JVM overhead. So
> > we
> > > > > either
> > > > > > > > need
> > > > > > > > > a
> > > > > > > > > > way to compensate for that (again some safety margin
> cutoff
> > > > > value)
> > > > > > or
> > > > > > > > we
> > > > > > > > > > will exceed container memory.
> > > > > > > > > >
> > > > > > > > > > If we go with option 1.2, we need to be aware that it
> takes
> > > > > > elaborate
> > > > > > > > > logic
> > > > > > > > > > to push recycling of direct buffers without always
> > > triggering a
> > > > > > full
> > > > > > > > GC.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > My first guess is that the options will be easiest to do
> in
> > > the
> > > > > > > > following
> > > > > > > > > > order:
> > > > > > > > > >
> > > > > > > > > >   - Option 1.1 with a dedicated direct_memory parameter,
> as
> > > > > > discussed
> > > > > > > > > > above. We would need to find a way to set the
> direct_memory
> > > > > > parameter
> > > > > > > > by
> > > > > > > > > > default. We could start with 64 MB and see how it goes in
> > > > > practice.
> > > > > > > One
> > > > > > > > > > danger I see is that setting this too low can cause a
> > > > > > > > > > bunch of additional GCs compared to before (we need to
> > > > > > > > > > watch this carefully).
> > > > > > > > > >
> > > > > > > > > >   - Option 2. It is actually quite simple to implement,
> we
> > > > could
> > > > > > try
> > > > > > > > how
> > > > > > > > > > segfault safe we are at the moment.
> > > > > > > > > >
> > > > > > > > > >   - Option 1.2: We would not touch the
> > > > "-XX:MaxDirectMemorySize"
> > > > > > > > > parameter
> > > > > > > > > > at all and assume that all the direct memory allocations
> > that
> > > > the
> > > > > > JVM
> > > > > > > > and
> > > > > > > > > > Netty do are infrequent enough to be cleaned up fast
> enough
> > > > > through
> > > > > > > > > regular
> > > > > > > > > > GC. I am not sure if that is a valid assumption, though.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Stephan
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Aug 16, 2019 at 2:16 PM Xintong Song <
> > > > > > tonysong...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for sharing your opinion Till.
> > > > > > > > > > >
> > > > > > > > > > > I'm also in favor of alternative 2. I was wondering
> > whether
> > > > we
> > > > > > can
> > > > > > > > > avoid
> > > > > > > > > > > using Unsafe.allocate() for off-heap managed memory and
> > > > network
> > > > > > > > memory
> > > > > > > > > > with
> > > > > > > > > > > alternative 3. But after giving it a second thought, I
> > > think
> > > > > even
> > > > > > > for
> > > > > > > > > > > alternative 3 using direct memory for off-heap managed
> > > memory
> > > > > > could
> > > > > > > > > cause
> > > > > > > > > > > problems.
> > > > > > > > > > >
> > > > > > > > > > > Hi Yang,
> > > > > > > > > > >
> > > > > > > > > > > Regarding your concern, I think what is proposed in this
> > > > > > > > > > > FLIP is to have both
> > > > > > > > > > > off-heap managed memory and network memory allocated
> > > through
> > > > > > > > > > > Unsafe.allocate(), which means they are practically
> > native
> > > > > memory
> > > > > > > and
> > > > > > > > > not
> > > > > > > > > > > limited by JVM max direct memory. The only parts of
> > memory
> > > > > > limited
> > > > > > > by
> > > > > > > > > JVM
> > > > > > > > > > > max direct memory are task off-heap memory and JVM
> > > > > > > > > > > overhead, which is exactly what alternative 2 suggests
> > > > > > > > > > > setting the JVM max direct memory to.
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Aug 16, 2019 at 1:48 PM Till Rohrmann <
> > > > > > > trohrm...@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks for the clarification Xintong. I understand
> the
> > > two
> > > > > > > > > alternatives
> > > > > > > > > > > > now.
> > > > > > > > > > > >
> > > > > > > > > > > > I would be in favour of option 2 because it makes
> > things
> > > > > > > explicit.
> > > > > > > > If
> > > > > > > > > > we
> > > > > > > > > > > > don't limit the direct memory, I fear that we might
> end
> > > up
> > > > > in a
> > > > > > > > > similar
> > > > > > > > > > > > situation as we are currently in: The user might see
> > that
> > > > her
> > > > > > > > process
> > > > > > > > > > > gets
> > > > > > > > > > > > killed by the OS and does not know why this is the
> > case.
> > > > > > > > > Consequently,
> > > > > > > > > > > she
> > > > > > > > > > > > tries to decrease the process memory size (similar to
> > > > > > increasing
> > > > > > > > the
> > > > > > > > > > > cutoff
> > > > > > > > > > > > ratio) in order to accommodate for the extra direct
> > > memory.
> > > > > > Even
> > > > > > > > > worse,
> > > > > > > > > > > she
> > > > > > > > > > > > tries to decrease memory budgets which are not fully
> > used
> > > > and
> > > > > > > hence
> > > > > > > > > > won't
> > > > > > > > > > > > change the overall memory consumption.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Till
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Aug 16, 2019 at 11:01 AM Xintong Song <
> > > > > > > > tonysong...@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Let me explain this with a concrete example Till.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Let's say we have the following scenario.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Total Process Memory: 1GB
> > > > > > > > > > > > > JVM Direct Memory (Task Off-Heap Memory + JVM
> > > Overhead):
> > > > > > 200MB
> > > > > > > > > > > > > Other Memory (JVM Heap Memory, JVM Metaspace,
> > Off-Heap
> > > > > > Managed
> > > > > > > > > Memory
> > > > > > > > > > > and
> > > > > > > > > > > > > Network Memory): 800MB
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > For alternative 2, we set -XX:MaxDirectMemorySize
> to
> > > > 200MB.
> > > > > > > > > > > > > For alternative 3, we set -XX:MaxDirectMemorySize
> to
> > a
> > > > very
> > > > > > > large
> > > > > > > > > > > value,
> > > > > > > > > > > > > let's say 1TB.
> > > > > > > > > > > > >
> > > > > > > > > > > > > If the actual direct memory usage of Task Off-Heap
> > > Memory
> > > > > and
> > > > > > > JVM
> > > > > > > > > > > > Overhead
> > > > > > > > > > > > > do not exceed 200MB, then alternative 2 and
> > > alternative 3
> > > > > > > should
> > > > > > > > > have
> > > > > > > > > > > the
> > > > > > > > > > > > > same utility. Setting larger
> -XX:MaxDirectMemorySize
> > > will
> > > > > not
> > > > > > > > > reduce
> > > > > > > > > > > the
> > > > > > > > > > > > > sizes of the other memory pools.
> > > > > > > > > > > > >
> > > > > > > > > > > > > If the actual direct memory usage of Task Off-Heap
> > > Memory
> > > > > and
> > > > > > > JVM
> > > > > > > > > > > > > Overhead potentially exceed 200MB, then
> > > > > > > > > > > > >
> > > > > > > > > > > > >    - Alternative 2 suffers from frequent OOM. To
> > avoid
> > > > > that,
> > > > > > > the
> > > > > > > > > only
> > > > > > > > > > > > thing
> > > > > > > > > > > > >    user can do is to modify the configuration and
> > > > increase
> > > > > > JVM
> > > > > > > > > Direct
> > > > > > > > > > > > > Memory
> > > > > > > > > > > > >    (Task Off-Heap Memory + JVM Overhead). Let's say
> > > that
> > > > > user
> > > > > > > > > > increases
> > > > > > > > > > > > JVM
> > > > > > > > > > > > >    Direct Memory to 250MB, this will reduce the
> total
> > > > size
> > > > > of
> > > > > > > > other
> > > > > > > > > > > > memory
> > > > > > > > > > > > >    pools to 750MB, given the total process memory
> > > remains
> > > > > > 1GB.
> > > > > > > > > > > > >    - For alternative 3, there is no chance of
> direct
> > > OOM.
> > > > > > There
> > > > > > > > are
> > > > > > > > > > > > chances
> > > > > > > > > > > > >    of exceeding the total process memory limit, but
> > > given
> > > > > > that
> > > > > > > > the
> > > > > > > > > > > > process
> > > > > > > > > > > > > may
> > > > > > > > > > > > >    not use up all the reserved native memory
> > (Off-Heap
> > > > > > Managed
> > > > > > > > > > Memory,
> > > > > > > > > > > > > Network
> > > > > > > > > > > > >    Memory, JVM Metaspace), if the actual direct
> > memory
> > > > > usage
> > > > > > is
> > > > > > > > > > > slightly
> > > > > > > > > > > > > above
> > > > > > > > > > > > >    yet very close to 200MB, user probably do not
> need
> > > to
> > > > > > change
> > > > > > > > the
> > > > > > > > > > > > >    configurations.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Therefore, I think from the user's perspective, a
> > > > feasible
> > > > > > > > > > > configuration
> > > > > > > > > > > > > for alternative 2 may lead to lower resource
> > > utilization
> > > > > > > compared
> > > > > > > > > to
> > > > > > > > > > > > > alternative 3.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Aug 16, 2019 at 10:28 AM Till Rohrmann <
> > > > > > > > > trohrm...@apache.org
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I guess you have to help me understand the
> > difference
> > > > > > between
> > > > > > > > > > > > > alternative 2
> > > > > > > > > > > > > > and 3 wrt to memory under utilization Xintong.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Alternative 2: set XX:MaxDirectMemorySize to
> Task
> > > > > > Off-Heap
> > > > > > > > > Memory
> > > > > > > > > > > and
> > > > > > > > > > > > > JVM
> > > > > > > > > > > > > > Overhead. Then there is the risk that this size
> is
> > > too
> > > > > low
> > > > > > > > > > resulting
> > > > > > > > > > > > in a
> > > > > > > > > > > > > > lot of garbage collection and potentially an OOM.
> > > > > > > > > > > > > > - Alternative 3: set XX:MaxDirectMemorySize to
> > > > something
> > > > > > > larger
> > > > > > > > > > than
> > > > > > > > > > > > > > alternative 2. This would of course reduce the
> > sizes
> > > of
> > > > > the
> > > > > > > > other
> > > > > > > > > > > > memory
> > > > > > > > > > > > > > types.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > How would alternative 2 now result in an under
> > > > > utilization
> > > > > > of
> > > > > > > > > > memory
> > > > > > > > > > > > > > compared to alternative 3? If alternative 3
> > strictly
> > > > > sets a
> > > > > > > > > higher
> > > > > > > > > > > max
> > > > > > > > > > > > > > direct memory size and we use only little, then I
> > > would
> > > > > > > expect
> > > > > > > > > that
> > > > > > > > > > > > > > alternative 3 results in memory under
> utilization.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > Till
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Aug 13, 2019 at 4:19 PM Yang Wang <
> > > > > > > > danrtsey...@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi xintong,till
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Native and Direct Memory
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > My point is setting a very large max direct
> > memory
> > > > size
> > > > > > > when
> > > > > > > > we
> > > > > > > > > > do
> > > > > > > > > > > > not
> > > > > > > > > > > > > > > differentiate direct and native memory. If the
> > > direct
> > > > > > > > > > > > memory,including
> > > > > > > > > > > > > > user
> > > > > > > > > > > > > > > direct memory and framework direct memory,could
> > be
> > > > > > > calculated
> > > > > > > > > > > > > > > correctly,then
> > > > > > > > > > > > > > > i am in favor of setting direct memory with
> fixed
> > > > > value.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Memory Calculation
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I agree with xintong. For Yarn and k8s,we need
> to
> > > > check
> > > > > > the
> > > > > > > > > > memory
> > > > > > > > > > > > > > > configurations in client to avoid submitting
> > > > > successfully
> > > > > > > and
> > > > > > > > > > > failing
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > the flink master.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yang
>
> Xintong Song <tonysong...@gmail.com> wrote on Tue, Aug 13, 2019 at 22:07:
>
> Thanks for replying, Till.
>
> About MemorySegment, I think you are right that we should not include this
> issue in the scope of this FLIP. This FLIP should concentrate on how to
> configure memory pools for TaskExecutors, with minimal involvement in how
> memory consumers use it.
>
> About direct memory, I think alternative 3 may not have the same
> over-reservation issue that alternative 2 does, but at the cost of risking
> memory overuse at the container level, which is not good. My point is that
> both "Task Off-Heap Memory" and "JVM Overhead" are not easy to configure.
> For alternative 2, users might configure them higher than what is actually
> needed, just to avoid getting a direct OOM. For alternative 3, users do
> not get direct OOMs, so they may not configure the two options
> aggressively high. But the consequence is a risk that the overall
> container memory usage exceeds the budget.
>
> Thank you~
>
> Xintong Song
>
>
> On Tue, Aug 13, 2019 at 9:39 AM Till Rohrmann <trohrm...@apache.org> wrote:
>
> Thanks for proposing this FLIP Xintong.
>
> All in all I think it already looks quite good. Concerning the first open
> question about allocating memory segments, I was wondering whether this is
> strictly necessary to do in the context of this FLIP or whether it could
> be done as a follow-up. Without knowing all the details, I would be
> concerned that we would widen the scope of this FLIP too much, because we
> would have to touch all the existing call sites of the MemoryManager where
> we allocate memory segments (this should mainly be batch operators). The
> addition of the memory reservation call to the MemoryManager should not be
> affected by this, and I would hope that this is the only point of
> interaction a streaming job would have with the MemoryManager.
>
> Concerning the second open question about setting or not setting a max
> direct memory limit, I would also be interested in why Yang Wang thinks
> leaving it open would be best. My concern is that we would end up in a
> similar situation as we are now in with the RocksDBStateBackend. If the
> different memory pools are not clearly separated and can spill over into a
> different pool, then it is quite hard to understand what exactly causes a
> process to get killed for using too much memory. This could then easily
> lead to a situation similar to what we have with the cutoff-ratio. So why
> not set a sane default value for max direct memory and give the user an
> option to increase it if they run into an OOM?
>
> Cheers,
> Till
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Aug 9, 2019 at 9:12 AM Xintong
> Song <
> > > > > > > > > > > > tonysong...@gmail.com
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for the feedback, Yang.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Regarding your comments:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > *Native and Direct Memory*
> > > > > > > > > > > > > > > > > > I think setting a very large max direct
> > > memory
> > > > > size
> > > > > > > > > > > definitely
> > > > > > > > > > > > > has
> > > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > > good sides. E.g., we do not worry about
> > > direct
> > > > > OOM,
> > > > > > > and
> > > > > > > > > we
> > > > > > > > > > > > don't
> > > > > > > > > > > > > > even
> > > > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > > to allocate managed / network memory with
> > > > > > > > > > Unsafe.allocate() .
> > > > > > > > > > > > > > > > > > However, there are also some down sides
> of
> > > > doing
> > > > > > > this.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >    - One thing I can think of is that if
> a
> > > task
> > > > > > > > executor
> > > > > > > > > > > > > container
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > >    killed due to overusing memory, it
> could
> > > be
> > > > > hard
> > > > > > > for
> > > > > > > > > use
> > > > > > > > > > > to
> > > > > > > > > > > > > know
> > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > part
> > > > > > > > > > > > > > > > > >    of the memory is overused.
> > > > > > > > > > > > > > > > > >    - Another down side is that the JVM
> > never
> > > > > > trigger
> > > > > > > GC
> > > > > > > > > due
> > > > > > > > > > > to
> > > > > > > > > > > > > > > reaching
> > > > > > > > > > > > > > > > > max
> > > > > > > > > > > > > > > > > >    direct memory limit, because the limit
> > is
> > > > too
> > > > > > high
> > > > > > > > to
> > > > > > > > > be
> > > > > > > > > > > > > > reached.
> > > > > > > > > > > > > > > > That
> > > > > > > > > > > > > > > > > >    means we kind of relay on heap memory
> to
> > > > > trigger
> > > > > > > GC
> > > > > > > > > and
> > > > > > > > > > > > > release
> > > > > > > > > > > > > > > > direct
> > > > > > > > > > > > > > > > > >    memory. That could be a problem in
> cases
> > > > where
> > > > > > we
> > > > > > > > have
> > > > > > > > > > > more
> > > > > > > > > > > > > > direct
> > > > > > > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > > > >    usage but not enough heap activity to
> > > > trigger
> > > > > > the
> > > > > > > > GC.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Maybe you can share your reasons for
> > > preferring
> > > > > > > > setting a
> > > > > > > > > > > very
> > > > > > > > > > > > > > large
> > > > > > > > > > > > > > > > > value,
> > > > > > > > > > > > > > > > > > if there are anything else I overlooked.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > *Memory Calculation*
> > > > > > > > > > > > > > > > > > If there is any conflict between multiple
> > > > > > > configuration
> > > > > > > > > > that
> > > > > > > > > > > > user
> > > > > > > > > > > > > > > > > > explicitly specified, I think we should
> > throw
> > > > an
> > > > > > > error.
> > > > > > > > > > > > > > > > > > I think doing checking on the client side
> > is
> > > a
> > > > > good
> > > > > > > > idea,
> > > > > > > > > > so
> > > > > > > > > > > > that
> > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > Yarn /
> > > > > > > > > > > > > > > > > > K8s we can discover the problem before
> > > > submitting
> > > > > > the
> > > > > > > > > Flink
> > > > > > > > > > > > > > cluster,
> > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > is always a good thing.
> > > > > > > > > > > > > > > > > > But we can not only rely on the client
> side
> > > > > > checking,
> > > > > > > > > > because
> > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > standalone cluster TaskManagers on
> > different
> > > > > > machines
> > > > > > > > may
> > > > > > > > > > > have
> > > > > > > > > > > > > > > > different
> > > > > > > > > > > > > > > > > > configurations and the client does see
> > that.
> > > > > > > > > > > > > > > > > > What do you think?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > >
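[Editor's note: the client-side consistency check discussed above could look roughly like the following sketch. The class and method names are made up for illustration and are not actual Flink classes or configuration options.]

```java
import java.util.Arrays;

// Hypothetical pre-submission check: reject configurations where the
// explicitly specified fine-grained pools cannot fit into the explicitly
// specified total process memory.
public class MemoryConfigCheck {

    /** Returns true if the given pools fit into the total process memory. */
    public static boolean fits(long totalProcessBytes, long... poolBytes) {
        long sum = Arrays.stream(poolBytes).sum();
        return sum <= totalProcessBytes;
    }

    /** Throws, as proposed above, when explicitly set values conflict. */
    public static void check(long totalProcessBytes, long... poolBytes) {
        if (!fits(totalProcessBytes, poolBytes)) {
            throw new IllegalArgumentException(
                    "Sum of configured memory pools exceeds total process memory ("
                            + totalProcessBytes + " bytes)");
        }
    }
}
```

On Yarn / K8s such a check could run before deployment; for standalone setups the same check would have to be repeated on each TaskManager, since the client does not see per-machine configurations.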
>
> On Thu, Aug 8, 2019 at 5:09 PM Yang Wang <danrtsey...@gmail.com> wrote:
>
> Hi Xintong,
>
> Thanks for your detailed proposal. After all the memory configuration
> options are introduced, it will be more powerful to control Flink's memory
> usage. I just have a few questions about it.
>
>    - Native and Direct Memory
>
> We do not differentiate user direct memory and native memory. They are all
> included in task off-heap memory. Right? So I don't think we can set
> -XX:MaxDirectMemorySize properly. I prefer leaving it at a very large
> value.
>
>    - Memory Calculation
>
> If the sum of the fine-grained memory pools (network memory, managed
> memory, etc.) is larger than the total process memory, how do we deal with
> this situation? Do we need to check the memory configuration in the
> client?
>
> Xintong Song <tonysong...@gmail.com> wrote on Wed, Aug 7, 2019 at 22:14:
>
> Hi everyone,
>
> We would like to start a discussion thread on "FLIP-49: Unified Memory
> Configuration for TaskExecutors"[1], where we describe how to improve
> TaskExecutor memory configurations. The FLIP document is mostly based on
> an early design, "Memory Management and Configuration Reloaded"[2] by
> Stephan, with updates from follow-up discussions both online and offline.
>
> This FLIP addresses several shortcomings of the current (Flink 1.9)
> TaskExecutor memory configuration.
>
>    - Different configuration for Streaming and Batch.
>    - Complex and difficult configuration of RocksDB in Streaming.
>    - Complicated, uncertain and hard to understand.
>
> Key changes to solve the problems can be summarized as follows.
>
>    - Extend the memory manager to also account for memory usage by state
>    backends.
>    - Modify how TaskExecutor memory is partitioned and accounted into
>    individual memory reservations and pools.
>    - Simplify memory configuration options and calculation logic.
>
> Please find more details in the FLIP wiki document [1].
>
> (Please note that the early design doc [2] is out of sync; it would be
> appreciated to have the discussion in this mailing list thread.)
>
> Looking forward to your feedback.
>
> Thank you~
>
> Xintong Song
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>
> [2]
> https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit?usp=sharing
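[Editor's note: to make the "individual memory reservations and pools" idea concrete, here is a minimal sketch of a TaskExecutor memory breakdown. The pool names follow this thread's discussion, but the class and its derivation are illustrative assumptions, not the FLIP's actual design.]

```java
// Illustrative TaskExecutor memory breakdown: the total process memory is
// the sum of disjoint pools, so no pool can silently spill into a
// neighboring budget (contrast with the Flink 1.9 cutoff-ratio approach).
public class TaskExecutorMemorySketch {
    public final long heap;          // JVM heap memory
    public final long managed;       // off-heap managed memory
    public final long network;       // network buffers
    public final long taskOffHeap;   // user direct / native memory
    public final long metaspace;     // JVM metaspace
    public final long jvmOverhead;   // thread stacks, code cache, etc.

    public TaskExecutorMemorySketch(long heap, long managed, long network,
                                    long taskOffHeap, long metaspace,
                                    long jvmOverhead) {
        this.heap = heap;
        this.managed = managed;
        this.network = network;
        this.taskOffHeap = taskOffHeap;
        this.metaspace = metaspace;
        this.jvmOverhead = jvmOverhead;
    }

    /** The container size to request is simply the sum of all pools. */
    public long totalProcessMemory() {
        return heap + managed + network + taskOffHeap + metaspace + jvmOverhead;
    }
}
```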
>
> On Fri, Aug 16, 2019 at 2:16 PM Xintong Song <tonysong...@gmail.com> wrote:
>
> Thanks for sharing your opinion Till.
>
> I'm also in favor of alternative 2. I was wondering whether we can avoid
> using Unsafe.allocate() for off-heap managed memory and network memory
> with alternative 3. But after giving it a second thought, I think even for
> alternative 3 using direct memory for off-heap managed memory could cause
> problems.
>
> Hi Yang,
>
> Regarding your concern, I think what is proposed in this FLIP is to have
> both off-heap managed memory and network memory allocated through
> Unsafe.allocate(), which means they are practically native memory and not
> limited by the JVM max direct memory. The only parts of memory limited by
> the JVM max direct memory are task off-heap memory and JVM overhead, which
> is exactly what alternative 2 suggests setting the JVM max direct memory
> to.
>
> Thank you~
>
> Xintong Song
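[Editor's note: the distinction drawn above can be seen in code. Memory obtained through `Unsafe.allocateMemory()` is plain native memory and is not counted against `-XX:MaxDirectMemorySize`, unlike `ByteBuffer.allocateDirect()`. A minimal sketch follows; the reflective lookup of `theUnsafe` works on JDK 8, while newer JDKs may require extra module flags.]

```java
import java.lang.reflect.Field;

import sun.misc.Unsafe;

public class UnsafeAllocationSketch {

    // Obtain the Unsafe singleton reflectively; Unsafe.getUnsafe() throws a
    // SecurityException when called from application code.
    private static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unsafe not accessible", e);
        }
    }

    /**
     * Allocates native memory that is NOT counted against
     * -XX:MaxDirectMemorySize (unlike ByteBuffer.allocateDirect), writes a
     * value, reads it back, and frees the memory again.
     */
    public static long roundTrip(long sizeBytes, long value) {
        Unsafe u = unsafe();
        long address = u.allocateMemory(sizeBytes);
        try {
            u.putLong(address, value);
            return u.getLong(address);
        } finally {
            // Unlike direct ByteBuffers, this memory must be freed
            // explicitly; no GC is involved in reclaiming it.
            u.freeMemory(address);
        }
    }
}
```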
>
> On Fri, Aug 16, 2019 at 1:48 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
> Thanks for the clarification Xintong. I understand the two alternatives
> now.
>
> I would be in favour of option 2 because it makes things explicit. If we
> don't limit the direct memory, I fear that we might end up in a similar
> situation as we are currently in: the user might see that her process gets
> killed by the OS and not know why this is the case. Consequently, she
> tries to decrease the process memory size (similar to increasing the
> cutoff ratio) in order to accommodate the extra direct memory. Even worse,
> she tries to decrease memory budgets which are not fully used and hence
> won't change the overall memory consumption.
>
> Cheers,
> Till
>
> On Fri, Aug 16, 2019 at 11:01 AM Xintong Song <tonysong...@gmail.com> wrote:
>
> Let me explain this with a concrete example, Till.
>
> Let's say we have the following scenario.
>
> Total Process Memory: 1GB
> JVM Direct Memory (Task Off-Heap Memory + JVM Overhead): 200MB
> Other Memory (JVM Heap Memory, JVM Metaspace, Off-Heap Managed Memory and
> Network Memory): 800MB
>
> For alternative 2, we set -XX:MaxDirectMemorySize to 200MB.
> For alternative 3, we set -XX:MaxDirectMemorySize to a very large value,
> let's say 1TB.
>
> If the actual direct memory usage of Task Off-Heap Memory and JVM Overhead
> does not exceed 200MB, then alternatives 2 and 3 have the same utility.
> Setting a larger -XX:MaxDirectMemorySize will not reduce the sizes of the
> other memory pools.
>
> If the actual direct memory usage of Task Off-Heap Memory and JVM Overhead
> can potentially exceed 200MB, then
>
>    - Alternative 2 suffers from frequent OOMs. To avoid that, the only
>    thing the user can do is to modify the configuration and increase JVM
>    Direct Memory (Task Off-Heap Memory + JVM Overhead). Let's say the user
>    increases JVM Direct Memory to 250MB; this reduces the total size of
>    the other memory pools to 750MB, given that the total process memory
>    remains 1GB.
>    - For alternative 3, there is no chance of a direct OOM. There are
>    chances of exceeding the total process memory limit, but given that the
>    process may not use up all of the reserved native memory (Off-Heap
>    Managed Memory, Network Memory, JVM Metaspace), if the actual direct
>    memory usage is slightly above yet very close to 200MB, the user
>    probably does not need to change the configuration.
>
> Therefore, I think from the user's perspective, a feasible configuration
> for alternative 2 may lead to lower resource utilization compared to
> alternative 3.
>
> Thank you~
>
> Xintong Song
>
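[Editor's note: the budget arithmetic of the example above can be written down directly. A sketch under the example's assumptions, treating 1 GB as 1000 MB to keep the example's round numbers; the class name is made up for illustration.]

```java
// Sketch of the budget arithmetic from the example above. All values in MB.
public class DirectMemoryBudget {

    /** Alternative 2: -XX:MaxDirectMemorySize = Task Off-Heap + JVM Overhead. */
    public static long alternative2MaxDirect(long taskOffHeapMb, long jvmOverheadMb) {
        return taskOffHeapMb + jvmOverheadMb;
    }

    /** Memory left for heap, metaspace, managed and network memory. */
    public static long otherMemory(long totalProcessMb, long directMb) {
        return totalProcessMb - directMb;
    }
}
```

With a 1000 MB process and 200 MB of direct memory, 800 MB remains for the other pools; raising the direct budget to 250 MB shrinks them to 750 MB, which is the over-reservation cost of alternative 2 described above.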
>
> On Fri, Aug 16, 2019 at 10:28 AM Till Rohrmann <trohrm...@apache.org> wrote:
>
> I guess you have to help me understand the difference between alternatives
> 2 and 3 with regard to memory under-utilization, Xintong.
>
> - Alternative 2: set -XX:MaxDirectMemorySize to Task Off-Heap Memory and
> JVM Overhead. Then there is the risk that this size is too low, resulting
> in a lot of garbage collection and potentially an OOM.
> - Alternative 3: set -XX:MaxDirectMemorySize to something larger than in
> alternative 2. This would of course reduce the sizes of the other memory
> types.
>
> How would alternative 2 now result in an under-utilization of memory
> compared to alternative 3? If alternative 3 strictly sets a higher max
> direct memory size and we use only little of it, then I would expect that
> alternative 3 results in memory under-utilization.
>
> Cheers,
> Till
>
> On Tue, Aug 13, 2019 at 4:19 PM Yang Wang <danrtsey...@gmail.com> wrote:
>
> Hi Xintong, Till,
>
> > Native and Direct Memory
>
> My point is about setting a very large max direct memory size when we do
> not differentiate direct and native memory. If the direct memory,
> including user direct memory and framework direct memory, could be
> calculated correctly, then I am in favor of setting direct memory to a
> fixed value.
>
> > Memory Calculation
>
> I agree with Xintong. For Yarn and K8s, we need to check the memory
> configurations in the client to avoid submitting successfully and failing
> in the Flink master.
>
> Best,
>
> Yang
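[Editor's note: if the direct memory budget can be calculated correctly, pinning it to a fixed value as Yang suggests is a one-liner on the TaskExecutor launch command. The JVM flag is real; the surrounding method is a hypothetical sketch, not Flink code.]

```java
public class JvmArgsSketch {

    /**
     * Builds the fixed-value flag discussed above,
     * e.g. "-XX:MaxDirectMemorySize=209715200" for 200 MB.
     */
    public static String maxDirectMemoryFlag(long taskOffHeapBytes, long jvmOverheadBytes) {
        return "-XX:MaxDirectMemorySize=" + (taskOffHeapBytes + jvmOverheadBytes);
    }
}
```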
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Xintong Song <tonysong...@gmail.com
> >于2019年8月13日
> > > > > > 周二22:07写道:
>
> Thanks for replying, Till.
>
> About MemorySegment, I think you are right that we should not include this issue in the scope of this FLIP. This FLIP should concentrate on how to configure memory pools for TaskExecutors, with minimum involvement in how memory consumers use it.
>
> About direct memory, I think alternative 3 may not have the same over-reservation issue that alternative 2 does, but at the cost of risking memory over-use at the container level, which is not good. My point is that both "Task Off-Heap Memory" and "JVM Overhead" are not easy to configure. For alternative 2, users might configure them higher than what is actually needed, just to avoid getting a direct OOM. For alternative 3, users do not get direct OOMs, so they may not configure the two options aggressively high. But the consequence is the risk that overall container memory usage exceeds the budget.
>
> Thank you~
>
> Xintong Song
>
> On Tue, Aug 13, 2019 at 9:39 AM Till Rohrmann <trohrm...@apache.org> wrote:
>
> Thanks for proposing this FLIP, Xintong.
>
> All in all I think it already looks quite good. Concerning the first open question about allocating memory segments, I was wondering whether this is strictly necessary to do in the context of this FLIP, or whether it could be done as a follow-up. Without knowing all the details, I would be concerned that we would widen the scope of this FLIP too much, because we would have to touch all the existing call sites of the MemoryManager where we allocate memory segments (this should mainly be batch operators). The addition of the memory reservation call to the MemoryManager should not be affected by this, and I would hope that this is the only point of interaction a streaming job would have with the MemoryManager.
>
> Concerning the second open question about setting or not setting a max direct memory limit, I would also be interested in why Yang Wang thinks leaving it open would be best. My concern is that we would end up in a similar situation as we are now with the RocksDBStateBackend. If the different memory pools are not clearly separated and can spill over into a different pool, then it is quite hard to understand what exactly causes a process to get killed for using too much memory. This could easily lead to a situation similar to what we have with the cutoff-ratio. So why not set a sane default value for max direct memory and give users an option to increase it if they run into an OOM?
>
> @Xintong, how would alternative 2 lead to lower memory utilization than alternative 3, where we set the direct memory to a higher value?
>
> Cheers,
> Till
>
> On Fri, Aug 9, 2019 at 9:12 AM Xintong Song <tonysong...@gmail.com> wrote:
>
> Thanks for the feedback, Yang.
>
> Regarding your comments:
>
> *Native and Direct Memory*
> I think setting a very large max direct memory size definitely has some good sides. E.g., we do not need to worry about direct OOMs, and we don't even need to allocate managed / network memory with Unsafe.allocate(). However, there are also some downsides to doing this.
>
>    - One thing I can think of is that if a task executor container is killed due to overusing memory, it could be hard for us to know which part of the memory is overused.
>    - Another downside is that the JVM never triggers GC due to reaching the max direct memory limit, because the limit is too high to be reached. That means we kind of rely on heap memory to trigger GC and release direct memory. That could be a problem in cases where we have more direct memory usage but not enough heap activity to trigger the GC.
>
> Maybe you can share your reasons for preferring to set a very large value, in case there is anything else I overlooked.
>
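A minimal sketch of the "sane default" alternative discussed in this thread: deriving `-XX:MaxDirectMemorySize` from the configured off-heap pools instead of leaving it unbounded, so that reaching the limit can trigger a GC that frees unreferenced direct buffers. The method and parameter names below are assumptions for illustration, not actual Flink identifiers.

```java
// Illustrative sketch only: compute a direct-memory limit from the pools
// that are expected to allocate direct memory (names are hypothetical).
public class DirectMemoryLimit {

    static long maxDirectMemoryBytes(long frameworkOffHeap, long taskOffHeap, long networkMemory) {
        // Cap direct allocations at the sum of the off-heap pools, rather
        // than an effectively unlimited value that is never reached.
        return frameworkOffHeap + taskOffHeap + networkMemory;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        long limit = maxDirectMemoryBytes(128 * mb, 256 * mb, 512 * mb);
        // This value would be passed to the JVM as -XX:MaxDirectMemorySize=896m.
        System.out.println(limit / mb); // prints 896
    }
}
```

With such a cap, an over-allocation of direct memory surfaces as a diagnosable `OutOfMemoryError: Direct buffer memory` inside the JVM rather than as an opaque container kill.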
> *Memory Calculation*
> If there is any conflict between multiple configuration values that the user explicitly specified, I think we should throw an error.
> I think doing the checking on the client side is a good idea, so that on Yarn / K8s we can discover the problem before submitting the Flink cluster, which is always a good thing. But we cannot rely only on the client-side checking, because for a standalone cluster the TaskManagers on different machines may have different configurations, and the client does not see them.
> What do you think?
>
> Thank you~
>
> Xintong Song
>
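The configuration check described above can be sketched as follows. This is a hedged illustration of the idea, not Flink's actual validation API; the class and method names are hypothetical.

```java
// Illustrative sketch: fail fast when explicitly configured memory
// components cannot fit into the configured total process memory.
public class MemoryConfigCheck {

    static void validate(long totalProcessBytes, long... explicitComponentBytes) {
        long sum = 0;
        for (long component : explicitComponentBytes) {
            sum += component;
        }
        if (sum > totalProcessBytes) {
            throw new IllegalArgumentException(
                "Sum of explicitly configured memory components (" + sum
                    + " bytes) exceeds total process memory (" + totalProcessBytes + " bytes)");
        }
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        validate(1024 * mb, 300 * mb, 200 * mb, 400 * mb); // 900m <= 1024m, passes
        try {
            validate(1024 * mb, 600 * mb, 300 * mb, 400 * mb); // 1300m > 1024m
            throw new AssertionError("expected conflict to be detected");
        } catch (IllegalArgumentException expected) {
            System.out.println("conflict detected");
        }
    }
}
```

Running the same check both in the client (for Yarn / K8s) and in the TaskManager startup path (for standalone) would cover both cases raised in the thread.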
> On Thu, Aug 8, 2019 at 5:09 PM Yang Wang <danrtsey...@gmail.com> wrote:
>
> Hi Xintong,
>
> Thanks for your detailed proposal. After all the memory configuration options are introduced, it will be more powerful to control the Flink memory usage. I just have a few questions about it.
>
>    - Native and Direct Memory
>
> We do not differentiate user direct memory and native memory; they are all included in task off-heap memory, right? So I don't think we could set -XX:MaxDirectMemorySize properly. I prefer leaving it at a very large value.
>
>    - Memory Calculation
>
> If the sum of the fine-grained memory pools (network memory, managed memory, etc.) is larger than the total process memory, how do we deal with this situation? Do we need to check the memory configuration in the client?
>
> Xintong Song <tonysong...@gmail.com> wrote on Wed, Aug 7, 2019 at 22:14:
>
> Hi everyone,
>
> We would like to start a discussion thread on "FLIP-49: Unified Memory Configuration for TaskExecutors" [1], where we describe how to improve the TaskExecutor memory configuration. The FLIP document is mostly based on an early design, "Memory Management and Configuration Reloaded" [2] by Stephan, with updates from follow-up discussions both online and offline.
>
> This FLIP addresses several shortcomings of the current (Flink 1.9) TaskExecutor memory configuration:
>
>    - Different configuration for streaming and batch.
>    - Complex and difficult configuration of RocksDB in streaming.
>    - Complicated, uncertain and hard-to-understand calculation logic.
>
> Key changes to solve these problems can be summarized as follows:
>
>    - Extend the memory manager to also account for memory usage by state backends.
>    - Partition TaskExecutor memory into individually accounted memory reservations and pools.
>    - Simplify the memory configuration options and calculation logic.
>
> Please find more details in the FLIP wiki document [1].
>
> (Please note that the early design doc [2] is out of sync; it would be appreciated to have the discussion in this mailing list thread.)
>
> Looking forward to your feedback.
>
> Thank you~
>
> Xintong Song
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> [2] https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit?usp=sharing
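To make the "simplified calculation" concrete, here is an illustrative sketch of deriving the remaining memory pools when only the total process memory is configured. The fraction values and pool names are assumptions for illustration, not the defaults the FLIP settles on.

```java
// Illustrative sketch only: derive per-pool sizes from the total process
// memory using fixed fractions (all fractions here are assumed values).
public class TaskExecutorMemorySketch {

    /** Returns {jvmOverhead, network, managed, heap} in bytes. */
    static long[] derivePools(long totalProcessBytes) {
        long jvmOverhead = (long) (totalProcessBytes * 0.10); // assumed fraction
        long flinkMemory = totalProcessBytes - jvmOverhead;
        long network = (long) (flinkMemory * 0.10);           // assumed fraction
        long managed = (long) (flinkMemory * 0.40);           // assumed fraction
        long heap = flinkMemory - network - managed;          // remainder
        return new long[] {jvmOverhead, network, managed, heap};
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        long[] pools = derivePools(1000 * mb);
        long sum = 0;
        for (long pool : pools) {
            sum += pool;
        }
        // The derived pools always add back up to the configured total,
        // so no pool can silently spill into another one's budget.
        System.out.println(sum == 1000 * mb);
    }
}
```

The point of the sketch is the invariant, not the numbers: because every pool is either explicitly sized or derived as a remainder, the sum of all pools equals the process budget requested from Yarn / K8s.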
