Re: [DISCUSS] FLIP-49: Unified Memory Configuration for TaskExecutors

Stephan Ewen Thu, 29 Aug 2019 03:51:14 -0700

I see. Under the assumption of strict determinism that should work.

The original proposal had this point "don't compute inside the TM, compute
outside and supply a full config", because that sounded more intuitive.


On Thu, Aug 29, 2019 at 12:15 PM Till Rohrmann <[email protected]> wrote:

> My understanding was that before starting the Flink process we call a
> utility which calculates these values. I assume that this utility will do
> the calculation based on a set of configured values (process memory, flink
> memory, network memory etc.). Assuming that these values don't differ from
> the values with which the JVM is started, it should be possible to
> recompute them in the Flink process in order to set the values.
>
>
>
> On Thu, Aug 29, 2019 at 11:29 AM Stephan Ewen <[email protected]> wrote:
>
> > When computing the values in the JVM process after it started, how would
> > you deal with values like Max Direct Memory, Metaspace size. native
> memory
> > reservation (reduce heap size), etc? All the values that are parameters
> to
> > the JVM process and that need to be supplied at process startup?
> >
> > On Wed, Aug 28, 2019 at 4:46 PM Till Rohrmann <[email protected]>
> > wrote:
> >
> > > Thanks for the clarification. I have some more comments:
> > >
> > > - I would actually split the logic to compute the process memory
> > > requirements and storing the values into two things. E.g. one could
> name
> > > the former TaskExecutorProcessUtility and  the latter
> > > TaskExecutorProcessMemory. But we can discuss this on the PR since it's
> > > just a naming detail.
> > >
> > > - Generally, I'm not opposed to making configuration values overridable
> > by
> > > ENV variables. I think this is a very good idea and makes the
> > > configurability of Flink processes easier. However, I think that adding
> > > this functionality should not be part of this FLIP because it would
> > simply
> > > widen the scope unnecessarily.
> > >
> > > The reasons why I believe it is unnecessary are the following: For Yarn
> > we
> > > already create write a flink-conf.yaml which could be populated with
> the
> > > memory settings. For the other processes it should not make a
> difference
> > > whether the loaded Configuration is populated with the memory settings
> > from
> > > ENV variables or by using TaskExecutorProcessUtility to compute the
> > missing
> > > values from the loaded configuration. If the latter would not be
> possible
> > > (wrong or missing configuration values), then we should not have been
> > able
> > > to actually start the process in the first place.
> > >
> > > - Concerning the memory reservation: I agree with you that we need the
> > > memory reservation functionality to make streaming jobs work with
> > "managed"
> > > memory. However, w/o this functionality the whole Flip would already
> > bring
> > > a good amount of improvements to our users when running batch jobs.
> > > Moreover, by keeping the scope smaller we can complete the FLIP faster.
> > > Hence, I would propose to address the memory reservation functionality
> > as a
> > > follow up FLIP (which Yu is working on if I'm not mistaken).
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Aug 28, 2019 at 11:43 AM Yang Wang <[email protected]>
> > wrote:
> > >
> > > > Just add my 2 cents.
> > > >
> > > > Using environment variables to override the configuration for
> different
> > > > taskmanagers is better.
> > > > We do not need to generate dedicated flink-conf.yaml for all
> > > taskmanagers.
> > > > A common flink-conf.yam and different environment variables are
> enough.
> > > > By reducing the distributed cached files, it could make launching a
> > > > taskmanager faster.
> > > >
> > > > Stephan gives a good suggestion that we could move the logic into
> > > > "GlobalConfiguration.loadConfig()" method.
> > > > Maybe the client could also benefit from this. Different users do not
> > > have
> > > > to export FLINK_CONF_DIR to update few config options.
> > > >
> > > >
> > > > Best,
> > > > Yang
> > > >
> > > > Stephan Ewen <[email protected]> 于2019年8月28日周三 上午1:21写道：
> > > >
> > > > > One note on the Environment Variables and Configuration discussion.
> > > > >
> > > > > My understanding is that passed ENV variables are added to the
> > > > > configuration in the "GlobalConfiguration.loadConfig()" method (or
> > > > > similar).
> > > > > For all the code inside Flink, it looks like the data was in the
> > config
> > > > to
> > > > > start with, just that the scripts that compute the variables can
> pass
> > > the
> > > > > values to the process without actually needing to write a file.
> > > > >
> > > > > For example the "GlobalConfiguration.loadConfig()" method would
> take
> > > any
> > > > > ENV variable prefixed with "flink" and add it as a config key.
> > > > > "flink_taskmanager_memory_size=2g" would become
> > > "taskmanager.memory.size:
> > > > > 2g".
> > > > >
> > > > >
> > > > > On Tue, Aug 27, 2019 at 4:05 PM Xintong Song <
> [email protected]>
> > > > > wrote:
> > > > >
> > > > > > Thanks for the comments, Till.
> > > > > >
> > > > > > I've also seen your comments on the wiki page, but let's keep the
> > > > > > discussion here.
> > > > > >
> > > > > > - Regarding 'TaskExecutorSpecifics', how do you think about
> naming
> > it
> > > > > > 'TaskExecutorResourceSpecifics'.
> > > > > > - Regarding passing memory configurations into task executors,
> I'm
> > in
> > > > > favor
> > > > > > of do it via environment variables rather than configurations,
> with
> > > the
> > > > > > following two reasons.
> > > > > >   - It is easier to keep the memory options once calculate not to
> > be
> > > > > > changed with environment variables rather than configurations.
> > > > > >   - I'm not sure whether we should write the configuration in
> > startup
> > > > > > scripts. Writing changes into the configuration files when
> running
> > > the
> > > > > > startup scripts does not sounds right to me. Or we could make a
> > copy
> > > of
> > > > > > configuration files per flink cluster, and make the task executor
> > to
> > > > load
> > > > > > from the copy, and clean up the copy after the cluster is
> shutdown,
> > > > which
> > > > > > is complicated. (I think this is also what Stephan means in his
> > > comment
> > > > > on
> > > > > > the wiki page?)
> > > > > > - Regarding reserving memory, I think this change should be
> > included
> > > in
> > > > > > this FLIP. I think a big part of motivations of this FLIP is to
> > unify
> > > > > > memory configuration for streaming / batch and make it easy for
> > > > > configuring
> > > > > > rocksdb memory. If we don't support memory reservation, then
> > > streaming
> > > > > jobs
> > > > > > cannot use managed memory (neither on-heap or off-heap), which
> > makes
> > > > this
> > > > > > FLIP incomplete.
> > > > > > - Regarding network memory, I think you are right. I think we
> > > probably
> > > > > > don't need to change network stack from using direct memory to
> > using
> > > > > unsafe
> > > > > > native memory. Network memory size is deterministic, cannot be
> > > reserved
> > > > > as
> > > > > > managed memory does, and cannot be overused. I think it also
> works
> > if
> > > > we
> > > > > > simply keep using direct memory for network and include it in jvm
> > max
> > > > > > direct memory size.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Aug 27, 2019 at 8:12 PM Till Rohrmann <
> > [email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Xintong,
> > > > > > >
> > > > > > > thanks for addressing the comments and adding a more detailed
> > > > > > > implementation plan. I have a couple of comments concerning the
> > > > > > > implementation plan:
> > > > > > >
> > > > > > > - The name `TaskExecutorSpecifics` is not really descriptive.
> > > > Choosing
> > > > > a
> > > > > > > different name could help here.
> > > > > > > - I'm not sure whether I would pass the memory configuration to
> > the
> > > > > > > TaskExecutor via environment variables. I think it would be
> > better
> > > to
> > > > > > write
> > > > > > > it into the configuration one uses to start the TM process.
> > > > > > > - If possible, I would exclude the memory reservation from this
> > > FLIP
> > > > > and
> > > > > > > add this as part of a dedicated FLIP.
> > > > > > > - If possible, then I would exclude changes to the network
> stack
> > > from
> > > > > > this
> > > > > > > FLIP. Maybe we can simply say that the direct memory needed by
> > the
> > > > > > network
> > > > > > > stack is the framework direct memory requirement. Changing how
> > the
> > > > > memory
> > > > > > > is allocated can happen in a second step. This would keep the
> > scope
> > > > of
> > > > > > this
> > > > > > > FLIP smaller.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Thu, Aug 22, 2019 at 2:51 PM Xintong Song <
> > > [email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > I just updated the FLIP document on wiki [1], with the
> > following
> > > > > > changes.
> > > > > > > >
> > > > > > > >    - Removed open question regarding MemorySegment
> allocation.
> > As
> > > > > > > >    discussed, we exclude this topic from the scope of this
> > FLIP.
> > > > > > > >    - Updated content about JVM direct memory parameter
> > according
> > > to
> > > > > > > recent
> > > > > > > >    discussions, and moved the other options to "Rejected
> > > > > Alternatives"
> > > > > > > for
> > > > > > > > the
> > > > > > > >    moment.
> > > > > > > >    - Added implementation steps.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > > > > >
> > > > > > > > On Mon, Aug 19, 2019 at 7:16 PM Stephan Ewen <
> [email protected]
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > @Xintong: Concerning "wait for memory users before task
> > dispose
> > > > and
> > > > > > > > memory
> > > > > > > > > release": I agree, that's how it should be. Let's try it
> out.
> > > > > > > > >
> > > > > > > > > @Xintong @Jingsong: Concerning " JVM does not wait for GC
> > when
> > > > > > > allocating
> > > > > > > > > direct memory buffer": There seems to be pretty elaborate
> > logic
> > > > to
> > > > > > free
> > > > > > > > > buffers when allocating new ones. See
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://hg.openjdk.java.net/jdk8u/jdk8u-dev/jdk/file/tip/src/share/classes/java/nio/Bits.java#l643
> > > > > > > > >
> > > > > > > > > @Till: Maybe. If we assume that the JVM default works (like
> > > going
> > > > > > with
> > > > > > > > > option 2 and not setting "-XX:MaxDirectMemorySize" at all),
> > > then
> > > > I
> > > > > > > think
> > > > > > > > it
> > > > > > > > > should be okay to set "-XX:MaxDirectMemorySize" to
> > > > > > > > > "off_heap_managed_memory + direct_memory" even if we use
> > > RocksDB.
> > > > > > That
> > > > > > > > is a
> > > > > > > > > big if, though, I honestly have no idea :D Would be good to
> > > > > > understand
> > > > > > > > > this, though, because this would affect option (2) and
> option
> > > > > (1.2).
> > > > > > > > >
> > > > > > > > > On Mon, Aug 19, 2019 at 4:44 PM Xintong Song <
> > > > > [email protected]>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the inputs, Jingsong.
> > > > > > > > > >
> > > > > > > > > > Let me try to summarize your points. Please correct me if
> > I'm
> > > > > > wrong.
> > > > > > > > > >
> > > > > > > > > >    - Memory consumers should always avoid returning
> memory
> > > > > segments
> > > > > > > to
> > > > > > > > > >    memory manager while there are still un-cleaned
> > > structures /
> > > > > > > threads
> > > > > > > > > > that
> > > > > > > > > >    may use the memory. Otherwise, it would cause serious
> > > > problems
> > > > > > by
> > > > > > > > > having
> > > > > > > > > >    multiple consumers trying to use the same memory
> > segment.
> > > > > > > > > >    - JVM does not wait for GC when allocating direct
> memory
> > > > > buffer.
> > > > > > > > > >    Therefore even we set proper max direct memory size
> > limit,
> > > > we
> > > > > > may
> > > > > > > > > still
> > > > > > > > > >    encounter direct memory oom if the GC cleaning memory
> > > slower
> > > > > > than
> > > > > > > > the
> > > > > > > > > >    direct memory allocation.
> > > > > > > > > >
> > > > > > > > > > Am I understanding this correctly?
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Aug 19, 2019 at 4:21 PM JingsongLee <
> > > > > > [email protected]
> > > > > > > > > > .invalid>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi stephan:
> > > > > > > > > > >
> > > > > > > > > > > About option 2:
> > > > > > > > > > >
> > > > > > > > > > > if additional threads not cleanly shut down before we
> can
> > > > exit
> > > > > > the
> > > > > > > > > task:
> > > > > > > > > > > In the current case of memory reuse, it has freed up
> the
> > > > memory
> > > > > > it
> > > > > > > > > > >  uses. If this memory is used by other tasks and
> > > asynchronous
> > > > > > > threads
> > > > > > > > > > >  of exited task may still be writing, there will be
> > > > concurrent
> > > > > > > > security
> > > > > > > > > > >  problems, and even lead to errors in user computing
> > > results.
> > > > > > > > > > >
> > > > > > > > > > > So I think this is a serious and intolerable bug, No
> > matter
> > > > > what
> > > > > > > the
> > > > > > > > > > >  option is, it should be avoided.
> > > > > > > > > > >
> > > > > > > > > > > About direct memory cleaned by GC:
> > > > > > > > > > > I don't think it is a good idea, I've encountered so
> many
> > > > > > > situations
> > > > > > > > > > >  that it's too late for GC to cause DirectMemory OOM.
> > > Release
> > > > > and
> > > > > > > > > > >  allocate DirectMemory depend on the type of user job,
> > > which
> > > > is
> > > > > > > > > > >  often beyond our control.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Jingsong Lee
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > >
> ------------------------------------------------------------------
> > > > > > > > > > > From:Stephan Ewen <[email protected]>
> > > > > > > > > > > Send Time:2019年8月19日(星期一) 15:56
> > > > > > > > > > > To:dev <[email protected]>
> > > > > > > > > > > Subject:Re: [DISCUSS] FLIP-49: Unified Memory
> > Configuration
> > > > for
> > > > > > > > > > > TaskExecutors
> > > > > > > > > > >
> > > > > > > > > > > My main concern with option 2 (manually release memory)
> > is
> > > > that
> > > > > > > > > segfaults
> > > > > > > > > > > in the JVM send off all sorts of alarms on user ends.
> So
> > we
> > > > > need
> > > > > > to
> > > > > > > > > > > guarantee that this never happens.
> > > > > > > > > > >
> > > > > > > > > > > The trickyness is in tasks that uses data structures /
> > > > > algorithms
> > > > > > > > with
> > > > > > > > > > > additional threads, like hash table spill/read and
> > sorting
> > > > > > threads.
> > > > > > > > We
> > > > > > > > > > need
> > > > > > > > > > > to ensure that these cleanly shut down before we can
> exit
> > > the
> > > > > > task.
> > > > > > > > > > > I am not sure that we have that guaranteed already,
> > that's
> > > > why
> > > > > > > option
> > > > > > > > > 1.1
> > > > > > > > > > > seemed simpler to me.
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Aug 19, 2019 at 3:42 PM Xintong Song <
> > > > > > > [email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks for the comments, Stephan. Summarized in this
> > way
> > > > > really
> > > > > > > > makes
> > > > > > > > > > > > things easier to understand.
> > > > > > > > > > > >
> > > > > > > > > > > > I'm in favor of option 2, at least for the moment. I
> > > think
> > > > it
> > > > > > is
> > > > > > > > not
> > > > > > > > > > that
> > > > > > > > > > > > difficult to keep it segfault safe for memory
> manager,
> > as
> > > > > long
> > > > > > as
> > > > > > > > we
> > > > > > > > > > > always
> > > > > > > > > > > > de-allocate the memory segment when it is released
> from
> > > the
> > > > > > > memory
> > > > > > > > > > > > consumers. Only if the memory consumer continue using
> > the
> > > > > > buffer
> > > > > > > of
> > > > > > > > > > > memory
> > > > > > > > > > > > segment after releasing it, in which case we do want
> > the
> > > > job
> > > > > to
> > > > > > > > fail
> > > > > > > > > so
> > > > > > > > > > > we
> > > > > > > > > > > > detect the memory leak early.
> > > > > > > > > > > >
> > > > > > > > > > > > For option 1.2, I don't think this is a good idea.
> Not
> > > only
> > > > > > > because
> > > > > > > > > the
> > > > > > > > > > > > assumption (regular GC is enough to clean direct
> > buffers)
> > > > may
> > > > > > not
> > > > > > > > > > always
> > > > > > > > > > > be
> > > > > > > > > > > > true, but also it makes harder for finding problems
> in
> > > > cases
> > > > > of
> > > > > > > > > memory
> > > > > > > > > > > > overuse. E.g., user configured some direct memory for
> > the
> > > > > user
> > > > > > > > > > libraries.
> > > > > > > > > > > > If the library actually use more direct memory then
> > > > > configured,
> > > > > > > > which
> > > > > > > > > > > > cannot be cleaned by GC because they are still in
> use,
> > > may
> > > > > lead
> > > > > > > to
> > > > > > > > > > > overuse
> > > > > > > > > > > > of the total container memory. In that case, if it
> > didn't
> > > > > touch
> > > > > > > the
> > > > > > > > > JVM
> > > > > > > > > > > > default max direct memory limit, we cannot get a
> direct
> > > > > memory
> > > > > > > OOM
> > > > > > > > > and
> > > > > > > > > > it
> > > > > > > > > > > > will become super hard to understand which part of
> the
> > > > > > > > configuration
> > > > > > > > > > need
> > > > > > > > > > > > to be updated.
> > > > > > > > > > > >
> > > > > > > > > > > > For option 1.1, it has the similar problem as 1.2, if
> > the
> > > > > > > exceeded
> > > > > > > > > > direct
> > > > > > > > > > > > memory does not reach the max direct memory limit
> > > specified
> > > > > by
> > > > > > > the
> > > > > > > > > > > > dedicated parameter. I think it is slightly better
> than
> > > > 1.2,
> > > > > > only
> > > > > > > > > > because
> > > > > > > > > > > > we can tune the parameter.
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you~
> > > > > > > > > > > >
> > > > > > > > > > > > Xintong Song
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Aug 19, 2019 at 2:53 PM Stephan Ewen <
> > > > > [email protected]
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > About the "-XX:MaxDirectMemorySize" discussion,
> maybe
> > > let
> > > > > me
> > > > > > > > > > summarize
> > > > > > > > > > > > it a
> > > > > > > > > > > > > bit differently:
> > > > > > > > > > > > >
> > > > > > > > > > > > > We have the following two options:
> > > > > > > > > > > > >
> > > > > > > > > > > > > (1) We let MemorySegments be de-allocated by the
> GC.
> > > That
> > > > > > makes
> > > > > > > > it
> > > > > > > > > > > > segfault
> > > > > > > > > > > > > safe. But then we need a way to trigger GC in case
> > > > > > > de-allocation
> > > > > > > > > and
> > > > > > > > > > > > > re-allocation of a bunch of segments happens
> quickly,
> > > > which
> > > > > > is
> > > > > > > > > often
> > > > > > > > > > > the
> > > > > > > > > > > > > case during batch scheduling or task restart.
> > > > > > > > > > > > >   - The "-XX:MaxDirectMemorySize" (option 1.1) is
> one
> > > way
> > > > > to
> > > > > > do
> > > > > > > > > this
> > > > > > > > > > > > >   - Another way could be to have a dedicated
> > > bookkeeping
> > > > in
> > > > > > the
> > > > > > > > > > > > > MemoryManager (option 1.2), so that this is a
> number
> > > > > > > independent
> > > > > > > > of
> > > > > > > > > > the
> > > > > > > > > > > > > "-XX:MaxDirectMemorySize" parameter.
> > > > > > > > > > > > >
> > > > > > > > > > > > > (2) We manually allocate and de-allocate the memory
> > for
> > > > the
> > > > > > > > > > > > MemorySegments
> > > > > > > > > > > > > (option 2). That way we need not worry about
> > triggering
> > > > GC
> > > > > by
> > > > > > > > some
> > > > > > > > > > > > > threshold or bookkeeping, but it is harder to
> prevent
> > > > > > > segfaults.
> > > > > > > > We
> > > > > > > > > > > need
> > > > > > > > > > > > to
> > > > > > > > > > > > > be very careful about when we release the memory
> > > segments
> > > > > > (only
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > > > cleanup phase of the main thread).
> > > > > > > > > > > > >
> > > > > > > > > > > > > If we go with option 1.1, we probably need to set
> > > > > > > > > > > > > "-XX:MaxDirectMemorySize" to
> > "off_heap_managed_memory +
> > > > > > > > > > direct_memory"
> > > > > > > > > > > > and
> > > > > > > > > > > > > have "direct_memory" as a separate reserved memory
> > > pool.
> > > > > > > Because
> > > > > > > > if
> > > > > > > > > > we
> > > > > > > > > > > > just
> > > > > > > > > > > > > set "-XX:MaxDirectMemorySize" to
> > > > "off_heap_managed_memory +
> > > > > > > > > > > > jvm_overhead",
> > > > > > > > > > > > > then there will be times when that entire memory is
> > > > > allocated
> > > > > > > by
> > > > > > > > > > direct
> > > > > > > > > > > > > buffers and we have nothing left for the JVM
> > overhead.
> > > So
> > > > > we
> > > > > > > > either
> > > > > > > > > > > need
> > > > > > > > > > > > a
> > > > > > > > > > > > > way to compensate for that (again some safety
> margin
> > > > cutoff
> > > > > > > > value)
> > > > > > > > > or
> > > > > > > > > > > we
> > > > > > > > > > > > > will exceed container memory.
> > > > > > > > > > > > >
> > > > > > > > > > > > > If we go with option 1.2, we need to be aware that
> it
> > > > takes
> > > > > > > > > elaborate
> > > > > > > > > > > > logic
> > > > > > > > > > > > > to push recycling of direct buffers without always
> > > > > > triggering a
> > > > > > > > > full
> > > > > > > > > > > GC.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > My first guess is that the options will be easiest
> to
> > > do
> > > > in
> > > > > > the
> > > > > > > > > > > following
> > > > > > > > > > > > > order:
> > > > > > > > > > > > >
> > > > > > > > > > > > >   - Option 1.1 with a dedicated direct_memory
> > > parameter,
> > > > as
> > > > > > > > > discussed
> > > > > > > > > > > > > above. We would need to find a way to set the
> > > > direct_memory
> > > > > > > > > parameter
> > > > > > > > > > > by
> > > > > > > > > > > > > default. We could start with 64 MB and see how it
> > goes
> > > in
> > > > > > > > practice.
> > > > > > > > > > One
> > > > > > > > > > > > > danger I see is that setting this loo low can
> cause a
> > > > bunch
> > > > > > of
> > > > > > > > > > > additional
> > > > > > > > > > > > > GCs compared to before (we need to watch this
> > > carefully).
> > > > > > > > > > > > >
> > > > > > > > > > > > >   - Option 2. It is actually quite simple to
> > implement,
> > > > we
> > > > > > > could
> > > > > > > > > try
> > > > > > > > > > > how
> > > > > > > > > > > > > segfault safe we are at the moment.
> > > > > > > > > > > > >
> > > > > > > > > > > > >   - Option 1.2: We would not touch the
> > > > > > > "-XX:MaxDirectMemorySize"
> > > > > > > > > > > > parameter
> > > > > > > > > > > > > at all and assume that all the direct memory
> > > allocations
> > > > > that
> > > > > > > the
> > > > > > > > > JVM
> > > > > > > > > > > and
> > > > > > > > > > > > > Netty do are infrequent enough to be cleaned up
> fast
> > > > enough
> > > > > > > > through
> > > > > > > > > > > > regular
> > > > > > > > > > > > > GC. I am not sure if that is a valid assumption,
> > > though.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Stephan
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Aug 16, 2019 at 2:16 PM Xintong Song <
> > > > > > > > > [email protected]>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for sharing your opinion Till.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm also in favor of alternative 2. I was
> wondering
> > > > > whether
> > > > > > > we
> > > > > > > > > can
> > > > > > > > > > > > avoid
> > > > > > > > > > > > > > using Unsafe.allocate() for off-heap managed
> memory
> > > and
> > > > > > > network
> > > > > > > > > > > memory
> > > > > > > > > > > > > with
> > > > > > > > > > > > > > alternative 3. But after giving it a second
> > thought,
> > > I
> > > > > > think
> > > > > > > > even
> > > > > > > > > > for
> > > > > > > > > > > > > > alternative 3 using direct memory for off-heap
> > > managed
> > > > > > memory
> > > > > > > > > could
> > > > > > > > > > > > cause
> > > > > > > > > > > > > > problems.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Yang,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regarding your concern, I think what proposed in
> > this
> > > > > FLIP
> > > > > > it
> > > > > > > > to
> > > > > > > > > > have
> > > > > > > > > > > > > both
> > > > > > > > > > > > > > off-heap managed memory and network memory
> > allocated
> > > > > > through
> > > > > > > > > > > > > > Unsafe.allocate(), which means they are
> practically
> > > > > native
> > > > > > > > memory
> > > > > > > > > > and
> > > > > > > > > > > > not
> > > > > > > > > > > > > > limited by JVM max direct memory. The only parts
> of
> > > > > memory
> > > > > > > > > limited
> > > > > > > > > > by
> > > > > > > > > > > > JVM
> > > > > > > > > > > > > > max direct memory are task off-heap memory and
> JVM
> > > > > > overhead,
> > > > > > > > > which
> > > > > > > > > > > are
> > > > > > > > > > > > > > exactly alternative 2 suggests to set the JVM max
> > > > direct
> > > > > > > memory
> > > > > > > > > to.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Aug 16, 2019 at 1:48 PM Till Rohrmann <
> > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the clarification Xintong. I
> > understand
> > > > the
> > > > > > two
> > > > > > > > > > > > alternatives
> > > > > > > > > > > > > > > now.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I would be in favour of option 2 because it
> makes
> > > > > things
> > > > > > > > > > explicit.
> > > > > > > > > > > If
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > don't limit the direct memory, I fear that we
> > might
> > > > end
> > > > > > up
> > > > > > > > in a
> > > > > > > > > > > > similar
> > > > > > > > > > > > > > > situation as we are currently in: The user
> might
> > > see
> > > > > that
> > > > > > > her
> > > > > > > > > > > process
> > > > > > > > > > > > > > gets
> > > > > > > > > > > > > > > killed by the OS and does not know why this is
> > the
> > > > > case.
> > > > > > > > > > > > Consequently,
> > > > > > > > > > > > > > she
> > > > > > > > > > > > > > > tries to decrease the process memory size
> > (similar
> > > to
> > > > > > > > > increasing
> > > > > > > > > > > the
> > > > > > > > > > > > > > cutoff
> > > > > > > > > > > > > > > ratio) in order to accommodate for the extra
> > direct
> > > > > > memory.
> > > > > > > > > Even
> > > > > > > > > > > > worse,
> > > > > > > > > > > > > > she
> > > > > > > > > > > > > > > tries to decrease memory budgets which are not
> > > fully
> > > > > used
> > > > > > > and
> > > > > > > > > > hence
> > > > > > > > > > > > > won't
> > > > > > > > > > > > > > > change the overall memory consumption.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Aug 16, 2019 at 11:01 AM Xintong Song <
> > > > > > > > > > > [email protected]
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Let me explain this with a concrete example
> > Till.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Let's say we have the following scenario.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Total Process Memory: 1GB
> > > > > > > > > > > > > > > > JVM Direct Memory (Task Off-Heap Memory + JVM
> > > > > > Overhead):
> > > > > > > > > 200MB
> > > > > > > > > > > > > > > > Other Memory (JVM Heap Memory, JVM Metaspace,
> > > > > Off-Heap
> > > > > > > > > Managed
> > > > > > > > > > > > Memory
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > Network Memory): 800MB
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > For alternative 2, we set
> > -XX:MaxDirectMemorySize
> > > > to
> > > > > > > 200MB.
> > > > > > > > > > > > > > > > For alternative 3, we set
> > -XX:MaxDirectMemorySize
> > > > to
> > > > > a
> > > > > > > very
> > > > > > > > > > large
> > > > > > > > > > > > > > value,
> > > > > > > > > > > > > > > > let's say 1TB.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If the actual direct memory usage of Task
> > > Off-Heap
> > > > > > Memory
> > > > > > > > and
> > > > > > > > > > JVM
> > > > > > > > > > > > > > > Overhead
> > > > > > > > > > > > > > > > do not exceed 200MB, then alternative 2 and
> > > > > > alternative 3
> > > > > > > > > > should
> > > > > > > > > > > > have
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > same utility. Setting larger
> > > > -XX:MaxDirectMemorySize
> > > > > > will
> > > > > > > > not
> > > > > > > > > > > > reduce
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > sizes of the other memory pools.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If the actual direct memory usage of Task
> > > Off-Heap
> > > > > > Memory
> > > > > > > > and
> > > > > > > > > > JVM
> > > > > > > > > > > > > > > > Overhead potentially exceed 200MB, then
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >    - Alternative 2 suffers from frequent OOM.
> > To
> > > > > avoid
> > > > > > > > that,
> > > > > > > > > > the
> > > > > > > > > > > > only
> > > > > > > > > > > > > > > thing
> > > > > > > > > > > > > > > >    user can do is to modify the configuration
> > and
> > > > > > > increase
> > > > > > > > > JVM
> > > > > > > > > > > > Direct
> > > > > > > > > > > > > > > > Memory
> > > > > > > > > > > > > > > >    (Task Off-Heap Memory + JVM Overhead).
> Let's
> > > say
> > > > > > that
> > > > > > > > user
> > > > > > > > > > > > > increases
> > > > > > > > > > > > > > > JVM
> > > > > > > > > > > > > > > >    Direct Memory to 250MB, this will reduce
> the
> > > > total
> > > > > > > size
> > > > > > > > of
> > > > > > > > > > > other
> > > > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > >    pools to 750MB, given the total process
> > memory
> > > > > > remains
> > > > > > > > > 1GB.
> > > > > > > > > > > > > > > >    - For alternative 3, there is no chance of
> > > > direct
> > > > > > OOM.
> > > > > > > > > There
> > > > > > > > > > > are
> > > > > > > > > > > > > > > chances
> > > > > > > > > > > > > > > >    of exceeding the total process memory
> limit,
> > > but
> > > > > > given
> > > > > > > > > that
> > > > > > > > > > > the
> > > > > > > > > > > > > > > process
> > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > >    not use up all the reserved native memory
> > > > > (Off-Heap
> > > > > > > > > Managed
> > > > > > > > > > > > > Memory,
> > > > > > > > > > > > > > > > Network
> > > > > > > > > > > > > > > >    Memory, JVM Metaspace), if the actual
> direct
> > > > > memory
> > > > > > > > usage
> > > > > > > > > is
> > > > > > > > > > > > > > slightly
> > > > > > > > > > > > > > > > above
> > > > > > > > > > > > > > > >    yet very close to 200MB, user probably do
> > not
> > > > need
> > > > > > to
> > > > > > > > > change
> > > > > > > > > > > the
> > > > > > > > > > > > > > > >    configurations.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Therefore, I think from the user's
> > perspective, a
> > > > > > > feasible
> > > > > > > > > > > > > > configuration
> > > > > > > > > > > > > > > > for alternative 2 may lead to lower resource
> > > > > > utilization
> > > > > > > > > > compared
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > > alternative 3.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Aug 16, 2019 at 10:28 AM Till
> Rohrmann
> > <
> > > > > > > > > > > > [email protected]
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I guess you have to help me understand the
> > > > > difference
> > > > > > > > > between
> > > > > > > > > > > > > > > > alternative 2
> > > > > > > > > > > > > > > > > and 3 wrt to memory under utilization
> > Xintong.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Alternative 2: set XX:MaxDirectMemorySize
> > to
> > > > Task
> > > > > > > > > Off-Heap
> > > > > > > > > > > > Memory
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > JVM
> > > > > > > > > > > > > > > > > Overhead. Then there is the risk that this
> > size
> > > > is
> > > > > > too
> > > > > > > > low
> > > > > > > > > > > > > resulting
> > > > > > > > > > > > > > > in a
> > > > > > > > > > > > > > > > > lot of garbage collection and potentially
> an
> > > OOM.
> > > > > > > > > > > > > > > > > - Alternative 3: set XX:MaxDirectMemorySize
> > to
> > > > > > > something
> > > > > > > > > > larger
> > > > > > > > > > > > > than
> > > > > > > > > > > > > > > > > alternative 2. This would of course reduce
> > the
> > > > > sizes
> > > > > > of
> > > > > > > > the
> > > > > > > > > > > other
> > > > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > > > types.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > How would alternative 2 now result in an
> > under
> > > > > > > > utilization
> > > > > > > > > of
> > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > > > compared to alternative 3? If alternative 3
> > > > > strictly
> > > > > > > > sets a
> > > > > > > > > > > > higher
> > > > > > > > > > > > > > max
> > > > > > > > > > > > > > > > > direct memory size and we use only little,
> > > then I
> > > > > > would
> > > > > > > > > > expect
> > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > alternative 3 results in memory under
> > > > utilization.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Tue, Aug 13, 2019 at 4:19 PM Yang Wang <
> > > > > > > > > > > [email protected]
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi xintong,till
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Native and Direct Memory
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > My point is setting a very large max
> direct
> > > > > memory
> > > > > > > size
> > > > > > > > > > when
> > > > > > > > > > > we
> > > > > > > > > > > > > do
> > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > > differentiate direct and native memory.
> If
> > > the
> > > > > > direct
> > > > > > > > > > > > > > > memory,including
> > > > > > > > > > > > > > > > > user
> > > > > > > > > > > > > > > > > > direct memory and framework direct
> > > memory,could
> > > > > be
> > > > > > > > > > calculated
> > > > > > > > > > > > > > > > > > correctly,then
> > > > > > > > > > > > > > > > > > i am in favor of setting direct memory
> with
> > > > fixed
> > > > > > > > value.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Memory Calculation
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I agree with xintong. For Yarn and k8s,we
> > > need
> > > > to
> > > > > > > check
> > > > > > > > > the
> > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > > > > configurations in client to avoid
> > submitting
> > > > > > > > successfully
> > > > > > > > > > and
> > > > > > > > > > > > > > failing
> > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > the flink master.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Yang
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Xintong Song <[email protected]
> > > > >于2019年8月13日
> > > > > > > > > 周二22:07写道：
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for replying, Till.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > About MemorySegment, I think you are
> > right
> > > > that
> > > > > > we
> > > > > > > > > should
> > > > > > > > > > > not
> > > > > > > > > > > > > > > include
> > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > issue in the scope of this FLIP. This
> > FLIP
> > > > > should
> > > > > > > > > > > concentrate
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > configure memory pools for
> TaskExecutors,
> > > > with
> > > > > > > > minimum
> > > > > > > > > > > > > > involvement
> > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > > > > > memory consumers use it.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > About direct memory, I think
> alternative
> > 3
> > > > may
> > > > > > not
> > > > > > > > > having
> > > > > > > > > > > the
> > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > over
> > > > > > > > > > > > > > > > > > > reservation issue that alternative 2
> > does,
> > > > but
> > > > > at
> > > > > > > the
> > > > > > > > > > cost
> > > > > > > > > > > of
> > > > > > > > > > > > > > risk
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > over
> > > > > > > > > > > > > > > > > > > using memory at the container level,
> > which
> > > is
> > > > > not
> > > > > > > > good.
> > > > > > > > > > My
> > > > > > > > > > > > > point
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > both "Task Off-Heap Memory" and "JVM
> > > > Overhead"
> > > > > > are
> > > > > > > > not
> > > > > > > > > > easy
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > > config.
> > > > > > > > > > > > > > > > > > For
> > > > > > > > > > > > > > > > > > > alternative 2, users might configure
> them
> > > > > higher
> > > > > > > than
> > > > > > > > > > what
> > > > > > > > > > > > > > actually
> > > > > > > > > > > > > > > > > > needed,
> > > > > > > > > > > > > > > > > > > just to avoid getting a direct OOM. For
> > > > > > alternative
> > > > > > > > 3,
> > > > > > > > > > > users
> > > > > > > > > > > > do
> > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > get
> > > > > > > > > > > > > > > > > > > direct OOM, so they may not config the
> > two
> > > > > > options
> > > > > > > > > > > > aggressively
> > > > > > > > > > > > > > > high.
> > > > > > > > > > > > > > > > > But
> > > > > > > > > > > > > > > > > > > the consequences are risks of overall
> > > > container
> > > > > > > > memory
> > > > > > > > > > > usage
> > > > > > > > > > > > > > > exceeds
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > budget.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Tue, Aug 13, 2019 at 9:39 AM Till
> > > > Rohrmann <
> > > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for proposing this FLIP
> Xintong.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > All in all I think it already looks
> > quite
> > > > > good.
> > > > > > > > > > > Concerning
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > first
> > > > > > > > > > > > > > > > > > open
> > > > > > > > > > > > > > > > > > > > question about allocating memory
> > > segments,
> > > > I
> > > > > > was
> > > > > > > > > > > wondering
> > > > > > > > > > > > > > > whether
> > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > strictly necessary to do in the
> context
> > > of
> > > > > this
> > > > > > > > FLIP
> > > > > > > > > or
> > > > > > > > > > > > > whether
> > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > > > be done as a follow up? Without
> knowing
> > > all
> > > > > > > > details,
> > > > > > > > > I
> > > > > > > > > > > > would
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > concerned
> > > > > > > > > > > > > > > > > > > > that we would widen the scope of this
> > > FLIP
> > > > > too
> > > > > > > much
> > > > > > > > > > > because
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > > > to touch all the existing call sites
> of
> > > the
> > > > > > > > > > MemoryManager
> > > > > > > > > > > > > where
> > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > allocate
> > > > > > > > > > > > > > > > > > > > memory segments (this should mainly
> be
> > > > batch
> > > > > > > > > > operators).
> > > > > > > > > > > > The
> > > > > > > > > > > > > > > > addition
> > > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > > the memory reservation call to the
> > > > > > MemoryManager
> > > > > > > > > should
> > > > > > > > > > > not
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > affected
> > > > > > > > > > > > > > > > > > > by
> > > > > > > > > > > > > > > > > > > > this and I would hope that this is
> the
> > > only
> > > > > > point
> > > > > > > > of
> > > > > > > > > > > > > > interaction
> > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > streaming job would have with the
> > > > > > MemoryManager.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Concerning the second open question
> > about
> > > > > > setting
> > > > > > > > or
> > > > > > > > > > not
> > > > > > > > > > > > > > setting
> > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > max
> > > > > > > > > > > > > > > > > > > > direct memory limit, I would also be
> > > > > interested
> > > > > > > why
> > > > > > > > > > Yang
> > > > > > > > > > > > Wang
> > > > > > > > > > > > > > > > thinks
> > > > > > > > > > > > > > > > > > > > leaving it open would be best. My
> > concern
> > > > > about
> > > > > > > > this
> > > > > > > > > > > would
> > > > > > > > > > > > be
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > be in a similar situation as we are
> now
> > > > with
> > > > > > the
> > > > > > > > > > > > > > > > RocksDBStateBackend.
> > > > > > > > > > > > > > > > > > If
> > > > > > > > > > > > > > > > > > > > the different memory pools are not
> > > clearly
> > > > > > > > separated
> > > > > > > > > > and
> > > > > > > > > > > > can
> > > > > > > > > > > > > > > spill
> > > > > > > > > > > > > > > > > over
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > a different pool, then it is quite
> hard
> > > to
> > > > > > > > understand
> > > > > > > > > > > what
> > > > > > > > > > > > > > > exactly
> > > > > > > > > > > > > > > > > > > causes a
> > > > > > > > > > > > > > > > > > > > process to get killed for using too
> > much
> > > > > > memory.
> > > > > > > > This
> > > > > > > > > > > could
> > > > > > > > > > > > > > then
> > > > > > > > > > > > > > > > > easily
> > > > > > > > > > > > > > > > > > > > lead to a similar situation what we
> > have
> > > > with
> > > > > > the
> > > > > > > > > > > > > cutoff-ratio.
> > > > > > > > > > > > > > > So
> > > > > > > > > > > > > > > > > why
> > > > > > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > > > > setting a sane default value for max
> > > direct
> > > > > > > memory
> > > > > > > > > and
> > > > > > > > > > > > giving
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > user
> > > > > > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > > > > > > option to increase it if he runs into
> > an
> > > > OOM.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > @Xintong, how would alternative 2
> lead
> > to
> > > > > lower
> > > > > > > > > memory
> > > > > > > > > > > > > > > utilization
> > > > > > > > > > > > > > > > > than
> > > > > > > > > > > > > > > > > > > > alternative 3 where we set the direct
> > > > memory
> > > > > > to a
> > > > > > > > > > higher
> > > > > > > > > > > > > value?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Fri, Aug 9, 2019 at 9:12 AM
> Xintong
> > > > Song <
> > > > > > > > > > > > > > > [email protected]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks for the feedback, Yang.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Regarding your comments:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > *Native and Direct Memory*
> > > > > > > > > > > > > > > > > > > > > I think setting a very large max
> > direct
> > > > > > memory
> > > > > > > > size
> > > > > > > > > > > > > > definitely
> > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > > > > > good sides. E.g., we do not worry
> > about
> > > > > > direct
> > > > > > > > OOM,
> > > > > > > > > > and
> > > > > > > > > > > > we
> > > > > > > > > > > > > > > don't
> > > > > > > > > > > > > > > > > even
> > > > > > > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > > > > > to allocate managed / network
> memory
> > > with
> > > > > > > > > > > > > Unsafe.allocate() .
> > > > > > > > > > > > > > > > > > > > > However, there are also some down
> > sides
> > > > of
> > > > > > > doing
> > > > > > > > > > this.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >    - One thing I can think of is
> that
> > > if
> > > > a
> > > > > > task
> > > > > > > > > > > executor
> > > > > > > > > > > > > > > > container
> > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > >    killed due to overusing memory,
> it
> > > > could
> > > > > > be
> > > > > > > > hard
> > > > > > > > > > for
> > > > > > > > > > > > use
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > know
> > > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > > part
> > > > > > > > > > > > > > > > > > > > >    of the memory is overused.
> > > > > > > > > > > > > > > > > > > > >    - Another down side is that the
> > JVM
> > > > > never
> > > > > > > > > trigger
> > > > > > > > > > GC
> > > > > > > > > > > > due
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > reaching
> > > > > > > > > > > > > > > > > > > > max
> > > > > > > > > > > > > > > > > > > > >    direct memory limit, because the
> > > limit
> > > > > is
> > > > > > > too
> > > > > > > > > high
> > > > > > > > > > > to
> > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > reached.
> > > > > > > > > > > > > > > > > > > That
> > > > > > > > > > > > > > > > > > > > >    means we kind of relay on heap
> > > memory
> > > > to
> > > > > > > > trigger
> > > > > > > > > > GC
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > > release
> > > > > > > > > > > > > > > > > > > direct
> > > > > > > > > > > > > > > > > > > > >    memory. That could be a problem
> in
> > > > cases
> > > > > > > where
> > > > > > > > > we
> > > > > > > > > > > have
> > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > > > direct
> > > > > > > > > > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > > > > > > >    usage but not enough heap
> activity
> > > to
> > > > > > > trigger
> > > > > > > > > the
> > > > > > > > > > > GC.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Maybe you can share your reasons
> for
> > > > > > preferring
> > > > > > > > > > > setting a
> > > > > > > > > > > > > > very
> > > > > > > > > > > > > > > > > large
> > > > > > > > > > > > > > > > > > > > value,
> > > > > > > > > > > > > > > > > > > > > if there are anything else I
> > > overlooked.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > *Memory Calculation*
> > > > > > > > > > > > > > > > > > > > > If there is any conflict between
> > > multiple
> > > > > > > > > > configuration
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > user
> > > > > > > > > > > > > > > > > > > > > explicitly specified, I think we
> > should
> > > > > throw
> > > > > > > an
> > > > > > > > > > error.
> > > > > > > > > > > > > > > > > > > > > I think doing checking on the
> client
> > > side
> > > > > is
> > > > > > a
> > > > > > > > good
> > > > > > > > > > > idea,
> > > > > > > > > > > > > so
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > > > Yarn /
> > > > > > > > > > > > > > > > > > > > > K8s we can discover the problem
> > before
> > > > > > > submitting
> > > > > > > > > the
> > > > > > > > > > > > Flink
> > > > > > > > > > > > > > > > > cluster,
> > > > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > > is always a good thing.
> > > > > > > > > > > > > > > > > > > > > But we can not only rely on the
> > client
> > > > side
> > > > > > > > > checking,
> > > > > > > > > > > > > because
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > standalone cluster TaskManagers on
> > > > > different
> > > > > > > > > machines
> > > > > > > > > > > may
> > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > > different
> > > > > > > > > > > > > > > > > > > > > configurations and the client does
> > see
> > > > > that.
> > > > > > > > > > > > > > > > > > > > > What do you think?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Thu, Aug 8, 2019 at 5:09 PM Yang
> > > Wang
> > > > <
> > > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Hi xintong,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks for your detailed
> proposal.
> > > > After
> > > > > > all
> > > > > > > > the
> > > > > > > > > > > memory
> > > > > > > > > > > > > > > > > > configuration
> > > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > introduced, it will be more
> > powerful
> > > to
> > > > > > > control
> > > > > > > > > the
> > > > > > > > > > > > flink
> > > > > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > > > > > > usage. I
> > > > > > > > > > > > > > > > > > > > > > just have few questions about it.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >    - Native and Direct Memory
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > We do not differentiate user
> direct
> > > > > memory
> > > > > > > and
> > > > > > > > > > native
> > > > > > > > > > > > > > memory.
> > > > > > > > > > > > > > > > > They
> > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > all
> > > > > > > > > > > > > > > > > > > > > > included in task off-heap memory.
> > > > Right?
> > > > > > So i
> > > > > > > > > don’t
> > > > > > > > > > > > think
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > > > > > set
> > > > > > > > > > > > > > > > > > > > > > the -XX:MaxDirectMemorySize
> > > properly. I
> > > > > > > prefer
> > > > > > > > > > > leaving
> > > > > > > > > > > > > it a
> > > > > > > > > > > > > > > > very
> > > > > > > > > > > > > > > > > > > large
> > > > > > > > > > > > > > > > > > > > > > value.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >    - Memory Calculation
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > If the sum of and fine-grained
> > > > > > memory(network
> > > > > > > > > > memory,
> > > > > > > > > > > > > > managed
> > > > > > > > > > > > > > > > > > memory,
> > > > > > > > > > > > > > > > > > > > > etc.)
> > > > > > > > > > > > > > > > > > > > > > is larger than total process
> > memory,
> > > > how
> > > > > do
> > > > > > > we
> > > > > > > > > deal
> > > > > > > > > > > > with
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > situation?
> > > > > > > > > > > > > > > > > > > > > Do
> > > > > > > > > > > > > > > > > > > > > > we need to check the memory
> > > > configuration
> > > > > > in
> > > > > > > > > > client?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Xintong Song <
> > [email protected]>
> > > > > > > > > 于2019年8月7日周三
> > > > > > > > > > > > > > 下午10:14写道：
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > We would like to start a
> > discussion
> > > > > > thread
> > > > > > > on
> > > > > > > > > > > > "FLIP-49:
> > > > > > > > > > > > > > > > Unified
> > > > > > > > > > > > > > > > > > > > Memory
> > > > > > > > > > > > > > > > > > > > > > > Configuration for
> > > TaskExecutors"[1],
> > > > > > where
> > > > > > > we
> > > > > > > > > > > > describe
> > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > improve
> > > > > > > > > > > > > > > > > > > > > > > TaskExecutor memory
> > configurations.
> > > > The
> > > > > > > FLIP
> > > > > > > > > > > document
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > mostly
> > > > > > > > > > > > > > > > > > > based
> > > > > > > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > > > > > > > > > early design "Memory Management
> > and
> > > > > > > > > Configuration
> > > > > > > > > > > > > > > > Reloaded"[2]
> > > > > > > > > > > > > > > > > by
> > > > > > > > > > > > > > > > > > > > > > Stephan,
> > > > > > > > > > > > > > > > > > > > > > > with updates from follow-up
> > > > discussions
> > > > > > > both
> > > > > > > > > > online
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > offline.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > This FLIP addresses several
> > > > > shortcomings
> > > > > > of
> > > > > > > > > > current
> > > > > > > > > > > > > > (Flink
> > > > > > > > > > > > > > > > 1.9)
> > > > > > > > > > > > > > > > > > > > > > > TaskExecutor memory
> > configuration.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >    - Different configuration
> for
> > > > > > Streaming
> > > > > > > > and
> > > > > > > > > > > Batch.
> > > > > > > > > > > > > > > > > > > > > > >    - Complex and difficult
> > > > > configuration
> > > > > > of
> > > > > > > > > > RocksDB
> > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > Streaming.
> > > > > > > > > > > > > > > > > > > > > > >    - Complicated, uncertain and
> > > hard
> > > > to
> > > > > > > > > > understand.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Key changes to solve the
> problems
> > > can
> > > > > be
> > > > > > > > > > summarized
> > > > > > > > > > > > as
> > > > > > > > > > > > > > > > follows.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >    - Extend memory manager to
> > also
> > > > > > account
> > > > > > > > for
> > > > > > > > > > > memory
> > > > > > > > > > > > > > usage
> > > > > > > > > > > > > > > > by
> > > > > > > > > > > > > > > > > > > state
> > > > > > > > > > > > > > > > > > > > > > >    backends.
> > > > > > > > > > > > > > > > > > > > > > >    - Modify how TaskExecutor
> > memory
> > > > is
> > > > > > > > > > partitioned
> > > > > > > > > > > > > > > accounted
> > > > > > > > > > > > > > > > > > > > individual
> > > > > > > > > > > > > > > > > > > > > > >    memory reservations and
> pools.
> > > > > > > > > > > > > > > > > > > > > > >    - Simplify memory
> > configuration
> > > > > > options
> > > > > > > > and
> > > > > > > > > > > > > > calculations
> > > > > > > > > > > > > > > > > > logics.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Please find more details in the
> > > FLIP
> > > > > wiki
> > > > > > > > > > document
> > > > > > > > > > > > [1].
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > (Please note that the early
> > design
> > > > doc
> > > > > > [2]
> > > > > > > is
> > > > > > > > > out
> > > > > > > > > > > of
> > > > > > > > > > > > > > sync,
> > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > appreciated to have the
> > discussion
> > > in
> > > > > > this
> > > > > > > > > > mailing
> > > > > > > > > > > > list
> > > > > > > > > > > > > > > > > thread.)
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Looking forward to your
> > feedbacks.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > [2]
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit?usp=sharing
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Aug 16, 2019 at 2:16 PM Xintong Song <
> > > > > > > > > [email protected]>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for sharing your opinion Till.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm also in favor of alternative 2. I was
> wondering
> > > > > whether
> > > > > > > we
> > > > > > > > > can
> > > > > > > > > > > > avoid
> > > > > > > > > > > > > > using Unsafe.allocate() for off-heap managed
> memory
> > > and
> > > > > > > network
> > > > > > > > > > > memory
> > > > > > > > > > > > > with
> > > > > > > > > > > > > > alternative 3. But after giving it a second
> > thought,
> > > I
> > > > > > think
> > > > > > > > even
> > > > > > > > > > for
> > > > > > > > > > > > > > alternative 3 using direct memory for off-heap
> > > managed
> > > > > > memory
> > > > > > > > > could
> > > > > > > > > > > > cause
> > > > > > > > > > > > > > problems.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Yang,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regarding your concern, I think what proposed in
> > this
> > > > > FLIP
> > > > > > it
> > > > > > > > to
> > > > > > > > > > have
> > > > > > > > > > > > > both
> > > > > > > > > > > > > > off-heap managed memory and network memory
> > allocated
> > > > > > through
> > > > > > > > > > > > > > Unsafe.allocate(), which means they are
> practically
> > > > > native
> > > > > > > > memory
> > > > > > > > > > and
> > > > > > > > > > > > not
> > > > > > > > > > > > > > limited by JVM max direct memory. The only parts
> of
> > > > > memory
> > > > > > > > > limited
> > > > > > > > > > by
> > > > > > > > > > > > JVM
> > > > > > > > > > > > > > max direct memory are task off-heap memory and
> JVM
> > > > > > overhead,
> > > > > > > > > which
> > > > > > > > > > > are
> > > > > > > > > > > > > > exactly alternative 2 suggests to set the JVM max
> > > > direct
> > > > > > > memory
> > > > > > > > > to.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Aug 16, 2019 at 1:48 PM Till Rohrmann <
> > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the clarification Xintong. I
> > understand
> > > > the
> > > > > > two
> > > > > > > > > > > > alternatives
> > > > > > > > > > > > > > > now.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I would be in favour of option 2 because it
> makes
> > > > > things
> > > > > > > > > > explicit.
> > > > > > > > > > > If
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > don't limit the direct memory, I fear that we
> > might
> > > > end
> > > > > > up
> > > > > > > > in a
> > > > > > > > > > > > similar
> > > > > > > > > > > > > > > situation as we are currently in: The user
> might
> > > see
> > > > > that
> > > > > > > her
> > > > > > > > > > > process
> > > > > > > > > > > > > > gets
> > > > > > > > > > > > > > > killed by the OS and does not know why this is
> > the
> > > > > case.
> > > > > > > > > > > > Consequently,
> > > > > > > > > > > > > > she
> > > > > > > > > > > > > > > tries to decrease the process memory size
> > (similar
> > > to
> > > > > > > > > increasing
> > > > > > > > > > > the
> > > > > > > > > > > > > > cutoff
> > > > > > > > > > > > > > > ratio) in order to accommodate for the extra
> > direct
> > > > > > memory.
> > > > > > > > > Even
> > > > > > > > > > > > worse,
> > > > > > > > > > > > > > she
> > > > > > > > > > > > > > > tries to decrease memory budgets which are not
> > > fully
> > > > > used
> > > > > > > and
> > > > > > > > > > hence
> > > > > > > > > > > > > won't
> > > > > > > > > > > > > > > change the overall memory consumption.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Aug 16, 2019 at 11:01 AM Xintong Song <
> > > > > > > > > > > [email protected]
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Let me explain this with a concrete example
> > Till.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Let's say we have the following scenario.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Total Process Memory: 1GB
> > > > > > > > > > > > > > > > JVM Direct Memory (Task Off-Heap Memory + JVM
> > > > > > Overhead):
> > > > > > > > > 200MB
> > > > > > > > > > > > > > > > Other Memory (JVM Heap Memory, JVM Metaspace,
> > > > > Off-Heap
> > > > > > > > > Managed
> > > > > > > > > > > > Memory
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > Network Memory): 800MB
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > For alternative 2, we set
> > -XX:MaxDirectMemorySize
> > > > to
> > > > > > > 200MB.
> > > > > > > > > > > > > > > > For alternative 3, we set
> > -XX:MaxDirectMemorySize
> > > > to
> > > > > a
> > > > > > > very
> > > > > > > > > > large
> > > > > > > > > > > > > > value,
> > > > > > > > > > > > > > > > let's say 1TB.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If the actual direct memory usage of Task
> > > Off-Heap
> > > > > > Memory
> > > > > > > > and
> > > > > > > > > > JVM
> > > > > > > > > > > > > > > Overhead
> > > > > > > > > > > > > > > > do not exceed 200MB, then alternative 2 and
> > > > > > alternative 3
> > > > > > > > > > should
> > > > > > > > > > > > have
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > same utility. Setting larger
> > > > -XX:MaxDirectMemorySize
> > > > > > will
> > > > > > > > not
> > > > > > > > > > > > reduce
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > sizes of the other memory pools.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If the actual direct memory usage of Task
> > > Off-Heap
> > > > > > Memory
> > > > > > > > and
> > > > > > > > > > JVM
> > > > > > > > > > > > > > > > Overhead potentially exceed 200MB, then
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >    - Alternative 2 suffers from frequent OOM.
> > To
> > > > > avoid
> > > > > > > > that,
> > > > > > > > > > the
> > > > > > > > > > > > only
> > > > > > > > > > > > > > > thing
> > > > > > > > > > > > > > > >    user can do is to modify the configuration
> > and
> > > > > > > increase
> > > > > > > > > JVM
> > > > > > > > > > > > Direct
> > > > > > > > > > > > > > > > Memory
> > > > > > > > > > > > > > > >    (Task Off-Heap Memory + JVM Overhead).
> Let's
> > > say
> > > > > > that
> > > > > > > > user
> > > > > > > > > > > > > increases
> > > > > > > > > > > > > > > JVM
> > > > > > > > > > > > > > > >    Direct Memory to 250MB, this will reduce
> the
> > > > total
> > > > > > > size
> > > > > > > > of
> > > > > > > > > > > other
> > > > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > >    pools to 750MB, given the total process
> > memory
> > > > > > remains
> > > > > > > > > 1GB.
> > > > > > > > > > > > > > > >    - For alternative 3, there is no chance of
> > > > direct
> > > > > > OOM.
> > > > > > > > > There
> > > > > > > > > > > are
> > > > > > > > > > > > > > > chances
> > > > > > > > > > > > > > > >    of exceeding the total process memory
> limit,
> > > but
> > > > > > given
> > > > > > > > > that
> > > > > > > > > > > the
> > > > > > > > > > > > > > > process
> > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > >    not use up all the reserved native memory
> > > > > (Off-Heap
> > > > > > > > > Managed
> > > > > > > > > > > > > Memory,
> > > > > > > > > > > > > > > > Network
> > > > > > > > > > > > > > > >    Memory, JVM Metaspace), if the actual
> direct
> > > > > memory
> > > > > > > > usage
> > > > > > > > > is
> > > > > > > > > > > > > > slightly
> > > > > > > > > > > > > > > > above
> > > > > > > > > > > > > > > >    yet very close to 200MB, user probably do
> > not
> > > > need
> > > > > > to
> > > > > > > > > change
> > > > > > > > > > > the
> > > > > > > > > > > > > > > >    configurations.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Therefore, I think from the user's
> > perspective, a
> > > > > > > feasible
> > > > > > > > > > > > > > configuration
> > > > > > > > > > > > > > > > for alternative 2 may lead to lower resource
> > > > > > utilization
> > > > > > > > > > compared
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > > alternative 3.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Aug 16, 2019 at 10:28 AM Till
> Rohrmann
> > <
> > > > > > > > > > > > [email protected]
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I guess you have to help me understand the
> > > > > difference
> > > > > > > > > between
> > > > > > > > > > > > > > > > alternative 2
> > > > > > > > > > > > > > > > > and 3 wrt to memory under utilization
> > Xintong.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Alternative 2: set XX:MaxDirectMemorySize
> > to
> > > > Task
> > > > > > > > > Off-Heap
> > > > > > > > > > > > Memory
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > JVM
> > > > > > > > > > > > > > > > > Overhead. Then there is the risk that this
> > size
> > > > is
> > > > > > too
> > > > > > > > low
> > > > > > > > > > > > > resulting
> > > > > > > > > > > > > > > in a
> > > > > > > > > > > > > > > > > lot of garbage collection and potentially
> an
> > > OOM.
> > > > > > > > > > > > > > > > > - Alternative 3: set XX:MaxDirectMemorySize
> > to
> > > > > > > something
> > > > > > > > > > larger
> > > > > > > > > > > > > than
> > > > > > > > > > > > > > > > > alternative 2. This would of course reduce
> > the
> > > > > sizes
> > > > > > of
> > > > > > > > the
> > > > > > > > > > > other
> > > > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > > > types.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > How would alternative 2 now result in an
> > under
> > > > > > > > utilization
> > > > > > > > > of
> > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > > > compared to alternative 3? If alternative 3
> > > > > strictly
> > > > > > > > sets a
> > > > > > > > > > > > higher
> > > > > > > > > > > > > > max
> > > > > > > > > > > > > > > > > direct memory size and we use only little,
> > > then I
> > > > > > would
> > > > > > > > > > expect
> > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > alternative 3 results in memory under
> > > > utilization.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Tue, Aug 13, 2019 at 4:19 PM Yang Wang <
> > > > > > > > > > > [email protected]
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi xintong,till
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Native and Direct Memory
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > My point is setting a very large max
> direct
> > > > > memory
> > > > > > > size
> > > > > > > > > > when
> > > > > > > > > > > we
> > > > > > > > > > > > > do
> > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > > differentiate direct and native memory.
> If
> > > the
> > > > > > direct
> > > > > > > > > > > > > > > memory,including
> > > > > > > > > > > > > > > > > user
> > > > > > > > > > > > > > > > > > direct memory and framework direct
> > > memory,could
> > > > > be
> > > > > > > > > > calculated
> > > > > > > > > > > > > > > > > > correctly,then
> > > > > > > > > > > > > > > > > > i am in favor of setting direct memory
> with
> > > > fixed
> > > > > > > > value.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Memory Calculation
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I agree with xintong. For Yarn and k8s,we
> > > need
> > > > to
> > > > > > > check
> > > > > > > > > the
> > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > > > > configurations in client to avoid
> > submitting
> > > > > > > > successfully
> > > > > > > > > > and
> > > > > > > > > > > > > > failing
> > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > the flink master.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Yang
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Xintong Song <[email protected]
> > > > >于2019年8月13日
> > > > > > > > > 周二22:07写道：
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for replying, Till.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > About MemorySegment, I think you are
> > right
> > > > that
> > > > > > we
> > > > > > > > > should
> > > > > > > > > > > not
> > > > > > > > > > > > > > > include
> > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > issue in the scope of this FLIP. This
> > FLIP
> > > > > should
> > > > > > > > > > > concentrate
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > configure memory pools for
> TaskExecutors,
> > > > with
> > > > > > > > minimum
> > > > > > > > > > > > > > involvement
> > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > > > > > memory consumers use it.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > About direct memory, I think
> alternative
> > 3
> > > > may
> > > > > > not
> > > > > > > > > having
> > > > > > > > > > > the
> > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > over
> > > > > > > > > > > > > > > > > > > reservation issue that alternative 2
> > does,
> > > > but
> > > > > at
> > > > > > > the
> > > > > > > > > > cost
> > > > > > > > > > > of
> > > > > > > > > > > > > > risk
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > over
> > > > > > > > > > > > > > > > > > > using memory at the container level,
> > which
> > > is
> > > > > not
> > > > > > > > good.
> > > > > > > > > > My
> > > > > > > > > > > > > point
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > both "Task Off-Heap Memory" and "JVM
> > > > Overhead"
> > > > > > are
> > > > > > > > not
> > > > > > > > > > easy
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > > config.
> > > > > > > > > > > > > > > > > > For
> > > > > > > > > > > > > > > > > > > alternative 2, users might configure
> them
> > > > > higher
> > > > > > > than
> > > > > > > > > > what
> > > > > > > > > > > > > > actually
> > > > > > > > > > > > > > > > > > needed,
> > > > > > > > > > > > > > > > > > > just to avoid getting a direct OOM. For
> > > > > > alternative
> > > > > > > > 3,
> > > > > > > > > > > users
> > > > > > > > > > > > do
> > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > get
> > > > > > > > > > > > > > > > > > > direct OOM, so they may not config the
> > two
> > > > > > options
> > > > > > > > > > > > aggressively
> > > > > > > > > > > > > > > high.
> > > > > > > > > > > > > > > > > But
> > > > > > > > > > > > > > > > > > > the consequences are risks of overall
> > > > container
> > > > > > > > memory
> > > > > > > > > > > usage
> > > > > > > > > > > > > > > exceeds
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > budget.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Tue, Aug 13, 2019 at 9:39 AM Till
> > > > Rohrmann <
> > > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for proposing this FLIP
> Xintong.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > All in all I think it already looks
> > quite
> > > > > good.
> > > > > > > > > > > Concerning
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > first
> > > > > > > > > > > > > > > > > > open
> > > > > > > > > > > > > > > > > > > > question about allocating memory
> > > segments,
> > > > I
> > > > > > was
> > > > > > > > > > > wondering
> > > > > > > > > > > > > > > whether
> > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > strictly necessary to do in the
> context
> > > of
> > > > > this
> > > > > > > > FLIP
> > > > > > > > > or
> > > > > > > > > > > > > whether
> > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > > > be done as a follow up? Without
> knowing
> > > all
> > > > > > > > details,
> > > > > > > > > I
> > > > > > > > > > > > would
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > concerned
> > > > > > > > > > > > > > > > > > > > that we would widen the scope of this
> > > FLIP
> > > > > too
> > > > > > > much
> > > > > > > > > > > because
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > > > to touch all the existing call sites
> of
> > > the
> > > > > > > > > > MemoryManager
> > > > > > > > > > > > > where
> > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > allocate
> > > > > > > > > > > > > > > > > > > > memory segments (this should mainly
> be
> > > > batch
> > > > > > > > > > operators).
> > > > > > > > > > > > The
> > > > > > > > > > > > > > > > addition
> > > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > > the memory reservation call to the
> > > > > > MemoryManager
> > > > > > > > > should
> > > > > > > > > > > not
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > affected
> > > > > > > > > > > > > > > > > > > by
> > > > > > > > > > > > > > > > > > > > this and I would hope that this is
> the
> > > only
> > > > > > point
> > > > > > > > of
> > > > > > > > > > > > > > interaction
> > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > streaming job would have with the
> > > > > > MemoryManager.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Concerning the second open question
> > about
> > > > > > setting
> > > > > > > > or
> > > > > > > > > > not
> > > > > > > > > > > > > > setting
> > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > max
> > > > > > > > > > > > > > > > > > > > direct memory limit, I would also be
> > > > > interested
> > > > > > > why
> > > > > > > > > > Yang
> > > > > > > > > > > > Wang
> > > > > > > > > > > > > > > > thinks
> > > > > > > > > > > > > > > > > > > > leaving it open would be best. My
> > concern
> > > > > about
> > > > > > > > this
> > > > > > > > > > > would
> > > > > > > > > > > > be
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > be in a similar situation as we are
> now
> > > > with
> > > > > > the
> > > > > > > > > > > > > > > > RocksDBStateBackend.
> > > > > > > > > > > > > > > > > > If
> > > > > > > > > > > > > > > > > > > > the different memory pools are not
> > > clearly
> > > > > > > > separated
> > > > > > > > > > and
> > > > > > > > > > > > can
> > > > > > > > > > > > > > > spill
> > > > > > > > > > > > > > > > > over
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > a different pool, then it is quite
> hard
> > > to
> > > > > > > > understand
> > > > > > > > > > > what
> > > > > > > > > > > > > > > exactly
> > > > > > > > > > > > > > > > > > > causes a
> > > > > > > > > > > > > > > > > > > > process to get killed for using too
> > much
> > > > > > memory.
> > > > > > > > This
> > > > > > > > > > > could
> > > > > > > > > > > > > > then
> > > > > > > > > > > > > > > > > easily
> > > > > > > > > > > > > > > > > > > > lead to a similar situation what we
> > have
> > > > with
> > > > > > the
> > > > > > > > > > > > > cutoff-ratio.
> > > > > > > > > > > > > > > So
> > > > > > > > > > > > > > > > > why
> > > > > > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > > > > setting a sane default value for max
> > > direct
> > > > > > > memory
> > > > > > > > > and
> > > > > > > > > > > > giving
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > user
> > > > > > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > > > > > > option to increase it if he runs into
> > an
> > > > OOM.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > @Xintong, how would alternative 2
> lead
> > to
> > > > > lower
> > > > > > > > > memory
> > > > > > > > > > > > > > > utilization
> > > > > > > > > > > > > > > > > than
> > > > > > > > > > > > > > > > > > > > alternative 3 where we set the direct
> > > > memory
> > > > > > to a
> > > > > > > > > > higher
> > > > > > > > > > > > > value?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Fri, Aug 9, 2019 at 9:12 AM
> Xintong
> > > > Song <
> > > > > > > > > > > > > > > [email protected]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks for the feedback, Yang.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Regarding your comments:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > *Native and Direct Memory*
> > > > > > > > > > > > > > > > > > > > > I think setting a very large max
> > direct
> > > > > > memory
> > > > > > > > size
> > > > > > > > > > > > > > definitely
> > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > > > > > good sides. E.g., we do not worry
> > about
> > > > > > direct
> > > > > > > > OOM,
> > > > > > > > > > and
> > > > > > > > > > > > we
> > > > > > > > > > > > > > > don't
> > > > > > > > > > > > > > > > > even
> > > > > > > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > > > > > to allocate managed / network
> memory
> > > with
> > > > > > > > > > > > > Unsafe.allocate() .
> > > > > > > > > > > > > > > > > > > > > However, there are also some down
> > sides
> > > > of
> > > > > > > doing
> > > > > > > > > > this.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >    - One thing I can think of is
> that
> > > if
> > > > a
> > > > > > task
> > > > > > > > > > > executor
> > > > > > > > > > > > > > > > container
> > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > >    killed due to overusing memory,
> it
> > > > could
> > > > > > be
> > > > > > > > hard
> > > > > > > > > > for
> > > > > > > > > > > > use
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > know
> > > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > > part
> > > > > > > > > > > > > > > > > > > > >    of the memory is overused.
> > > > > > > > > > > > > > > > > > > > >    - Another down side is that the
> > JVM
> > > > > never
> > > > > > > > > trigger
> > > > > > > > > > GC
> > > > > > > > > > > > due
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > reaching
> > > > > > > > > > > > > > > > > > > > max
> > > > > > > > > > > > > > > > > > > > >    direct memory limit, because the
> > > limit
> > > > > is
> > > > > > > too
> > > > > > > > > high
> > > > > > > > > > > to
> > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > reached.
> > > > > > > > > > > > > > > > > > > That
> > > > > > > > > > > > > > > > > > > > >    means we kind of relay on heap
> > > memory
> > > > to
> > > > > > > > trigger
> > > > > > > > > > GC
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > > release
> > > > > > > > > > > > > > > > > > > direct
> > > > > > > > > > > > > > > > > > > > >    memory. That could be a problem
> in
> > > > cases
> > > > > > > where
> > > > > > > > > we
> > > > > > > > > > > have
> > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > > > direct
> > > > > > > > > > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > > > > > > >    usage but not enough heap
> activity
> > > to
> > > > > > > trigger
> > > > > > > > > the
> > > > > > > > > > > GC.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Maybe you can share your reasons
> for
> > > > > > preferring
> > > > > > > > > > > setting a
> > > > > > > > > > > > > > very
> > > > > > > > > > > > > > > > > large
> > > > > > > > > > > > > > > > > > > > value,
> > > > > > > > > > > > > > > > > > > > > if there are anything else I
> > > overlooked.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > *Memory Calculation*
> > > > > > > > > > > > > > > > > > > > > If there is any conflict between
> > > multiple
> > > > > > > > > > configuration
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > user
> > > > > > > > > > > > > > > > > > > > > explicitly specified, I think we
> > should
> > > > > throw
> > > > > > > an
> > > > > > > > > > error.
> > > > > > > > > > > > > > > > > > > > > I think doing checking on the
> client
> > > side
> > > > > is
> > > > > > a
> > > > > > > > good
> > > > > > > > > > > idea,
> > > > > > > > > > > > > so
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > > > Yarn /
> > > > > > > > > > > > > > > > > > > > > K8s we can discover the problem
> > before
> > > > > > > submitting
> > > > > > > > > the
> > > > > > > > > > > > Flink
> > > > > > > > > > > > > > > > > cluster,
> > > > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > > is always a good thing.
> > > > > > > > > > > > > > > > > > > > > But we can not only rely on the
> > client
> > > > side
> > > > > > > > > checking,
> > > > > > > > > > > > > because
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > standalone cluster TaskManagers on
> > > > > different
> > > > > > > > > machines
> > > > > > > > > > > may
> > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > > different
> > > > > > > > > > > > > > > > > > > > > configurations and the client does
> > see
> > > > > that.
> > > > > > > > > > > > > > > > > > > > > What do you think?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Thu, Aug 8, 2019 at 5:09 PM Yang
> > > Wang
> > > > <
> > > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Hi xintong,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks for your detailed
> proposal.
> > > > After
> > > > > > all
> > > > > > > > the
> > > > > > > > > > > memory
> > > > > > > > > > > > > > > > > > configuration
> > > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > introduced, it will be more
> > powerful
> > > to
> > > > > > > control
> > > > > > > > > the
> > > > > > > > > > > > flink
> > > > > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > > > > > > usage. I
> > > > > > > > > > > > > > > > > > > > > > just have few questions about it.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >    - Native and Direct Memory
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > We do not differentiate user
> direct
> > > > > memory
> > > > > > > and
> > > > > > > > > > native
> > > > > > > > > > > > > > memory.
> > > > > > > > > > > > > > > > > They
> > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > all
> > > > > > > > > > > > > > > > > > > > > > included in task off-heap memory.
> > > > Right?
> > > > > > So i
> > > > > > > > > don’t
> > > > > > > > > > > > think
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > > > > > set
> > > > > > > > > > > > > > > > > > > > > > the -XX:MaxDirectMemorySize
> > > properly. I
> > > > > > > prefer
> > > > > > > > > > > leaving
> > > > > > > > > > > > > it a
> > > > > > > > > > > > > > > > very
> > > > > > > > > > > > > > > > > > > large
> > > > > > > > > > > > > > > > > > > > > > value.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >    - Memory Calculation
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > If the sum of and fine-grained
> > > > > > memory(network
> > > > > > > > > > memory,
> > > > > > > > > > > > > > managed
> > > > > > > > > > > > > > > > > > memory,
> > > > > > > > > > > > > > > > > > > > > etc.)
> > > > > > > > > > > > > > > > > > > > > > is larger than total process
> > memory,
> > > > how
> > > > > do
> > > > > > > we
> > > > > > > > > deal
> > > > > > > > > > > > with
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > situation?
> > > > > > > > > > > > > > > > > > > > > Do
> > > > > > > > > > > > > > > > > > > > > > we need to check the memory
> > > > configuration
> > > > > > in
> > > > > > > > > > client?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Xintong Song <
> > [email protected]>
> > > > > > > > > 于2019年8月7日周三
> > > > > > > > > > > > > > 下午10:14写道：
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > We would like to start a
> > discussion
> > > > > > thread
> > > > > > > on
> > > > > > > > > > > > "FLIP-49:
> > > > > > > > > > > > > > > > Unified
> > > > > > > > > > > > > > > > > > > > Memory
> > > > > > > > > > > > > > > > > > > > > > > Configuration for
> > > TaskExecutors"[1],
> > > > > > where
> > > > > > > we
> > > > > > > > > > > > describe
> > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > improve
> > > > > > > > > > > > > > > > > > > > > > > TaskExecutor memory
> > configurations.
> > > > The
> > > > > > > FLIP
> > > > > > > > > > > document
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > mostly
> > > > > > > > > > > > > > > > > > > based
> > > > > > > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > > > > > > > > > early design "Memory Management
> > and
> > > > > > > > > Configuration
> > > > > > > > > > > > > > > > Reloaded"[2]
> > > > > > > > > > > > > > > > > by
> > > > > > > > > > > > > > > > > > > > > > Stephan,
> > > > > > > > > > > > > > > > > > > > > > > with updates from follow-up
> > > > discussions
> > > > > > > both
> > > > > > > > > > online
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > offline.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > This FLIP addresses several
> > > > > shortcomings
> > > > > > of
> > > > > > > > > > current
> > > > > > > > > > > > > > (Flink
> > > > > > > > > > > > > > > > 1.9)
> > > > > > > > > > > > > > > > > > > > > > > TaskExecutor memory
> > configuration.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >    - Different configuration
> for
> > > > > > Streaming
> > > > > > > > and
> > > > > > > > > > > Batch.
> > > > > > > > > > > > > > > > > > > > > > >    - Complex and difficult
> > > > > configuration
> > > > > > of
> > > > > > > > > > RocksDB
> > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > Streaming.
> > > > > > > > > > > > > > > > > > > > > > >    - Complicated, uncertain and
> > > hard
> > > > to
> > > > > > > > > > understand.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Key changes to solve the
> problems
> > > can
> > > > > be
> > > > > > > > > > summarized
> > > > > > > > > > > > as
> > > > > > > > > > > > > > > > follows.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >    - Extend memory manager to
> > also
> > > > > > account
> > > > > > > > for
> > > > > > > > > > > memory
> > > > > > > > > > > > > > usage
> > > > > > > > > > > > > > > > by
> > > > > > > > > > > > > > > > > > > state
> > > > > > > > > > > > > > > > > > > > > > >    backends.
> > > > > > > > > > > > > > > > > > > > > > >    - Modify how TaskExecutor
> > memory
> > > > is
> > > > > > > > > > partitioned
> > > > > > > > > > > > > > > accounted
> > > > > > > > > > > > > > > > > > > > individual
> > > > > > > > > > > > > > > > > > > > > > >    memory reservations and
> pools.
> > > > > > > > > > > > > > > > > > > > > > >    - Simplify memory
> > configuration
> > > > > > options
> > > > > > > > and
> > > > > > > > > > > > > > calculations
> > > > > > > > > > > > > > > > > > logics.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Please find more details in the
> > > FLIP
> > > > > wiki
> > > > > > > > > > document
> > > > > > > > > > > > [1].
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > (Please note that the early
> > design
> > > > doc
> > > > > > [2]
> > > > > > > is
> > > > > > > > > out
> > > > > > > > > > > of
> > > > > > > > > > > > > > sync,
> > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > appreciated to have the
> > discussion
> > > in
> > > > > > this
> > > > > > > > > > mailing
> > > > > > > > > > > > list
> > > > > > > > > > > > > > > > > thread.)
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Looking forward to your
> > feedbacks.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > [2]
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit?usp=sharing
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-49: Unified Memory Configuration for TaskExecutors

Reply via email to