My understanding was that before starting the Flink process we call a utility which calculates these values. I assume that this utility will do the calculation based on a set of configured values (process memory, flink memory, network memory etc.). Assuming that these values don't differ from the values with which the JVM is started, it should be possible to recompute them in the Flink process in order to set the values.
On Thu, Aug 29, 2019 at 11:29 AM Stephan Ewen <se...@apache.org> wrote:

When computing the values in the JVM process after it has started, how would you deal with values like max direct memory, Metaspace size, native memory reservation (reduced heap size), etc.? All the values that are parameters to the JVM process and that need to be supplied at process startup?

On Wed, Aug 28, 2019 at 4:46 PM Till Rohrmann <trohrm...@apache.org> wrote:

Thanks for the clarification. I have some more comments:

- I would actually split the logic that computes the process memory requirements and the storing of the values into two things. E.g. one could name the former TaskExecutorProcessUtility and the latter TaskExecutorProcessMemory. But we can discuss this on the PR since it's just a naming detail.

- Generally, I'm not opposed to making configuration values overridable by ENV variables. I think this is a very good idea and makes the configurability of Flink processes easier. However, I think that adding this functionality should not be part of this FLIP because it would simply widen the scope unnecessarily.

The reasons why I believe it is unnecessary are the following: For Yarn we already write a flink-conf.yaml which could be populated with the memory settings. For the other processes it should not make a difference whether the loaded Configuration is populated with the memory settings from ENV variables or by using TaskExecutorProcessUtility to compute the missing values from the loaded configuration. If the latter were not possible (wrong or missing configuration values), then we should not have been able to start the process in the first place.

- Concerning the memory reservation: I agree with you that we need the memory reservation functionality to make streaming jobs work with "managed" memory. However, without this functionality the whole FLIP would already bring a good amount of improvements to our users when running batch jobs. Moreover, by keeping the scope smaller we can complete the FLIP faster. Hence, I would propose to address the memory reservation functionality in a follow-up FLIP (which Yu is working on, if I'm not mistaken).

Cheers,
Till

On Wed, Aug 28, 2019 at 11:43 AM Yang Wang <danrtsey...@gmail.com> wrote:

Just to add my 2 cents.

Using environment variables to override the configuration for different taskmanagers is better. We do not need to generate a dedicated flink-conf.yaml for every taskmanager; a common flink-conf.yaml and different environment variables are enough. By reducing the distributed cached files, this could make launching a taskmanager faster.

Stephan gives a good suggestion that we could move the logic into the "GlobalConfiguration.loadConfig()" method. Maybe the client could also benefit from this: different users would not have to export FLINK_CONF_DIR just to update a few config options.

Best,
Yang

On Wed, Aug 28, 2019 at 1:21 AM Stephan Ewen <se...@apache.org> wrote:

One note on the environment variables and configuration discussion.

My understanding is that passed ENV variables are added to the configuration in the "GlobalConfiguration.loadConfig()" method (or similar). For all the code inside Flink, it looks like the data was in the config to start with, just that the scripts that compute the variables can pass the values to the process without actually needing to write a file.

For example, the "GlobalConfiguration.loadConfig()" method would take any ENV variable prefixed with "flink" and add it as a config key: "flink_taskmanager_memory_size=2g" would become "taskmanager.memory.size: 2g".

On Tue, Aug 27, 2019 at 4:05 PM Xintong Song <tonysong...@gmail.com> wrote:

Thanks for the comments, Till. I've also seen your comments on the wiki page, but let's keep the discussion here.

- Regarding 'TaskExecutorSpecifics', what do you think about naming it 'TaskExecutorResourceSpecifics'?

- Regarding passing memory configurations into task executors, I'm in favor of doing it via environment variables rather than configuration files, for the following two reasons. First, it is easier to keep the memory options, once calculated, unchanged with environment variables than with configuration files. Second, I'm not sure whether we should write the configuration in startup scripts; writing changes into the configuration files when running the startup scripts does not sound right to me. Alternatively, we could make a copy of the configuration files per Flink cluster, make the task executor load from the copy, and clean up the copy after the cluster is shut down, which is complicated. (I think this is also what Stephan means in his comment on the wiki page?)

- Regarding reserving memory, I think this change should be included in this FLIP. A big part of the motivation of this FLIP is to unify memory configuration for streaming / batch and to make it easy to configure RocksDB memory. If we don't support memory reservation, then streaming jobs cannot use managed memory (neither on-heap nor off-heap), which makes this FLIP incomplete.

- Regarding network memory, I think you are right.
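The prefix-stripping mapping Stephan describes ("flink_taskmanager_memory_size=2g" becoming "taskmanager.memory.size: 2g") could be sketched as follows. This is a hypothetical illustration, not actual Flink code; the class and method names are made up:

```java
// Hypothetical sketch of the proposed ENV-variable-to-config-key mapping:
// drop the "flink_" prefix and replace underscores with dots. Real config
// keys with other separators would need a richer scheme; this only covers
// the example given in the discussion.
public class EnvToConfigKey {
    static String toConfigKey(String envVar) {
        if (!envVar.startsWith("flink_")) {
            throw new IllegalArgumentException("Not a flink ENV variable: " + envVar);
        }
        return envVar.substring("flink_".length()).replace('_', '.');
    }

    public static void main(String[] args) {
        // prints "taskmanager.memory.size"
        System.out.println(toConfigKey("flink_taskmanager_memory_size"));
    }
}
```

In this picture, "GlobalConfiguration.loadConfig()" would iterate over System.getenv(), apply such a mapping, and add each resulting key to the loaded Configuration before any component reads it.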
I think we probably don't need to change the network stack from using direct memory to using unsafe native memory. The network memory size is deterministic, cannot be reserved the way managed memory is, and cannot be overused. I think it also works if we simply keep using direct memory for the network stack and include it in the JVM max direct memory size.

Thank you~
Xintong Song

On Tue, Aug 27, 2019 at 8:12 PM Till Rohrmann <trohrm...@apache.org> wrote:

Hi Xintong,

thanks for addressing the comments and adding a more detailed implementation plan. I have a couple of comments concerning the implementation plan:

- The name `TaskExecutorSpecifics` is not really descriptive. Choosing a different name could help here.
- I'm not sure whether I would pass the memory configuration to the TaskExecutor via environment variables. I think it would be better to write it into the configuration one uses to start the TM process.
- If possible, I would exclude the memory reservation from this FLIP and add it as part of a dedicated FLIP.
- If possible, I would also exclude changes to the network stack from this FLIP. Maybe we can simply say that the direct memory needed by the network stack is the framework direct memory requirement. Changing how the memory is allocated can happen in a second step. This would keep the scope of this FLIP smaller.

Cheers,
Till

On Thu, Aug 22, 2019 at 2:51 PM Xintong Song <tonysong...@gmail.com> wrote:

Hi everyone,

I just updated the FLIP document on the wiki [1], with the following changes.

- Removed the open question regarding MemorySegment allocation. As discussed, we exclude this topic from the scope of this FLIP.
- Updated the content about the JVM direct memory parameter according to recent discussions, and moved the other options to "Rejected Alternatives" for the moment.
- Added implementation steps.

Thank you~
Xintong Song

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors

On Mon, Aug 19, 2019 at 7:16 PM Stephan Ewen <se...@apache.org> wrote:

@Xintong: Concerning "wait for memory users before task dispose and memory release": I agree, that's how it should be. Let's try it out.

@Xintong @Jingsong: Concerning "JVM does not wait for GC when allocating direct memory buffer": There seems to be pretty elaborate logic to free buffers when allocating new ones. See
http://hg.openjdk.java.net/jdk8u/jdk8u-dev/jdk/file/tip/src/share/classes/java/nio/Bits.java#l643

@Till: Maybe. If we assume that the JVM default works (like going with option 2 and not setting "-XX:MaxDirectMemorySize" at all), then I think it should be okay to set "-XX:MaxDirectMemorySize" to "off_heap_managed_memory + direct_memory" even if we use RocksDB. That is a big if, though; I honestly have no idea :D It would be good to understand this, because it would affect option (2) and option (1.2).

On Mon, Aug 19, 2019 at 4:44 PM Xintong Song <tonysong...@gmail.com> wrote:

Thanks for the input, Jingsong.

Let me try to summarize your points. Please correct me if I'm wrong.

- Memory consumers should always avoid returning memory segments to the memory manager while there are still un-cleaned structures / threads that may use the memory. Otherwise, it would cause serious problems by having multiple consumers trying to use the same memory segment.
- The JVM does not wait for GC when allocating a direct memory buffer. Therefore, even if we set a proper max direct memory size limit, we may still encounter a direct memory OOM if GC cleans memory more slowly than direct memory is allocated.

Am I understanding this correctly?
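For context on that second point: the Bits.java code Stephan linked roughly behaves like the following, greatly simplified sketch. The names and structure here are illustrative only; the real JDK code also drains pending direct-buffer references before resorting to System.gc(), and only after a bounded number of retries does it give up:

```java
import java.util.concurrent.atomic.AtomicLong;

// Greatly simplified sketch of the idea behind java.nio.Bits#reserveMemory:
// before a direct buffer is allocated, its size is reserved against the
// -XX:MaxDirectMemorySize budget; on failure the JDK triggers GC (so dead
// direct buffers get freed) and retries with backoff, then throws OOM.
public class DirectReservationSketch {
    static final long MAX_DIRECT = 64L * 1024 * 1024; // stand-in for -XX:MaxDirectMemorySize
    static final AtomicLong reserved = new AtomicLong();

    static boolean tryReserve(long bytes) {
        while (true) {
            long cur = reserved.get();
            if (cur + bytes > MAX_DIRECT) {
                return false; // over budget, caller must trigger cleanup
            }
            if (reserved.compareAndSet(cur, cur + bytes)) {
                return true;
            }
        }
    }

    static void reserve(long bytes) {
        if (tryReserve(bytes)) {
            return;
        }
        System.gc(); // hope the GC frees unreferenced direct buffers
        for (int i = 0; i < 9; i++) { // bounded retries, as in the real code
            if (tryReserve(bytes)) {
                return;
            }
            try {
                Thread.sleep(1L << i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        throw new OutOfMemoryError("Direct buffer memory");
    }

    public static void main(String[] args) {
        reserve(32L * 1024 * 1024);
        System.out.println("reserved bytes: " + reserved.get());
    }
}
```

The bounded retry loop is exactly why Jingsong's observation holds: if direct buffers become garbage faster than the reference processing and the few GC-assisted retries can reclaim them, the allocation throws OOM even though the memory would eventually have been freed.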
Thank you~
Xintong Song

On Mon, Aug 19, 2019 at 4:21 PM JingsongLee <lzljs3620...@aliyun.com.invalid> wrote:

Hi Stephan:

About option 2: if additional threads are not cleanly shut down before we exit the task, then in the current case of memory reuse the task has already freed up the memory it uses. If this memory is handed to other tasks while asynchronous threads of the exited task may still be writing to it, there will be concurrency problems, and even errors in user computation results.

So I think this is a serious and intolerable bug. No matter which option we choose, it should be avoided.

About direct memory cleaned by GC: I don't think relying on it is a good idea. I've encountered many situations in which GC came too late, causing a DirectMemory OOM. Releasing and allocating DirectMemory depends on the type of user job, which is often beyond our control.

Best,
Jingsong Lee

------------------------------------------------------------------
From: Stephan Ewen <se...@apache.org>
Send Time: Mon, Aug 19, 2019, 15:56
To: dev <dev@flink.apache.org>
Subject: Re: [DISCUSS] FLIP-49: Unified Memory Configuration for TaskExecutors

My main concern with option 2 (manually release memory) is that segfaults in the JVM set off all sorts of alarms on user ends. So we need to guarantee that this never happens.

The trickiness is in tasks that use data structures / algorithms with additional threads, like hash table spill/read and sorting threads. We need to ensure that these cleanly shut down before we can exit the task. I am not sure that we have that guaranteed already; that's why option 1.1 seemed simpler to me.

On Mon, Aug 19, 2019 at 3:42 PM Xintong Song <tonysong...@gmail.com> wrote:

Thanks for the comments, Stephan. Summarizing it this way really makes things easier to understand.

I'm in favor of option 2, at least for the moment. I think it is not that difficult to keep the memory manager segfault safe, as long as we always de-allocate the memory segment when it is released by the memory consumers. A segfault can then only occur if a memory consumer continues using the buffer of a memory segment after releasing it, in which case we do want the job to fail so that we detect the memory leak early.

For option 1.2, I don't think this is a good idea. Not only may the assumption (that regular GC is enough to clean direct buffers) not always hold, it also makes it harder to find problems in cases of memory overuse. E.g., suppose the user configured some direct memory for the user libraries. If a library actually uses more direct memory than configured, which cannot be cleaned by GC because it is still in use, this may lead to overuse of the total container memory. In that case, if the usage does not hit the JVM default max direct memory limit, we cannot get a direct memory OOM, and it becomes super hard to understand which part of the configuration needs to be updated.

Option 1.1 has a similar problem to 1.2 if the excess direct memory does not reach the max direct memory limit specified by the dedicated parameter. I think it is slightly better than 1.2 only because we can tune the parameter.

Thank you~
Xintong Song

On Mon, Aug 19, 2019 at 2:53 PM Stephan Ewen <se...@apache.org> wrote:

About the "-XX:MaxDirectMemorySize" discussion, maybe let me summarize it a bit differently.

We have the following two options:

(1) We let MemorySegments be de-allocated by the GC. That makes it segfault safe. But then we need a way to trigger GC in case de-allocation and re-allocation of a bunch of segments happens quickly, which is often the case during batch scheduling or task restart.
- The "-XX:MaxDirectMemorySize" limit (option 1.1) is one way to do this.
- Another way could be to have dedicated bookkeeping in the MemoryManager (option 1.2), so that this number is independent of the "-XX:MaxDirectMemorySize" parameter.

(2) We manually allocate and de-allocate the memory for the MemorySegments (option 2). That way we need not worry about triggering GC by some threshold or bookkeeping, but it is harder to prevent segfaults. We need to be very careful about when we release the memory segments (only in the cleanup phase of the main thread).

If we go with option 1.1, we probably need to set "-XX:MaxDirectMemorySize" to "off_heap_managed_memory + direct_memory" and have "direct_memory" as a separate reserved memory pool. Because if we just set "-XX:MaxDirectMemorySize" to "off_heap_managed_memory + jvm_overhead", then there will be times when that entire memory is allocated by direct buffers and we have nothing left for the JVM overhead. So we either need a way to compensate for that (again some safety-margin cutoff value) or we will exceed the container memory.

If we go with option 1.2, we need to be aware that it takes elaborate logic to push recycling of direct buffers without always triggering a full GC.

My first guess is that the options will be easiest to implement in the following order:

- Option 1.1 with a dedicated direct_memory parameter, as discussed above. We would need to find a way to set the direct_memory parameter by default. We could start with 64 MB and see how it goes in practice. One danger I see is that setting this too low can cause a bunch of additional GCs compared to before (we need to watch this carefully).

- Option 2. It is actually quite simple to implement; we could try how segfault safe we are at the moment.

- Option 1.2: We would not touch the "-XX:MaxDirectMemorySize" parameter at all and assume that all the direct memory allocations that the JVM and Netty do are infrequent enough to be cleaned up fast enough through regular GC. I am not sure if that is a valid assumption, though.
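Under option 1.1 as summarized above, the startup scripts would derive the JVM flag from the two pools. A minimal, hypothetical calculation (the method and parameter names are illustrative, not actual FLIP-49 option names; 64 MB is the default floated in the discussion):

```java
// Hypothetical derivation of the -XX:MaxDirectMemorySize flag for option 1.1:
// the limit covers off-heap managed memory plus the dedicated direct_memory
// pool, so JVM-overhead allocations cannot eat into the managed budget.
public class Option11Flags {
    static String maxDirectMemoryFlag(long offHeapManagedMb, long directMemoryMb) {
        return "-XX:MaxDirectMemorySize=" + (offHeapManagedMb + directMemoryMb) + "m";
    }

    public static void main(String[] args) {
        // e.g. 512 MB off-heap managed memory plus the proposed 64 MB default
        System.out.println(maxDirectMemoryFlag(512, 64)); // -XX:MaxDirectMemorySize=576m
    }
}
```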
Best,
Stephan

On Fri, Aug 16, 2019 at 2:16 PM Xintong Song <tonysong...@gmail.com> wrote:

Thanks for sharing your opinion, Till.

I'm also in favor of alternative 2. I was wondering whether we could avoid using Unsafe.allocate() for off-heap managed memory and network memory with alternative 3. But after giving it a second thought, I think even for alternative 3 using direct memory for off-heap managed memory could cause problems.

Hi Yang,

Regarding your concern, what is proposed in this FLIP is to have both off-heap managed memory and network memory allocated through Unsafe.allocate(), which means they are practically native memory and not limited by the JVM max direct memory. The only parts of memory limited by the JVM max direct memory are task off-heap memory and JVM overhead, which are exactly what alternative 2 suggests setting the JVM max direct memory to.

Thank you~
Xintong Song

On Fri, Aug 16, 2019 at 1:48 PM Till Rohrmann <trohrm...@apache.org> wrote:

Thanks for the clarification, Xintong. I understand the two alternatives now.

I would be in favour of option 2 because it makes things explicit. If we don't limit the direct memory, I fear that we might end up in a similar situation as we are currently in: the user might see that her process gets killed by the OS and not know why this is the case. Consequently, she tries to decrease the process memory size (similar to increasing the cutoff ratio) in order to accommodate for the extra direct memory. Even worse, she tries to decrease memory budgets which are not fully used and which hence won't change the overall memory consumption.

Cheers,
Till

On Fri, Aug 16, 2019 at 11:01 AM Xintong Song <tonysong...@gmail.com> wrote:

Let me explain this with a concrete example, Till.

Let's say we have the following scenario:

Total Process Memory: 1GB
JVM Direct Memory (Task Off-Heap Memory + JVM Overhead): 200MB
Other Memory (JVM Heap Memory, JVM Metaspace, Off-Heap Managed Memory and Network Memory): 800MB

For alternative 2, we set -XX:MaxDirectMemorySize to 200MB.
For alternative 3, we set -XX:MaxDirectMemorySize to a very large value, let's say 1TB.

If the actual direct memory usage of Task Off-Heap Memory and JVM Overhead does not exceed 200MB, then alternatives 2 and 3 have the same utility. Setting a larger -XX:MaxDirectMemorySize will not reduce the sizes of the other memory pools.

If the actual direct memory usage of Task Off-Heap Memory and JVM Overhead can potentially exceed 200MB, then:

- Alternative 2 suffers from frequent OOMs. To avoid that, the only thing the user can do is modify the configuration and increase the JVM Direct Memory (Task Off-Heap Memory + JVM Overhead). Say the user increases the JVM Direct Memory to 250MB; this reduces the total size of the other memory pools to 750MB, given that the total process memory remains 1GB.
- For alternative 3, there is no chance of a direct OOM. There are chances of exceeding the total process memory limit, but given that the process may not use up all the reserved native memory (Off-Heap Managed Memory, Network Memory, JVM Metaspace), if the actual direct memory usage is slightly above yet very close to 200MB, the user probably does not need to change the configuration at all.

Therefore, I think that from the user's perspective, a feasible configuration for alternative 2 may lead to lower resource utilization compared to alternative 3.

Thank you~
Xintong Song

On Fri, Aug 16, 2019 at 10:28 AM Till Rohrmann <trohrm...@apache.org> wrote:

I guess you have to help me understand the difference between alternatives 2 and 3 with respect to memory under-utilization, Xintong.
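Xintong's 1 GB scenario can be checked with quick arithmetic (treating 1 GB as 1000 MB, matching the round numbers used in the example): because the total process memory is fixed, any increase of the direct memory budget under alternative 2 comes directly out of the other pools.

```java
// Arithmetic behind the 1 GB example: with a fixed total process size,
// whatever is granted to JVM direct memory is taken away from the other
// pools (heap, metaspace, off-heap managed memory, network memory).
public class MemoryBudgetExample {
    static long otherMemoryMb(long totalProcessMb, long jvmDirectMb) {
        return totalProcessMb - jvmDirectMb;
    }

    public static void main(String[] args) {
        System.out.println(otherMemoryMb(1000, 200)); // 800, the original scenario
        System.out.println(otherMemoryMb(1000, 250)); // 750, after raising direct memory
    }
}
```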
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Alternative 2: set XX:MaxDirectMemorySize > to > > > Task > > > > > > > > Off-Heap > > > > > > > > > > > Memory > > > > > > > > > > > > > and > > > > > > > > > > > > > > > JVM > > > > > > > > > > > > > > > > Overhead. Then there is the risk that this > size > > > is > > > > > too > > > > > > > low > > > > > > > > > > > > resulting > > > > > > > > > > > > > > in a > > > > > > > > > > > > > > > > lot of garbage collection and potentially an > > OOM. > > > > > > > > > > > > > > > > - Alternative 3: set XX:MaxDirectMemorySize > to > > > > > > something > > > > > > > > > larger > > > > > > > > > > > > than > > > > > > > > > > > > > > > > alternative 2. This would of course reduce > the > > > > sizes > > > > > of > > > > > > > the > > > > > > > > > > other > > > > > > > > > > > > > > memory > > > > > > > > > > > > > > > > types. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > How would alternative 2 now result in an > under > > > > > > > utilization > > > > > > > > of > > > > > > > > > > > > memory > > > > > > > > > > > > > > > > compared to alternative 3? If alternative 3 > > > > strictly > > > > > > > sets a > > > > > > > > > > > higher > > > > > > > > > > > > > max > > > > > > > > > > > > > > > > direct memory size and we use only little, > > then I > > > > > would > > > > > > > > > expect > > > > > > > > > > > that > > > > > > > > > > > > > > > > alternative 3 results in memory under > > > utilization. 
Cheers,
Till

On Tue, Aug 13, 2019 at 4:19 PM Yang Wang <danrtsey...@gmail.com> wrote:

Hi Xintong, Till,

Native and Direct Memory:
My point is about setting a very large max direct memory size when we do not differentiate direct and native memory. If the direct memory, including user direct memory and framework direct memory, could be calculated correctly, then I am in favor of setting direct memory to a fixed value.

Memory Calculation:
I agree with Xintong. For Yarn and K8s, we need to check the memory configurations in the client to avoid submitting successfully and then failing in the Flink master.
Best,
Yang

On Tue, Aug 13, 2019 at 10:07 PM, Xintong Song <tonysong...@gmail.com> wrote:

Thanks for replying, Till.

About MemorySegment, I think you are right that we should not include this issue in the scope of this FLIP. This FLIP should concentrate on how to configure memory pools for TaskExecutors, with minimum involvement in how memory consumers use them.

About direct memory, I think alternative 3 may not have the same over-reservation issue that alternative 2 does, but at the cost of the risk of over-using memory at the container level, which is not good.
My point is that both "Task Off-Heap Memory" and "JVM Overhead" are not easy to configure. For alternative 2, users might configure them higher than what is actually needed, just to avoid getting a direct OOM. For alternative 3, users do not get direct OOMs, so they may not configure the two options aggressively high. But the consequence is the risk that the overall container memory usage exceeds the budget.

Thank you~

Xintong Song

On Tue, Aug 13, 2019 at 9:39 AM Till Rohrmann <trohrm...@apache.org> wrote:

Thanks for proposing this FLIP, Xintong.

All in all I think it already looks quite good.
Concerning the first open question about allocating memory segments, I was wondering whether this is strictly necessary to do in the context of this FLIP or whether it could be done as a follow-up. Without knowing all the details, I would be concerned that we would widen the scope of this FLIP too much, because we would have to touch all the existing call sites of the MemoryManager where we allocate memory segments (this should mainly be batch operators).
The addition of the memory reservation call to the MemoryManager should not be affected by this, and I would hope that this is the only point of interaction a streaming job would have with the MemoryManager.

Concerning the second open question about setting or not setting a max direct memory limit, I would also be interested why Yang Wang thinks leaving it open would be best. My concern is that we would be in a similar situation as we are now with the RocksDBStateBackend.
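(Editorial aside: the "memory reservation call" mentioned above could look roughly like the following sketch. The class and method names are illustrative only, not Flink's actual MemoryManager API.)

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a reservation-style interface: a consumer (e.g. a state backend
// using native memory) reserves a budget without being handed memory segments.
public class ReservingMemoryManager {
    private final long totalBytes;
    private final AtomicLong reserved = new AtomicLong();

    public ReservingMemoryManager(long totalBytes) {
        this.totalBytes = totalBytes;
    }

    /** Reserve a budget without allocating segments; fails if the pool is exhausted. */
    public void reserveMemory(long bytes) {
        long newTotal = reserved.addAndGet(bytes);
        if (newTotal > totalBytes) {
            reserved.addAndGet(-bytes); // roll back the failed reservation
            throw new IllegalStateException("Cannot reserve " + bytes + " bytes, only "
                    + (totalBytes - (newTotal - bytes)) + " available");
        }
    }

    /** Return a previously reserved budget to the pool. */
    public void releaseMemory(long bytes) {
        reserved.addAndGet(-bytes);
    }

    public long availableBytes() {
        return totalBytes - reserved.get();
    }
}
```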
If the different memory pools are not clearly separated and can spill over into a different pool, then it is quite hard to understand what exactly causes a process to get killed for using too much memory. This could then easily lead to a situation similar to what we have with the cutoff-ratio. So why not set a sane default value for max direct memory and give the user an option to increase it if she runs into an OOM?

@Xintong, how would alternative 2 lead to lower memory utilization than alternative 3, where we set the direct memory to a higher value?
Cheers,
Till

On Fri, Aug 9, 2019 at 9:12 AM Xintong Song <tonysong...@gmail.com> wrote:

Thanks for the feedback, Yang. Regarding your comments:

Native and Direct Memory:
I think setting a very large max direct memory size definitely has some good sides. E.g., we do not need to worry about direct OOMs, and we don't even need to allocate managed / network memory with Unsafe.allocate(). However, there are also some downsides to doing this.
- One thing I can think of is that if a task executor container is killed due to over-using memory, it could be hard for us to know which part of the memory is overused.
- Another downside is that the JVM never triggers GC due to reaching the max direct memory limit, because the limit is too high to be reached. That means we kind of rely on heap memory activity to trigger GC and release direct memory. That could be a problem in cases where we have more direct memory usage but not enough heap activity to trigger the GC.
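(Editorial aside illustrating the second downside: the native memory behind a direct `ByteBuffer` is only released by a Cleaner when the buffer *object* is garbage collected, and the JDK forces a GC on allocation only when the direct-memory limit is about to be exceeded. With an unreachably high `-XX:MaxDirectMemorySize`, that trigger never fires. A minimal sketch:)

```java
import java.nio.ByteBuffer;

public class DirectBufferSketch {
    /**
     * Allocates a direct buffer and round-trips a value through it. The 1 MB of
     * native memory is NOT freed when the buffer goes out of scope; it is freed
     * only when the buffer object is garbage collected. If -XX:MaxDirectMemorySize
     * is set unreachably high, nothing on the allocation path ever forces that GC,
     * so native usage can grow while the heap stays quiet.
     */
    static long roundTrip() {
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);
        buf.putLong(0, 42L); // absolute put/get, no position changes
        return buf.getLong(0);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```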
Maybe you can share your reasons for preferring to set a very large value, in case there is anything else I overlooked.

Memory Calculation:
If there is any conflict between multiple configuration values that the user explicitly specified, I think we should throw an error. I think doing the checking on the client side is a good idea, so that on Yarn / K8s we can discover the problem before submitting the Flink cluster, which is always a good thing. But we cannot rely on client-side checking alone, because for standalone clusters the TaskManagers on different machines may have different configurations, and the client does not see those. What do you think?
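(Editorial aside: the "throw an error on conflicting explicit values" check could be sketched as below. The pool names follow the thread's terminology; the method is illustrative, not Flink's actual validation code.)

```java
// Sketch of a client-side (or startup-time) consistency check over explicitly
// configured memory pools. All sizes are in megabytes.
public class MemoryConfigCheck {

    /** Throws if the explicitly configured pools exceed the configured total. */
    static void check(long totalProcessMb, long heapMb, long managedMb,
                      long networkMb, long metaspaceMb, long overheadMb) {
        long sum = heapMb + managedMb + networkMb + metaspaceMb + overheadMb;
        if (sum > totalProcessMb) {
            throw new IllegalArgumentException("Configured pools sum to " + sum
                    + " MB, which exceeds the total process memory of "
                    + totalProcessMb + " MB");
        }
    }

    public static void main(String[] args) {
        check(1024, 400, 300, 100, 96, 124); // consistent: 1020 MB <= 1024 MB
        System.out.println("configuration is consistent");
    }
}
```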
Thank you~

Xintong Song

On Thu, Aug 8, 2019 at 5:09 PM Yang Wang <danrtsey...@gmail.com> wrote:

Hi Xintong,

Thanks for your detailed proposal. Once all the memory configuration options are introduced, it will be much more powerful for controlling the Flink memory usage. I just have a few questions about it.

- Native and Direct Memory:
We do not differentiate user direct memory and native memory. They are all included in task off-heap memory. Right?
So I don't think we could set -XX:MaxDirectMemorySize properly. I prefer leaving it at a very large value.

- Memory Calculation:
If the sum of the fine-grained memory pools (network memory, managed memory, etc.) is larger than the total process memory, how do we deal with this situation? Do we need to check the memory configuration in the client?
On Wed, Aug 7, 2019 at 10:14 PM, Xintong Song <tonysong...@gmail.com> wrote:

Hi everyone,

We would like to start a discussion thread on "FLIP-49: Unified Memory Configuration for TaskExecutors" [1], where we describe how to improve TaskExecutor memory configuration. The FLIP document is mostly based on an early design, "Memory Management and Configuration Reloaded" [2] by Stephan, with updates from follow-up discussions both online and offline.

This FLIP addresses several shortcomings of the current (Flink 1.9) TaskExecutor memory configuration:
- Different configuration for Streaming and Batch.
- Complex and difficult configuration of RocksDB in Streaming.
- Complicated, uncertain and hard-to-understand behavior.

The key changes to solve these problems can be summarized as follows:

- Extend the memory manager to also account for memory usage by state backends.
- Modify how TaskExecutor memory is partitioned into individual memory reservations and pools.
- Simplify the memory configuration options and calculation logic.

Please find more details in the FLIP wiki document [1].
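(Editorial aside: the simplified calculation logic mentioned above amounts to deriving all pools from a single total process size via fixed sizes and fractions. The sketch below illustrates the shape of that derivation; all fixed sizes and fractions are made up for illustration and are not the FLIP's actual defaults.)

```java
// Sketch: derive individual memory pools from one total process size (in MB).
public class PoolDerivationSketch {

    /** Returns {metaspace, overhead, network, managed, heap} in MB. */
    static long[] derivePools(long totalMb) {
        long metaspaceMb = 96;                        // fixed size (illustrative)
        long overheadMb = Math.round(totalMb * 0.1);  // fraction of total (illustrative)
        long flinkMb = totalMb - metaspaceMb - overheadMb;
        long networkMb = Math.round(flinkMb * 0.1);   // fraction of Flink memory
        long managedMb = Math.round(flinkMb * 0.4);
        long heapMb = flinkMb - networkMb - managedMb; // heap gets the remainder
        return new long[] {metaspaceMb, overheadMb, networkMb, managedMb, heapMb};
    }

    public static void main(String[] args) {
        long[] pools = derivePools(1024);
        System.out.println("heap MB: " + pools[4]);
    }
}
```

Because every pool is derived deterministically from one configured number, the same values can be recomputed consistently on the client, in the startup scripts, and inside the process.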
(Please note that the early design doc [2] is out of sync, and it would be appreciated to have the discussion in this mailing list thread.)

Looking forward to your feedback.

Thank you~

Xintong Song

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors

[2]
https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit?usp=sharing

On Fri, Aug 16, 2019 at 2:16 PM Xintong Song <tonysong...@gmail.com> wrote:

Thanks for sharing your opinion, Till.

I'm also in favor of alternative 2. I was wondering whether we could avoid using Unsafe.allocate() for off-heap managed memory and network memory with alternative 3. But after giving it a second thought, I think that even for alternative 3, using direct memory for off-heap managed memory could cause problems.
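(Editorial aside on the Unsafe.allocate() point: the concrete JDK method is `sun.misc.Unsafe.allocateMemory`, and memory obtained through it is plain native memory that is not tracked against `-XX:MaxDirectMemorySize` and must be freed explicitly. A minimal sketch, using reflection to obtain the `theUnsafe` instance; this is an editorial illustration, not Flink's actual allocation code:)

```java
import java.lang.reflect.Field;

import sun.misc.Unsafe;

public class UnsafeAllocationSketch {

    /** Allocates 8 bytes of native memory, round-trips a value, and frees it. */
    static long writeAndRead(long value) {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe unsafe = (Unsafe) f.get(null);

            // Native allocation: outside the JVM's direct-memory accounting.
            long address = unsafe.allocateMemory(8);
            try {
                unsafe.putLong(address, value);
                return unsafe.getLong(address);
            } finally {
                unsafe.freeMemory(address); // explicit release, no GC involvement
            }
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Unsafe not accessible on this JVM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndRead(42L));
    }
}
```

This is why pools allocated this way are "practically native memory": the JVM neither limits them via the max-direct-memory flag nor reclaims them through GC.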
Hi Yang,

Regarding your concern, I think what is proposed in this FLIP is to have both off-heap managed memory and network memory allocated through Unsafe.allocate(), which means they are practically native memory and not limited by the JVM max direct memory. The only parts of memory limited by the JVM max direct memory are task off-heap memory and JVM overhead, which are exactly what alternative 2 suggests setting the JVM max direct memory to.

Thank you~

Xintong Song

On Fri, Aug 16, 2019 at 1:48 PM Till Rohrmann <trohrm...@apache.org> wrote:

Thanks for the clarification, Xintong. I understand the two alternatives now.

I would be in favour of option 2 because it makes things explicit.
If we don't limit the direct memory, I fear that we might end up in a situation similar to the one we are currently in: the user might see that her process gets killed by the OS and not know why. Consequently, she tries to decrease the process memory size (similar to increasing the cutoff ratio) in order to accommodate the extra direct memory. Even worse, she might try to decrease memory budgets which are not fully used and hence won't change the overall memory consumption.

Cheers,
Till

On Fri, Aug 16, 2019 at 11:01 AM Xintong Song <tonysong...@gmail.com> wrote:

Let me explain this with a concrete example, Till. Let's say we have the following scenario:
Total Process Memory: 1GB
JVM Direct Memory (Task Off-Heap Memory + JVM Overhead): 200MB
Other Memory (JVM Heap Memory, JVM Metaspace, Off-Heap Managed Memory and Network Memory): 800MB

For alternative 2, we set -XX:MaxDirectMemorySize to 200MB.
For alternative 3, we set -XX:MaxDirectMemorySize to a very large value, let's say 1TB.

If the actual direct memory usage of Task Off-Heap Memory and JVM Overhead does not exceed 200MB, then alternatives 2 and 3 provide the same utility. Setting a larger -XX:MaxDirectMemorySize will not reduce the sizes of the other memory pools.

If the actual direct memory usage of Task Off-Heap Memory and JVM Overhead could exceed 200MB, then:

- Alternative 2 suffers from frequent OOMs.
> To > > > > avoid > > > > > > > that, > > > > > > > > > the > > > > > > > > > > > only > > > > > > > > > > > > > > thing > > > > > > > > > > > > > > > user can do is to modify the configuration > and > > > > > > increase > > > > > > > > JVM > > > > > > > > > > > Direct > > > > > > > > > > > > > > > Memory > > > > > > > > > > > > > > > (Task Off-Heap Memory + JVM Overhead). Let's > > say > > > > > that > > > > > > > user > > > > > > > > > > > > increases > > > > > > > > > > > > > > JVM > > > > > > > > > > > > > > > Direct Memory to 250MB, this will reduce the > > > total > > > > > > size > > > > > > > of > > > > > > > > > > other > > > > > > > > > > > > > > memory > > > > > > > > > > > > > > > pools to 750MB, given the total process > memory > > > > > remains > > > > > > > > 1GB. > > > > > > > > > > > > > > > - For alternative 3, there is no chance of > > > direct > > > > > OOM. > > > > > > > > There > > > > > > > > > > are > > > > > > > > > > > > > > chances > > > > > > > > > > > > > > > of exceeding the total process memory limit, > > but > > > > > given > > > > > > > > that > > > > > > > > > > the > > > > > > > > > > > > > > process > > > > > > > > > > > > > > > may > > > > > > > > > > > > > > > not use up all the reserved native memory > > > > (Off-Heap > > > > > > > > Managed > > > > > > > > > > > > Memory, > > > > > > > > > > > > > > > Network > > > > > > > > > > > > > > > Memory, JVM Metaspace), if the actual direct > > > > memory > > > > > > > usage > > > > > > > > is > > > > > > > > > > > > > slightly > > > > > > > > > > > > > > > above > > > > > > > > > > > > > > > yet very close to 200MB, user probably do > not > > > need > > > > > to > > > > > > > > change > > > > > > > > > > the > > > > > > > > > > > > > > > configurations. 
Therefore, I think from the user's perspective, a feasible configuration for alternative 2 may lead to lower resource utilization compared to alternative 3.

Thank you~

Xintong Song

On Fri, Aug 16, 2019 at 10:28 AM Till Rohrmann <trohrm...@apache.org> wrote:

I guess you have to help me understand the difference between alternative 2 and 3 wrt memory under-utilization, Xintong.

- Alternative 2: set -XX:MaxDirectMemorySize to Task Off-Heap Memory and JVM Overhead. Then there is the risk that this size is too low, resulting in a lot of garbage collection and potentially an OOM.
- Alternative 3: set -XX:MaxDirectMemorySize to something larger than alternative 2. This would of course reduce the sizes of the other memory types.
How would alternative 2 now result in an under-utilization of memory compared to alternative 3? If alternative 3 strictly sets a higher max direct memory size and we use only little, then I would expect that alternative 3 results in memory under-utilization.

Cheers,
Till

On Tue, Aug 13, 2019 at 4:19 PM Yang Wang <danrtsey...@gmail.com> wrote:

Hi Xintong, Till,

Native and Direct Memory

My point is setting a very large max direct memory size when we do not differentiate direct and native memory. If the direct memory, including user direct memory and framework direct memory, could be calculated correctly, then I am in favor of setting direct memory to a fixed value.
Memory Calculation

I agree with Xintong. For Yarn and K8s, we need to check the memory configurations on the client to avoid submitting successfully and failing in the Flink master.

Best,
Yang

On Tue, Aug 13, 2019 at 10:07 PM Xintong Song <tonysong...@gmail.com> wrote:

Thanks for replying, Till.

About MemorySegment, I think you are right that we should not include this issue in the scope of this FLIP. This FLIP should concentrate on how to configure memory pools for TaskExecutors, with minimum involvement in how memory consumers use it.
About direct memory, I think alternative 3 may not have the same over-reservation issue that alternative 2 does, but at the cost of the risk of over-using memory at the container level, which is not good. My point is that both "Task Off-Heap Memory" and "JVM Overhead" are not easy to configure. For alternative 2, users might configure them higher than what is actually needed, just to avoid getting a direct OOM. For alternative 3, users do not get direct OOMs, so they may not configure the two options aggressively high. But the consequence is the risk that the overall container memory usage exceeds the budget.
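The trade-off described above can be sketched as a small decision table. This is a hypothetical helper, not Flink code; it only models the two failure modes discussed in this thread:

```python
def outcome(total_process_mb: int, direct_budget_mb: int,
            actual_direct_mb: int, actual_other_mb: int,
            cap_direct: bool) -> str:
    """What happens at runtime under each alternative.
    cap_direct=True  -> alternative 2 (-XX:MaxDirectMemorySize = budget)
    cap_direct=False -> alternative 3 (effectively unlimited direct cap)"""
    if cap_direct and actual_direct_mb > direct_budget_mb:
        return "direct OOM"            # JVM-internal OutOfMemoryError
    if actual_direct_mb + actual_other_mb > total_process_mb:
        return "container kill risk"   # Yarn/K8s may kill the process
    return "ok"

# 210MB of actual direct usage against a 200MB budget, other pools not full:
print(outcome(1000, 200, 210, 700, cap_direct=True))   # direct OOM
print(outcome(1000, 200, 210, 700, cap_direct=False))  # ok, fits in 1GB overall
# ...but alternative 3 overruns the container if everything is used at once:
print(outcome(1000, 200, 300, 750, cap_direct=False))  # container kill risk
```

The sketch shows why neither alternative dominates: alternative 2 fails fast but visibly inside the JVM, while alternative 3 tolerates slight overshoot at the price of a harder-to-diagnose container-level failure.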
Thank you~

Xintong Song

On Tue, Aug 13, 2019 at 9:39 AM Till Rohrmann <trohrm...@apache.org> wrote:

Thanks for proposing this FLIP, Xintong.

All in all I think it already looks quite good. Concerning the first open question about allocating memory segments, I was wondering whether this is strictly necessary to do in the context of this FLIP or whether this could be done as a follow-up.
Without knowing all details, I would be concerned that we would widen the scope of this FLIP too much, because we would have to touch all the existing call sites of the MemoryManager where we allocate memory segments (this should mainly be batch operators). The addition of the memory reservation call to the MemoryManager should not be affected by this, and I would hope that this is the only point of interaction a streaming job would have with the MemoryManager.

Concerning the second open question about setting or not setting a max direct memory limit, I would also be interested why Yang Wang thinks leaving it open would be best.
My concern about this would be that we would be in a similar situation as we are now with the RocksDBStateBackend. If the different memory pools are not clearly separated and can spill over into a different pool, then it is quite hard to understand what exactly causes a process to get killed for using too much memory. This could then easily lead to a situation similar to what we have with the cutoff-ratio. So why not set a sane default value for max direct memory and give the user an option to increase it if he runs into an OOM?
@Xintong, how would alternative 2 lead to lower memory utilization than alternative 3, where we set the direct memory to a higher value?

Cheers,
Till

On Fri, Aug 9, 2019 at 9:12 AM Xintong Song <tonysong...@gmail.com> wrote:

Thanks for the feedback, Yang.

Regarding your comments:

*Native and Direct Memory*
I think setting a very large max direct memory size definitely has some good sides. E.g., we do not worry about direct OOMs, and we don't even need to allocate managed / network memory with Unsafe.allocate(). However, there are also some downsides of doing this.
- One thing I can think of is that if a task executor container is killed due to overusing memory, it could be hard for us to know which part of the memory is overused.
- Another downside is that the JVM never triggers GC due to reaching the max direct memory limit, because the limit is too high to be reached. That means we kind of rely on heap memory to trigger GC and release direct memory. That could be a problem in cases where we have more direct memory usage but not enough heap activity to trigger the GC.
Maybe you can share your reasons for preferring setting a very large value, in case there is anything else I overlooked.

*Memory Calculation*
If there is any conflict between multiple configuration values that the user explicitly specified, I think we should throw an error.
I think doing the checking on the client side is a good idea, so that on Yarn / K8s we can discover the problem before submitting the Flink cluster, which is always a good thing.
But we cannot rely only on the client-side checking, because for a standalone cluster, TaskManagers on different machines may have different configurations and the client does not see them.
What do you think?
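A client-side check of the kind discussed here could look roughly like the following. This is a sketch only: the option keys are simplified placeholders (only `taskmanager.memory.process.size` resembles a real Flink option), and the real validation would live inside Flink's startup utilities:

```python
def validate(conf: dict) -> None:
    """Fail fast if explicitly configured fine-grained pools exceed the
    total process memory. All sizes are in MB; keys are hypothetical."""
    total = conf.get("taskmanager.memory.process.size")
    if total is None:
        return  # no total configured, nothing to check against
    fine_grained = ("network", "managed", "task-heap", "task-off-heap", "metaspace")
    explicit = {k: conf[k] for k in fine_grained if k in conf}
    if sum(explicit.values()) > total:
        raise ValueError(
            f"Configured pools {explicit} sum to {sum(explicit.values())}MB, "
            f"exceeding the total process memory of {total}MB")

# A consistent configuration passes silently:
validate({"taskmanager.memory.process.size": 1000, "network": 100, "managed": 300})
# An over-committed one is rejected before anything is submitted:
try:
    validate({"taskmanager.memory.process.size": 1000, "network": 600, "managed": 500})
except ValueError as e:
    print("rejected:", e)
```

Running the same check again inside each process at startup addresses the standalone case raised above, where the client cannot see per-machine configurations.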
Thank you~

Xintong Song

On Thu, Aug 8, 2019 at 5:09 PM Yang Wang <danrtsey...@gmail.com> wrote:

Hi Xintong,

Thanks for your detailed proposal. After all the memory configuration options are introduced, it will be more powerful to control the Flink memory usage. I just have a few questions about it.

- Native and Direct Memory

We do not differentiate user direct memory and native memory. They are all included in task off-heap memory. Right?
So I don't think we could set the -XX:MaxDirectMemorySize properly. I prefer leaving it a very large value.

- Memory Calculation

If the sum of the fine-grained memory (network memory, managed memory, etc.) is larger than the total process memory, how do we deal with this situation? Do we need to check the memory configuration in the client?
On Wed, Aug 7, 2019 at 10:14 PM Xintong Song <tonysong...@gmail.com> wrote:

Hi everyone,

We would like to start a discussion thread on "FLIP-49: Unified Memory Configuration for TaskExecutors" [1], where we describe how to improve TaskExecutor memory configuration. The FLIP document is mostly based on an early design, "Memory Management and Configuration Reloaded" [2] by Stephan, with updates from follow-up discussions both online and offline.

This FLIP addresses several shortcomings of the current (Flink 1.9) TaskExecutor memory configuration:
- Different configuration for Streaming and Batch.
- Complex and difficult configuration of RocksDB in Streaming.
- Complicated, uncertain and hard to understand.

Key changes to solve the problems can be summarized as follows:

- Extend the memory manager to also account for memory usage by state backends.
- Modify how TaskExecutor memory is partitioned and accounted into individual memory reservations and pools.
- Simplify memory configuration options and calculation logics.
Please find more details in the FLIP wiki document [1].

(Please note that the early design doc [2] is out of sync, and it is appreciated to have the discussion in this mailing list thread.)

Looking forward to your feedback.

Thank you~

Xintong Song

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
[2] https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit?usp=sharing