Hi Xintong, Till,

> Native and Direct Memory

My point is that we should set a very large max direct memory size as long as
we do not differentiate direct and native memory. If the direct memory,
including user direct memory and framework direct memory, could be calculated
correctly, then I am in favor of setting the direct memory to a fixed value.
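To make the distinction concrete: only direct buffers are tracked against
-XX:MaxDirectMemorySize, while raw native allocations are invisible to the
JVM. A minimal sketch (plain JDK code, not Flink code, assuming
sun.misc.Unsafe is reachable via reflection as on Java 8):

    import java.lang.reflect.Field;
    import java.nio.ByteBuffer;

    public class DirectVsNativeDemo {
        public static void main(String[] args) throws Exception {
            // Counts against -XX:MaxDirectMemorySize; exceeding the limit
            // fails with "OutOfMemoryError: Direct buffer memory".
            ByteBuffer direct = ByteBuffer.allocateDirect(64 << 20);

            // Is NOT tracked by -XX:MaxDirectMemorySize; only the container
            // or OS limit applies to this allocation.
            Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            sun.misc.Unsafe unsafe = (sun.misc.Unsafe) f.get(null);
            long address = unsafe.allocateMemory(64 << 20);
            unsafe.freeMemory(address);
        }
    }

As long as task off-heap memory mixes both kinds of allocations, any fixed
-XX:MaxDirectMemorySize derived from it will either over- or under-count.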



> Memory Calculation

I agree with Xintong. For Yarn and K8s, we need to check the memory
configuration on the client side to avoid submitting successfully and then
failing in the Flink master.
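To make the client-side check concrete, something along these lines is what
I have in mind (the class name and the pool breakdown are made up for
illustration, not taken from the FLIP):

    /** Rough sketch of a client-side sanity check; all sizes in bytes. */
    public final class MemoryConfigValidator {

        public static void validate(
                long totalProcessMemory,
                long frameworkHeap,
                long taskHeap,
                long taskOffHeap,
                long networkMemory,
                long managedMemory,
                long jvmOverhead) {

            long sumOfPools = frameworkHeap + taskHeap + taskOffHeap
                    + networkMemory + managedMemory + jvmOverhead;

            if (sumOfPools > totalProcessMemory) {
                // Failing here means Yarn / K8s deployments are rejected in
                // the client instead of after the Flink master has started.
                throw new IllegalArgumentException(String.format(
                        "Sum of configured memory pools (%d bytes) exceeds "
                                + "total process memory (%d bytes).",
                        sumOfPools, totalProcessMemory));
            }
        }

        private MemoryConfigValidator() {}
    }

For standalone setups the same check would of course still need to run when
each TaskManager starts, since the client does not see their configurations.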


Best,

Yang

Xintong Song <tonysong...@gmail.com> wrote on Tue, Aug 13, 2019 at 22:07:

> Thanks for replying, Till.
>
> About MemorySegment, I think you are right that we should not include this
> issue in the scope of this FLIP. This FLIP should concentrate on how to
> configure memory pools for TaskExecutors, with minimal involvement in how
> memory consumers use it.
>
> About direct memory, I think alternative 3 may not have the same
> over-reservation issue that alternative 2 does, but at the cost of a risk
> of overusing memory at the container level, which is not good. My point is
> that both "Task Off-Heap Memory" and "JVM Overhead" are not easy to
> configure. For alternative 2, users might configure them higher than what
> is actually needed, just to avoid getting a direct OOM. For alternative 3,
> users do not get direct OOM, so they may not configure the two options
> aggressively high. But the consequence is the risk that the overall
> container memory usage exceeds the budget.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Aug 13, 2019 at 9:39 AM Till Rohrmann <trohrm...@apache.org> wrote:
>
> > Thanks for proposing this FLIP, Xintong.
> >
> > All in all I think it already looks quite good. Concerning the first open
> > question about allocating memory segments, I was wondering whether this
> > is strictly necessary to do in the context of this FLIP or whether this
> > could be done as a follow-up? Without knowing all details, I would be
> > concerned that we would widen the scope of this FLIP too much because we
> > would have to touch all the existing call sites of the MemoryManager
> > where we allocate memory segments (this should mainly be batch
> > operators). The addition of the memory reservation call to the
> > MemoryManager should not be affected by this, and I would hope that this
> > is the only point of interaction a streaming job would have with the
> > MemoryManager.
> >
> > Concerning the second open question about setting or not setting a max
> > direct memory limit, I would also be interested in why Yang Wang thinks
> > leaving it open would be best. My concern about this would be that we
> > would be in a similar situation as we are now with the
> > RocksDBStateBackend. If the different memory pools are not clearly
> > separated and can spill over to a different pool, then it is quite hard
> > to understand what exactly causes a process to get killed for using too
> > much memory. This could then easily lead to a similar situation to what
> > we have with the cutoff-ratio. So why not set a sane default value for
> > max direct memory and give the user an option to increase it if they run
> > into an OOM?
> >
> > @Xintong, how would alternative 2 lead to lower memory utilization than
> > alternative 3 where we set the direct memory to a higher value?
> >
> > Cheers,
> > Till
> >
> > On Fri, Aug 9, 2019 at 9:12 AM Xintong Song <tonysong...@gmail.com> wrote:
> >
> > > Thanks for the feedback, Yang.
> > >
> > > Regarding your comments:
> > >
> > > *Native and Direct Memory*
> > > I think setting a very large max direct memory size definitely has some
> > > upsides. E.g., we do not need to worry about direct OOM, and we don't
> > > even need to allocate managed / network memory with Unsafe.allocate().
> > > However, there are also some downsides to doing this.
> > >
> > >    - One thing I can think of is that if a task executor container is
> > >    killed due to overusing memory, it could be hard for us to know
> > >    which part of the memory is overused.
> > >    - Another downside is that the JVM never triggers GC due to reaching
> > >    the max direct memory limit, because the limit is too high to be
> > >    reached. That means we kind of rely on heap memory to trigger GC and
> > >    release direct memory. That could be a problem in cases where we
> > >    have heavy direct memory usage but not enough heap activity to
> > >    trigger the GC.
> > >
> > > Maybe you can share your reasons for preferring to set a very large
> > > value, if there is anything else I overlooked.
> > >
> > > *Memory Calculation*
> > > If there is any conflict between multiple configuration that user
> > > explicitly specified, I think we should throw an error.
> > > I think doing checking on the client side is a good idea, so that on
> > Yarn /
> > > K8s we can discover the problem before submitting the Flink cluster,
> > which
> > > is always a good thing.
> > > But we can not only rely on the client side checking, because for
> > > standalone cluster TaskManagers on different machines may have
> different
> > > configurations and the client does see that.
> > > What do you think?
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Thu, Aug 8, 2019 at 5:09 PM Yang Wang <danrtsey...@gmail.com> wrote:
> > >
> > > > Hi Xintong,
> > > >
> > > >
> > > > Thanks for your detailed proposal. After all the memory configurations
> > > > are introduced, we will have much more powerful control over Flink's
> > > > memory usage. I just have a few questions about it.
> > > >
> > > >
> > > >
> > > >    - Native and Direct Memory
> > > >
> > > > We do not differentiate user direct memory and native memory. They are
> > > > all included in task off-heap memory, right? So I don't think we can
> > > > set the -XX:MaxDirectMemorySize properly. I prefer leaving it at a very
> > > > large value.
> > > >
> > > >
> > > >
> > > >    - Memory Calculation
> > > >
> > > > If the sum of the fine-grained memory pools (network memory, managed
> > > > memory, etc.) is larger than the total process memory, how do we deal
> > > > with this situation? Do we need to check the memory configuration in
> > > > the client?
> > > >
> > > > Xintong Song <tonysong...@gmail.com> wrote on Wed, Aug 7, 2019 at 10:14 PM:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > We would like to start a discussion thread on "FLIP-49: Unified
> > > > > Memory Configuration for TaskExecutors"[1], where we describe how to
> > > > > improve TaskExecutor memory configurations. The FLIP document is
> > > > > mostly based on an early design "Memory Management and Configuration
> > > > > Reloaded"[2] by Stephan, with updates from follow-up discussions both
> > > > > online and offline.
> > > > >
> > > > > This FLIP addresses several shortcomings of the current (Flink 1.9)
> > > > > TaskExecutor memory configuration.
> > > > >
> > > > >    - Different configuration for Streaming and Batch.
> > > > >    - Complex and difficult configuration of RocksDB in Streaming.
> > > > >    - Complicated, uncertain and hard to understand.
> > > > >
> > > > >
> > > > > Key changes to solve the problems can be summarized as follows.
> > > > >
> > > > >    - Extend memory manager to also account for memory usage by
> state
> > > > >    backends.
> > > > >    - Modify how TaskExecutor memory is partitioned, accounting for
> > > > >    individual memory reservations and pools.
> > > > >    - Simplify memory configuration options and calculation logic.
> > > > >
> > > > >
> > > > > Please find more details in the FLIP wiki document [1].
> > > > >
> > > > > (Please note that the early design doc [2] is out of sync, and it
> > > > > would be appreciated to have the discussion in this mailing list
> > > > > thread.)
> > > > >
> > > > >
> > > > > Looking forward to your feedback.
> > > > >
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > > [1]
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > >
> > > > > [2]
> > > > > https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit?usp=sharing
> > > > >
> > > >
> > >
> >
>
