I guess you have to help me understand the difference between alternatives 2 and 3 w.r.t. memory under-utilization, Xintong.
- Alternative 2: set -XX:MaxDirectMemorySize to Task Off-Heap Memory and JVM Overhead. Then there is the risk that this size is too low, resulting in a lot of garbage collection and potentially an OOM.
- Alternative 3: set -XX:MaxDirectMemorySize to something larger than in alternative 2. This would of course reduce the sizes of the other memory types.

How would alternative 2 now result in an under-utilization of memory compared to alternative 3? If alternative 3 strictly sets a higher max direct memory size and we only use a little of it, then I would expect that alternative 3 results in memory under-utilization.

Cheers,
Till

On Tue, Aug 13, 2019 at 4:19 PM Yang Wang <danrtsey...@gmail.com> wrote:

> Hi Xintong, Till
>
> > Native and Direct Memory
>
> My point is about setting a very large max direct memory size when we do not differentiate direct and native memory. If the direct memory, including user direct memory and framework direct memory, could be calculated correctly, then I am in favor of setting direct memory to a fixed value.
>
> > Memory Calculation
>
> I agree with Xintong. For Yarn and K8s, we need to check the memory configurations on the client to avoid submitting successfully and then failing in the Flink master.
>
> Best,
>
> Yang
>
> Xintong Song <tonysong...@gmail.com> wrote on Tue, Aug 13, 2019 at 22:07:
>
> > Thanks for replying, Till.
> >
> > About MemorySegment, I think you are right that we should not include this issue in the scope of this FLIP. This FLIP should concentrate on how to configure memory pools for TaskExecutors, with minimum involvement in how memory consumers use them.
> >
> > About direct memory, I think alternative 3 may not have the same over-reservation issue that alternative 2 does, but at the cost of risking memory overuse at the container level, which is not good. My point is that both "Task Off-Heap Memory" and "JVM Overhead" are not easy to configure. For alternative 2, users might configure them higher than what is actually needed, just to avoid getting a direct OOM. For alternative 3, users do not get a direct OOM, so they may not configure the two options aggressively high. But the consequence is the risk that the overall container memory usage exceeds the budget.
> >
> > Thank you~
> >
> > Xintong Song
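For illustration of the trade-off being debated here, a minimal Java sketch of the two alternatives; all pool sizes are made-up numbers and the simple arithmetic is an assumption for illustration, not the FLIP's actual calculation:

    // Illustrative sketch only: hypothetical pool sizes (MB), not values from FLIP-49.
    public class DirectMemoryAlternatives {
        public static void main(String[] args) {
            long totalProcess = 4096;  // container / total process memory budget
            long heap = 2048;          // framework + task heap memory
            long managed = 1024;       // managed memory
            long network = 256;        // network buffers
            long jvmOverhead = 384;    // metaspace, thread stacks, code cache, ...
            long taskOffHeap = 384;    // user direct + native memory

            // Alternative 2: cap direct memory tightly. If the cap turns out too low, users
            // pad taskOffHeap / jvmOverhead "just in case"; the padding is carved out of the
            // other pools and mostly sits unused -- the under-utilization Xintong describes.
            long maxDirectAlt2 = taskOffHeap + jvmOverhead;

            // Alternative 3: set the cap well above expected usage (here: the whole budget).
            // No direct OOM, but nothing stops the process from exceeding the container
            // limit -- the overuse risk discussed above.
            long maxDirectAlt3 = totalProcess;

            System.out.printf("alt2: -XX:MaxDirectMemorySize=%dm%n", maxDirectAlt2);
            System.out.printf("alt3: -XX:MaxDirectMemorySize=%dm%n", maxDirectAlt3);
        }
    }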
> > On Tue, Aug 13, 2019 at 9:39 AM Till Rohrmann <trohrm...@apache.org> wrote:
> >
> > > Thanks for proposing this FLIP, Xintong.
> > >
> > > All in all I think it already looks quite good. Concerning the first open question about allocating memory segments, I was wondering whether this is strictly necessary to do in the context of this FLIP or whether it could be done as a follow-up. Without knowing all details, I would be concerned that we would widen the scope of this FLIP too much, because we would have to touch all the existing call sites of the MemoryManager where we allocate memory segments (this should mainly be batch operators). The addition of the memory reservation call to the MemoryManager should not be affected by this, and I would hope that this is the only point of interaction a streaming job would have with the MemoryManager.
> > >
> > > Concerning the second open question about setting or not setting a max direct memory limit, I would also be interested why Yang Wang thinks leaving it open would be best. My concern about this would be that we would be in a similar situation as we are now with the RocksDBStateBackend: if the different memory pools are not clearly separated and can spill over into a different pool, then it is quite hard to understand what exactly causes a process to get killed for using too much memory. This could then easily lead to a similar situation to what we have with the cutoff-ratio. So why not set a sane default value for max direct memory and give the user an option to increase it if they run into an OOM?
> > >
> > > @Xintong, how would alternative 2 lead to lower memory utilization than alternative 3, where we set the direct memory to a higher value?
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Aug 9, 2019 at 9:12 AM Xintong Song <tonysong...@gmail.com> wrote:
> > >
> > > > Thanks for the feedback, Yang.
> > > >
> > > > Regarding your comments:
> > > >
> > > > *Native and Direct Memory*
> > > > I think setting a very large max direct memory size definitely has some good sides. E.g., we do not need to worry about direct OOMs, and we don't even need to allocate managed / network memory with Unsafe.allocate().
> > > > However, there are also some downsides to doing this.
> > > >
> > > > - One thing I can think of is that if a task executor container is killed due to overusing memory, it could be hard for us to know which part of the memory is overused.
> > > > - Another downside is that the JVM never triggers GC due to reaching the max direct memory limit, because the limit is too high to be reached. That means we kind of rely on heap memory to trigger GC and release direct memory. That could be a problem in cases where we have a lot of direct memory usage but not enough heap activity to trigger the GC.
> > > >
> > > > Maybe you can share your reasons for preferring a very large value, if there is anything else I overlooked.
> > > >
> > > > *Memory Calculation*
> > > > If there is any conflict between multiple configurations that the user explicitly specified, I think we should throw an error.
> > > > I think doing the checking on the client side is a good idea, so that on Yarn / K8s we can discover the problem before submitting the Flink cluster, which is always a good thing.
> > > > But we cannot rely only on the client-side checking, because for standalone clusters TaskManagers on different machines may have different configurations and the client does not see them.
> > > > What do you think?
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
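As a rough illustration of the client-side check discussed above, the Java sketch below sums explicitly configured pools against the total process memory and fails fast; the pool key names, the numbers, and the validator class itself are assumptions for illustration, not APIs from the FLIP (and, as noted in the thread, such a check cannot replace per-TaskManager checks in standalone setups):

    // Illustrative sketch; key names and this class are hypothetical, not Flink APIs.
    import java.util.Map;

    public class MemoryConfigValidator {
        /** Throws if explicitly configured pools exceed the total process memory budget. */
        static void validate(Map<String, Long> explicitPoolsMb, long totalProcessMb) {
            long sum = explicitPoolsMb.values().stream().mapToLong(Long::longValue).sum();
            if (sum > totalProcessMb) {
                throw new IllegalArgumentException(String.format(
                    "Sum of configured memory pools (%d MB) exceeds total process memory (%d MB): %s",
                    sum, totalProcessMb, explicitPoolsMb));
            }
        }

        public static void main(String[] args) {
            // 2048 + 1024 + 256 + 512 + 384 = 4224 MB > 4096 MB, so this throws on the
            // client, before anything is submitted to Yarn / K8s.
            validate(Map.of("task-heap", 2048L, "managed", 1024L, "network", 256L,
                            "task-off-heap", 512L, "jvm-overhead", 384L), 4096L);
        }
    }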
> > > > On Thu, Aug 8, 2019 at 5:09 PM Yang Wang <danrtsey...@gmail.com> wrote:
> > > >
> > > > > Hi Xintong,
> > > > >
> > > > > Thanks for your detailed proposal. After all the memory configuration options are introduced, we will have much more powerful control over Flink's memory usage. I just have a few questions about it.
> > > > >
> > > > > - Native and Direct Memory
> > > > > We do not differentiate user direct memory and native memory. They are all included in task off-heap memory, right? So I don't think we could set -XX:MaxDirectMemorySize properly. I prefer leaving it at a very large value.
> > > > >
> > > > > - Memory Calculation
> > > > > If the sum of the fine-grained memory pools (network memory, managed memory, etc.) is larger than the total process memory, how do we deal with this situation? Do we need to check the memory configuration on the client?
> > > > >
> > > > > Xintong Song <tonysong...@gmail.com> wrote on Wed, Aug 7, 2019 at 10:14 PM:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > We would like to start a discussion thread on "FLIP-49: Unified Memory Configuration for TaskExecutors" [1], where we describe how to improve TaskExecutor memory configuration. The FLIP document is mostly based on an early design, "Memory Management and Configuration Reloaded" [2] by Stephan, with updates from follow-up discussions both online and offline.
> > > > > >
> > > > > > This FLIP addresses several shortcomings of the current (Flink 1.9) TaskExecutor memory configuration:
> > > > > >
> > > > > > - Different configuration for Streaming and Batch.
> > > > > > - Complex and difficult configuration of RocksDB in Streaming.
> > > > > > - Complicated, uncertain and hard to understand.
> > > > > >
> > > > > > Key changes to solve these problems can be summarized as follows:
> > > > > >
> > > > > > - Extend the memory manager to also account for memory usage by state backends.
> > > > > > - Modify how TaskExecutor memory is partitioned into individual memory reservations and pools.
> > > > > > - Simplify memory configuration options and calculation logic.
> > > > > >
> > > > > > Please find more details in the FLIP wiki document [1].
> > > > > >
> > > > > > (Please note that the early design doc [2] is out of sync, and it would be appreciated to have the discussion in this mailing list thread.)
> > > > > >
> > > > > > Looking forward to your feedback.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > > >
> > > > > > [2] https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit?usp=sharing
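Since much of the thread hinges on the difference between direct and native memory, here is a small illustrative Java sketch of what -XX:MaxDirectMemorySize does and does not bound; the allocation size is arbitrary, and this describes general JVM behavior rather than anything specified by the FLIP:

    // Illustrative only: -XX:MaxDirectMemorySize bounds java.nio direct buffers,
    // not raw native allocations (e.g. RocksDB via JNI, or sun.misc.Unsafe).
    import java.nio.ByteBuffer;

    public class DirectVsNative {
        public static void main(String[] args) {
            // Counted against -XX:MaxDirectMemorySize. When the limit is reached, the JVM
            // first tries to GC unreferenced buffers and only then throws
            // "OutOfMemoryError: Direct buffer memory". With a very large limit, that GC
            // trigger effectively never fires, which is the downside Xintong points out.
            ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024 * 1024);

            // Native memory allocated through JNI or Unsafe.allocateMemory() is not tracked
            // by this limit at all, so the flag alone cannot keep the whole container within
            // its memory budget.
            System.out.println("Allocated " + direct.capacity() + " bytes of direct buffer memory");
        }
    }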