I guess you have to help me understand the difference between alternatives 2 and 3 w.r.t. memory under-utilization, Xintong.
- Alternative 2: set -XX:MaxDirectMemorySize to Task Off-Heap Memory and JVM Overhead. Then there is the risk that this size is too low, resulting in a lot of garbage collection and potentially an OOM.
- Alternative 3: set -XX:MaxDirectMemorySize to something larger than in alternative 2. This would of course reduce the sizes of the other memory types.

How would alternative 2 now result in an under-utilization of memory compared to alternative 3? If alternative 3 strictly sets a higher max direct memory size and we only use a little of it, then I would expect that alternative 3 results in memory under-utilization.

Cheers,
Till

On Tue, Aug 13, 2019 at 4:19 PM Yang Wang <danrtsey...@gmail.com> wrote:

> Hi Xintong, Till
>
> > Native and Direct Memory
>
> My point is about setting a very large max direct memory size when we do not differentiate direct and native memory. If the direct memory, including user direct memory and framework direct memory, could be calculated correctly, then I am in favor of setting direct memory to a fixed value.
>
> > Memory Calculation
>
> I agree with Xintong. For Yarn and K8s, we need to check the memory configurations on the client to avoid submitting successfully and then failing in the Flink master.
>
> Best,
>
> Yang
>
> Xintong Song <tonysong...@gmail.com> wrote on Tue, Aug 13, 2019 at 22:07:
>
> > Thanks for replying, Till.
> >
> > About MemorySegment, I think you are right that we should not include this issue in the scope of this FLIP. This FLIP should concentrate on how to configure memory pools for TaskExecutors, with minimum involvement in how memory consumers use them.
> >
> > About direct memory, I think alternative 3 may not have the same over-reservation issue that alternative 2 does, but at the cost of risking memory overuse at the container level, which is not good. My point is that both "Task Off-Heap Memory" and "JVM Overhead" are not easy to configure. For alternative 2, users might configure them higher than what is actually needed, just to avoid getting a direct OOM. For alternative 3, users do not get a direct OOM, so they may not configure the two options aggressively high. But the consequence is the risk that the overall container memory usage exceeds the budget.
> >
> > Thank you~
> >
> > Xintong Song
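For illustration of the trade-off being debated here, a minimal Java sketch of the two alternatives; all pool sizes are made-up numbers and the simple arithmetic is an assumption for illustration, not the FLIP's actual calculation:

    // Illustrative sketch only: hypothetical pool sizes (MB), not values from FLIP-49.
    public class DirectMemoryAlternatives {
        public static void main(String[] args) {
            long totalProcess = 4096;  // container / total process memory budget
            long heap = 2048;          // framework + task heap memory
            long managed = 1024;       // managed memory
            long network = 256;        // network buffers
            long jvmOverhead = 384;    // metaspace, thread stacks, code cache, ...
            long taskOffHeap = 384;    // user direct + native memory

            // Alternative 2: cap direct memory tightly. If the cap turns out too low, users
            // pad taskOffHeap / jvmOverhead "just in case"; the padding is carved out of the
            // other pools and mostly sits unused -- the under-utilization Xintong describes.
            long maxDirectAlt2 = taskOffHeap + jvmOverhead;

            // Alternative 3: set the cap well above expected usage (here: the whole budget).
            // No direct OOM, but nothing stops the process from exceeding the container
            // limit -- the overuse risk discussed above.
            long maxDirectAlt3 = totalProcess;

            System.out.printf("alt2: -XX:MaxDirectMemorySize=%dm%n", maxDirectAlt2);
            System.out.printf("alt3: -XX:MaxDirectMemorySize=%dm%n", maxDirectAlt3);
        }
    }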
> > On Tue, Aug 13, 2019 at 9:39 AM Till Rohrmann <trohrm...@apache.org> wrote:
> >
> > > Thanks for proposing this FLIP, Xintong.
> > >
> > > All in all I think it already looks quite good. Concerning the first open question about allocating memory segments, I was wondering whether this is strictly necessary to do in the context of this FLIP or whether it could be done as a follow-up. Without knowing all details, I would be concerned that we would widen the scope of this FLIP too much, because we would have to touch all the existing call sites of the MemoryManager where we allocate memory segments (this should mainly be batch operators). The addition of the memory reservation call to the MemoryManager should not be affected by this, and I would hope that this is the only point of interaction a streaming job would have with the MemoryManager.
> > >
> > > Concerning the second open question about setting or not setting a max direct memory limit, I would also be interested why Yang Wang thinks leaving it open would be best. My concern about this would be that we would be in a similar situation as we are now with the RocksDBStateBackend: if the different memory pools are not clearly separated and can spill over into a different pool, then it is quite hard to understand what exactly causes a process to get killed for using too much memory. This could then easily lead to a similar situation to what we have with the cutoff-ratio. So why not set a sane default value for max direct memory and give the user an option to increase it if they run into an OOM?
> > >
> > > @Xintong, how would alternative 2 lead to lower memory utilization than alternative 3, where we set the direct memory to a higher value?
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Aug 9, 2019 at 9:12 AM Xintong Song <tonysong...@gmail.com> wrote:
> > >
> > > > Thanks for the feedback, Yang.
> > > >
> > > > Regarding your comments:
> > > >
> > > > *Native and Direct Memory*
> > > > I think setting a very large max direct memory size definitely has some good sides. E.g., we do not need to worry about direct OOMs, and we don't even need to allocate managed / network memory with Unsafe.allocate().
> > > > However, there are also some downsides to doing this.
> > > >
> > > > - One thing I can think of is that if a task executor container is killed due to overusing memory, it could be hard for us to know which part of the memory is overused.
> > > > - Another downside is that the JVM never triggers GC due to reaching the max direct memory limit, because the limit is too high to be reached. That means we kind of rely on heap memory to trigger GC and release direct memory. That could be a problem in cases where we have a lot of direct memory usage but not enough heap activity to trigger the GC.
> > > >
> > > > Maybe you can share your reasons for preferring a very large value, if there is anything else I overlooked.
> > > >
> > > > *Memory Calculation*
> > > > If there is any conflict between multiple configurations that the user explicitly specified, I think we should throw an error.
> > > > I think doing the checking on the client side is a good idea, so that on Yarn / K8s we can discover the problem before submitting the Flink cluster, which is always a good thing.
> > > > But we cannot rely only on the client-side checking, because for standalone clusters TaskManagers on different machines may have different configurations and the client does not see them.
> > > > What do you think?
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
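As a rough illustration of the client-side check discussed above, the Java sketch below sums explicitly configured pools against the total process memory and fails fast; the pool key names, the numbers, and the validator class itself are assumptions for illustration, not APIs from the FLIP (and, as noted in the thread, such a check cannot replace per-TaskManager checks in standalone setups):

    // Illustrative sketch; key names and this class are hypothetical, not Flink APIs.
    import java.util.Map;

    public class MemoryConfigValidator {
        /** Throws if explicitly configured pools exceed the total process memory budget. */
        static void validate(Map<String, Long> explicitPoolsMb, long totalProcessMb) {
            long sum = explicitPoolsMb.values().stream().mapToLong(Long::longValue).sum();
            if (sum > totalProcessMb) {
                throw new IllegalArgumentException(String.format(
                    "Sum of configured memory pools (%d MB) exceeds total process memory (%d MB): %s",
                    sum, totalProcessMb, explicitPoolsMb));
            }
        }

        public static void main(String[] args) {
            // 2048 + 1024 + 256 + 512 + 384 = 4224 MB > 4096 MB, so this throws on the
            // client, before anything is submitted to Yarn / K8s.
            validate(Map.of("task-heap", 2048L, "managed", 1024L, "network", 256L,
                            "task-off-heap", 512L, "jvm-overhead", 384L), 4096L);
        }
    }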
> > > > On Thu, Aug 8, 2019 at 5:09 PM Yang Wang <danrtsey...@gmail.com> wrote:
> > > >
> > > > > Hi Xintong,
> > > > >
> > > > > Thanks for your detailed proposal. After all the memory configuration options are introduced, we will have much more powerful control over Flink's memory usage. I just have a few questions about it.
> > > > >
> > > > > - Native and Direct Memory
> > > > > We do not differentiate user direct memory and native memory. They are all included in task off-heap memory, right? So I don't think we could set -XX:MaxDirectMemorySize properly. I prefer leaving it at a very large value.
> > > > >
> > > > > - Memory Calculation
> > > > > If the sum of the fine-grained memory pools (network memory, managed memory, etc.) is larger than the total process memory, how do we deal with this situation? Do we need to check the memory configuration on the client?
> > > > >
> > > > > Xintong Song <tonysong...@gmail.com> wrote on Wed, Aug 7, 2019 at 10:14 PM:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > We would like to start a discussion thread on "FLIP-49: Unified Memory Configuration for TaskExecutors" [1], where we describe how to improve TaskExecutor memory configuration. The FLIP document is mostly based on an early design, "Memory Management and Configuration Reloaded" [2] by Stephan, with updates from follow-up discussions both online and offline.
> > > > > >
> > > > > > This FLIP addresses several shortcomings of the current (Flink 1.9) TaskExecutor memory configuration:
> > > > > >
> > > > > > - Different configuration for Streaming and Batch.
> > > > > > - Complex and difficult configuration of RocksDB in Streaming.
> > > > > > - Complicated, uncertain and hard to understand.
> > > > > >
> > > > > > Key changes to solve these problems can be summarized as follows:
> > > > > >
> > > > > > - Extend the memory manager to also account for memory usage by state backends.
> > > > > > - Modify how TaskExecutor memory is partitioned into individual memory reservations and pools.
> > > > > > - Simplify memory configuration options and calculation logic.
> > > > > >
> > > > > > Please find more details in the FLIP wiki document [1].
> > > > > >
> > > > > > (Please note that the early design doc [2] is out of sync, and it would be appreciated to have the discussion in this mailing list thread.)
> > > > > >
> > > > > > Looking forward to your feedback.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > > >
> > > > > > [2] https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit?usp=sharing
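Since much of the thread hinges on the difference between direct and native memory, here is a small illustrative Java sketch of what -XX:MaxDirectMemorySize does and does not bound; the allocation size is arbitrary, and this describes general JVM behavior rather than anything specified by the FLIP:

    // Illustrative only: -XX:MaxDirectMemorySize bounds java.nio direct buffers,
    // not raw native allocations (e.g. RocksDB via JNI, or sun.misc.Unsafe).
    import java.nio.ByteBuffer;

    public class DirectVsNative {
        public static void main(String[] args) {
            // Counted against -XX:MaxDirectMemorySize. When the limit is reached, the JVM
            // first tries to GC unreferenced buffers and only then throws
            // "OutOfMemoryError: Direct buffer memory". With a very large limit, that GC
            // trigger effectively never fires, which is the downside Xintong points out.
            ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024 * 1024);

            // Native memory allocated through JNI or Unsafe.allocateMemory() is not tracked
            // by this limit at all, so the flag alone cannot keep the whole container within
            // its memory budget.
            System.out.println("Allocated " + direct.capacity() + " bytes of direct buffer memory");
        }
    }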