I just updated the FLIP wiki page [1], with the following changes:

- Network memory uses JVM direct memory, and is accounted for when setting the JVM max direct memory size parameter.
- Use dynamic configurations (`-Dkey=value`) to pass calculated memory configs into TaskExecutors, instead of ENV variables.
- Remove 'supporting memory reservation' from the scope of this FLIP.
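To make the first two changes a bit more concrete, here is a minimal sketch in Python (not Flink code). It assumes the max direct memory size is simply the sum of every component that lives in JVM direct memory, network memory included, and that the calculated values are rendered as `-Dkey=value` arguments. The function names, component names, and config keys are made up for illustration; they are not the option names from the FLIP.

```python
# Illustrative sketch only: names and keys below are hypothetical,
# not the options defined by FLIP-49.
MB = 1024 * 1024


def jvm_direct_memory_flag(direct_components):
    """Build the JVM flag from all components that use JVM direct memory.

    Network memory is passed in as one of the components, so it is
    accounted for in the limit.
    """
    total_bytes = sum(direct_components.values())
    return f"-XX:MaxDirectMemorySize={total_bytes}"


def dynamic_config_args(configs):
    """Render calculated configs as -Dkey=value arguments.

    Keys are sorted so the same input always yields the same argument list.
    """
    return [f"-D{key}={value}" for key, value in sorted(configs.items())]


if __name__ == "__main__":
    # Hypothetical sizes for the direct memory components.
    direct = {"task_off_heap": 64 * MB, "network": 128 * MB}
    print(jvm_direct_memory_flag(direct))
    # Hypothetical config keys for the calculated values.
    print(dynamic_config_args({"example.memory.network.size": "128m",
                               "example.memory.task.off-heap.size": "64m"}))
```

The point of the sketch is only that both the JVM flag and the dynamic configuration values come from the same calculation done before the process starts, so the TaskExecutor never has to recompute them.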
@till @stephan, please take another look and see if there are any other concerns.

Thank you~

Xintong Song

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors

On Mon, Sep 2, 2019 at 11:13 AM Xintong Song <tonysong...@gmail.com> wrote:

Sorry for the late response.

- Regarding the `TaskExecutorSpecifics` naming, let's discuss the details in the PR.
- Regarding passing parameters into the `TaskExecutor`, +1 for using dynamic configuration at the moment, given that there are more questions to be discussed before we have a general framework for overwriting configurations with ENV variables.
- Regarding memory reservation, I double checked with Yu and he will take care of it.

Thank you~

Xintong Song

On Thu, Aug 29, 2019 at 7:35 PM Till Rohrmann <trohrm...@apache.org> wrote:

What I forgot to add is that we could tackle specifying the configuration fully in an incremental way and that the full specification should be the desired end state.

On Thu, Aug 29, 2019 at 1:33 PM Till Rohrmann <trohrm...@apache.org> wrote:

I think our goal should be that the configuration is fully specified when the process is started. If we consider the internal calculation step to rather validate existing values and calculate missing ones, these two proposals shouldn't even conflict (given determinism).

Since we don't want to change an existing flink-conf.yaml, specifying the full configuration would require passing in the options differently.

One way could be the ENV variables approach. The reason why I'm trying to exclude this feature from the FLIP is that I believe it needs a bit more discussion. Just some questions which come to my mind: What would be the exact format (FLINK_KEY_NAME)? Would we support a dot separator which is supported by some systems (FLINK.KEY.NAME)?
If we accept the dot separator, what would be the order of precedence if there are two ENV variables defined (FLINK_KEY_NAME and FLINK.KEY.NAME)? What is the precedence of an ENV variable vs. a dynamic configuration value specified via -D?

Another approach could be to pass in the dynamic configuration values via `-Dkey=value` to the Flink process. For that we don't have to change anything because the functionality already exists.

Cheers,
Till

On Thu, Aug 29, 2019 at 12:50 PM Stephan Ewen <se...@apache.org> wrote:

I see. Under the assumption of strict determinism that should work.

The original proposal had this point "don't compute inside the TM, compute outside and supply a full config", because that sounded more intuitive.

On Thu, Aug 29, 2019 at 12:15 PM Till Rohrmann <trohrm...@apache.org> wrote:

My understanding was that before starting the Flink process we call a utility which calculates these values. I assume that this utility will do the calculation based on a set of configured values (process memory, flink memory, network memory etc.). Assuming that these values don't differ from the values with which the JVM is started, it should be possible to recompute them in the Flink process in order to set the values.

On Thu, Aug 29, 2019 at 11:29 AM Stephan Ewen <se...@apache.org> wrote:

When computing the values in the JVM process after it started, how would you deal with values like Max Direct Memory, Metaspace size, native memory reservation (reduce heap size), etc.? All the values that are parameters to the JVM process and that need to be supplied at process startup?

On Wed, Aug 28, 2019 at 4:46 PM Till Rohrmann <trohrm...@apache.org> wrote:

Thanks for the clarification. I have some more comments:

- I would actually split the logic to compute the process memory requirements and storing the values into two things. E.g. one could name the former TaskExecutorProcessUtility and the latter TaskExecutorProcessMemory. But we can discuss this on the PR since it's just a naming detail.

- Generally, I'm not opposed to making configuration values overridable by ENV variables. I think this is a very good idea and makes the configurability of Flink processes easier. However, I think that adding this functionality should not be part of this FLIP because it would simply widen the scope unnecessarily.

The reasons why I believe it is unnecessary are the following: For Yarn we already write a flink-conf.yaml which could be populated with the memory settings. For the other processes it should not make a difference whether the loaded Configuration is populated with the memory settings from ENV variables or by using TaskExecutorProcessUtility to compute the missing values from the loaded configuration. If the latter would not be possible (wrong or missing configuration values), then we should not have been able to actually start the process in the first place.

- Concerning the memory reservation: I agree with you that we need the memory reservation functionality to make streaming jobs work with "managed" memory.
However, w/o this functionality the whole FLIP would already bring a good amount of improvements to our users when running batch jobs. Moreover, by keeping the scope smaller we can complete the FLIP faster. Hence, I would propose to address the memory reservation functionality as a follow up FLIP (which Yu is working on if I'm not mistaken).

Cheers,
Till

On Wed, Aug 28, 2019 at 11:43 AM Yang Wang <danrtsey...@gmail.com> wrote:

Just to add my 2 cents.

Using environment variables to override the configuration for different taskmanagers is better. We do not need to generate a dedicated flink-conf.yaml for each taskmanager. A common flink-conf.yaml and different environment variables are enough. By reducing the distributed cached files, it could make launching a taskmanager faster.

Stephan gives a good suggestion that we could move the logic into the "GlobalConfiguration.loadConfig()" method. Maybe the client could also benefit from this. Different users do not have to export FLINK_CONF_DIR to update a few config options.

Best,
Yang

On Wed, Aug 28, 2019 at 1:21 AM Stephan Ewen <se...@apache.org> wrote:

One note on the Environment Variables and Configuration discussion.

My understanding is that passed ENV variables are added to the configuration in the "GlobalConfiguration.loadConfig()" method (or similar).
For all the code inside Flink, it looks like the data was in the config to start with, just that the scripts that compute the variables can pass the values to the process without actually needing to write a file.

For example the "GlobalConfiguration.loadConfig()" method would take any ENV variable prefixed with "flink" and add it as a config key. "flink_taskmanager_memory_size=2g" would become "taskmanager.memory.size: 2g".

On Tue, Aug 27, 2019 at 4:05 PM Xintong Song <tonysong...@gmail.com> wrote:

Thanks for the comments, Till.

I've also seen your comments on the wiki page, but let's keep the discussion here.

- Regarding 'TaskExecutorSpecifics', what do you think about naming it 'TaskExecutorResourceSpecifics'?
- Regarding passing memory configurations into task executors, I'm in favor of doing it via environment variables rather than configurations, for the following two reasons.
  - With environment variables, it is easier than with configurations to keep the memory options unchanged once they are calculated.
  - I'm not sure whether we should write the configuration in startup scripts. Writing changes into the configuration files when running the startup scripts does not sound right to me.
    Or we could make a copy of the configuration files per flink cluster, make the task executor load from the copy, and clean up the copy after the cluster is shut down, which is complicated. (I think this is also what Stephan means in his comment on the wiki page?)
- Regarding reserving memory, I think this change should be included in this FLIP. I think a big part of the motivation of this FLIP is to unify memory configuration for streaming / batch and make it easy to configure rocksdb memory. If we don't support memory reservation, then streaming jobs cannot use managed memory (neither on-heap nor off-heap), which makes this FLIP incomplete.
- Regarding network memory, I think you are right. I think we probably don't need to change the network stack from using direct memory to using unsafe native memory. Network memory size is deterministic, cannot be reserved the way managed memory is, and cannot be overused. I think it also works if we simply keep using direct memory for network and include it in the jvm max direct memory size.

Thank you~

Xintong Song

On Tue, Aug 27, 2019 at 8:12 PM Till Rohrmann <trohrm...@apache.org> wrote:

Hi Xintong,

thanks for addressing the comments and adding a more detailed implementation plan. I have a couple of comments concerning the implementation plan:

- The name `TaskExecutorSpecifics` is not really descriptive. Choosing a different name could help here.
- I'm not sure whether I would pass the memory configuration to the TaskExecutor via environment variables. I think it would be better to write it into the configuration one uses to start the TM process.
- If possible, I would exclude the memory reservation from this FLIP and add this as part of a dedicated FLIP.
- If possible, then I would exclude changes to the network stack from this FLIP. Maybe we can simply say that the direct memory needed by the network stack is the framework direct memory requirement. Changing how the memory is allocated can happen in a second step. This would keep the scope of this FLIP smaller.

Cheers,
Till

On Thu, Aug 22, 2019 at 2:51 PM Xintong Song <tonysong...@gmail.com> wrote:

Hi everyone,

I just updated the FLIP document on wiki [1], with the following changes.

- Removed open question regarding MemorySegment allocation. As discussed, we exclude this topic from the scope of this FLIP.
- Updated content about JVM direct memory parameter according to recent discussions, and moved the other options to "Rejected Alternatives" for the moment.
- Added implementation steps.

Thank you~

Xintong Song

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors

On Mon, Aug 19, 2019 at 7:16 PM Stephan Ewen <se...@apache.org> wrote:

@Xintong: Concerning "wait for memory users before task dispose and memory release": I agree, that's how it should be. Let's try it out.

@Xintong @Jingsong: Concerning "JVM does not wait for GC when allocating direct memory buffer": There seems to be pretty elaborate logic to free buffers when allocating new ones. See http://hg.openjdk.java.net/jdk8u/jdk8u-dev/jdk/file/tip/src/share/classes/java/nio/Bits.java#l643

@Till: Maybe. If we assume that the JVM default works (like going with option 2 and not setting "-XX:MaxDirectMemorySize" at all), then I think it should be okay to set "-XX:MaxDirectMemorySize" to "off_heap_managed_memory + direct_memory" even if we use RocksDB. That is a big if, though, I honestly have no idea :D Would be good to understand this, though, because this would affect option (2) and option (1.2).

On Mon, Aug 19, 2019 at 4:44 PM Xintong Song <tonysong...@gmail.com> wrote:

Thanks for the inputs, Jingsong.

Let me try to summarize your points. Please correct me if I'm wrong.

- Memory consumers should always avoid returning memory segments to the memory manager while there are still un-cleaned structures / threads that may use the memory. Otherwise, it would cause serious problems by having multiple consumers trying to use the same memory segment.
- The JVM does not wait for GC when allocating a direct memory buffer. Therefore, even if we set a proper max direct memory size limit, we may still encounter a direct memory OOM if GC cleans memory more slowly than direct memory is allocated.

Am I understanding this correctly?

Thank you~

Xintong Song

On Mon, Aug 19, 2019 at 4:21 PM JingsongLee <lzljs3620...@aliyun.com.invalid> wrote:

Hi Stephan:

About option 2:

If additional threads are not cleanly shut down before we exit the task: in the current case of memory reuse, the task has already freed up the memory it uses. If this memory is used by other tasks while asynchronous threads of the exited task may still be writing, there will be concurrency safety problems, which can even lead to errors in user computing results.

So I think this is a serious and intolerable bug. No matter what the option is, it should be avoided.

About direct memory cleaned by GC:
I don't think it is a good idea. I've encountered so many situations where GC came too late and caused a DirectMemory OOM. Releasing and allocating DirectMemory depends on the type of user job, which is often beyond our control.

Best,
Jingsong Lee

------------------------------------------------------------------
From: Stephan Ewen <se...@apache.org>
Send Time: Aug 19, 2019 (Monday) 15:56
To: dev <dev@flink.apache.org>
Subject: Re: [DISCUSS] FLIP-49: Unified Memory Configuration for TaskExecutors

My main concern with option 2 (manually release memory) is that segfaults in the JVM set off all sorts of alarms on the users' end. So we need to guarantee that this never happens.

The trickiness is in tasks that use data structures / algorithms with additional threads, like hash table spill/read and sorting threads. We need to ensure that these cleanly shut down before we can exit the task. I am not sure that we have that guaranteed already, that's why option 1.1 seemed simpler to me.

On Mon, Aug 19, 2019 at 3:42 PM Xintong Song <tonysong...@gmail.com> wrote:

Thanks for the comments, Stephan. Summarizing it this way really makes things easier to understand.

I'm in favor of option 2, at least for the moment. I think it is not that difficult to keep it segfault safe for the memory manager, as long as we always de-allocate the memory segment when it is released from the memory consumers. A segfault can then happen only if the memory consumer continues using the buffer of a memory segment after releasing it, in which case we do want the job to fail so we detect the memory leak early.

For option 1.2, I don't think this is a good idea.
Not only because the assumption (regular GC is enough to clean direct buffers) may not always be true, but also because it makes it harder to find problems in cases of memory overuse. E.g., the user configured some direct memory for the user libraries. If the library actually uses more direct memory than configured, which cannot be cleaned by GC because it is still in use, this may lead to overuse of the total container memory. In that case, if it didn't touch the JVM default max direct memory limit, we cannot get a direct memory OOM and it will become super hard to understand which part of the configuration needs to be updated.

Option 1.1 has a similar problem to 1.2 if the exceeded direct memory does not reach the max direct memory limit specified by the dedicated parameter.
I think it is slightly better than 1.2, only because we can tune the parameter.

Thank you~

Xintong Song

On Mon, Aug 19, 2019 at 2:53 PM Stephan Ewen <se...@apache.org> wrote:

About the "-XX:MaxDirectMemorySize" discussion, maybe let me summarize it a bit differently:

We have the following two options:

(1) We let MemorySegments be de-allocated by the GC. That makes it segfault safe. But then we need a way to trigger GC in case de-allocation and re-allocation of a bunch of segments happens quickly, which is often the case during batch scheduling or task restart.
  - The "-XX:MaxDirectMemorySize" (option 1.1) is one way to do this
  - Another way could be to have a dedicated bookkeeping in the MemoryManager (option 1.2), so that this is a number independent of the "-XX:MaxDirectMemorySize" parameter.

(2) We manually allocate and de-allocate the memory for the MemorySegments (option 2). That way we need not worry about triggering GC by some threshold or bookkeeping, but it is harder to prevent segfaults. We need to be very careful about when we release the memory segments (only in the cleanup phase of the main thread).

If we go with option 1.1, we probably need to set "-XX:MaxDirectMemorySize" to "off_heap_managed_memory + direct_memory" and have "direct_memory" as a separate reserved memory pool.
Because if we just set "-XX:MaxDirectMemorySize" to "off_heap_managed_memory + jvm_overhead", then there will be times when that entire memory is allocated by direct buffers and we have nothing left for the JVM overhead. So we either need a way to compensate for that (again some safety margin cutoff value) or we will exceed container memory.

If we go with option 1.2, we need to be aware that it takes elaborate logic to push recycling of direct buffers without always triggering a full GC.

My first guess is that the options will be easiest to do in the following order:

- Option 1.1 with a dedicated direct_memory parameter, as discussed above.
We would need to find a way to set the direct_memory parameter by default. We could start with 64 MB and see how it goes in practice. One danger I see is that setting this too low can cause a bunch of additional GCs compared to before (we need to watch this carefully).

- Option 2. It is actually quite simple to implement, we could try how segfault safe we are at the moment.

- Option 1.2: We would not touch the "-XX:MaxDirectMemorySize" parameter at all and assume that all the direct memory allocations that the JVM and Netty do are infrequent enough to be cleaned up fast enough through regular GC. I am not sure if that is a valid assumption, though.
>
> Best,
> Stephan
>
> On Fri, Aug 16, 2019 at 2:16 PM Xintong Song <tonysong...@gmail.com>
> wrote:
>
> Thanks for sharing your opinion, Till.
>
> I'm also in favor of alternative 2. I was wondering whether we can avoid
> using Unsafe.allocate() for off-heap managed memory and network memory
> with alternative 3. But after giving it a second thought, I think even
> for alternative 3 using direct memory for off-heap managed memory could
> cause problems.
>
> Hi Yang,
>
> Regarding your concern, I think what is proposed in this FLIP is to have
> both off-heap managed memory and network memory allocated through
> Unsafe.allocate(), which means they are practically native memory and
> not limited by the JVM max direct memory. The only parts of memory
> limited by the JVM max direct memory are task off-heap memory and JVM
> overhead, which are exactly what alternative 2 suggests setting the JVM
> max direct memory to.
>
> Thank you~
>
> Xintong Song
>
> On Fri, Aug 16, 2019 at 1:48 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> Thanks for the clarification, Xintong. I understand the two alternatives
> now.
>
> I would be in favour of option 2 because it makes things explicit. If we
> don't limit the direct memory, I fear that we might end up in a similar
> situation as we are currently in: The user might see that her process
> gets killed by the OS and does not know why this is the case.
> Consequently, she tries to decrease the process memory size (similar to
> increasing the cutoff ratio) in order to accommodate the extra direct
> memory. Even worse, she tries to decrease memory budgets which are not
> fully used and hence won't change the overall memory consumption.
>
> Cheers,
> Till
>
> On Fri, Aug 16, 2019 at 11:01 AM Xintong Song <tonysong...@gmail.com>
> wrote:
>
> Let me explain this with a concrete example, Till.
>
> Let's say we have the following scenario.
>
>   Total Process Memory: 1GB
>   JVM Direct Memory (Task Off-Heap Memory + JVM Overhead): 200MB
>   Other Memory (JVM Heap Memory, JVM Metaspace, Off-Heap Managed Memory
>   and Network Memory): 800MB
>
> For alternative 2, we set -XX:MaxDirectMemorySize to 200MB.
> For alternative 3, we set -XX:MaxDirectMemorySize to a very large value,
> let's say 1TB.
>
> If the actual direct memory usage of Task Off-Heap Memory and JVM
> Overhead does not exceed 200MB, then alternative 2 and alternative 3
> should have the same utility. Setting a larger -XX:MaxDirectMemorySize
> will not reduce the sizes of the other memory pools.
>
> If the actual direct memory usage of Task Off-Heap Memory and JVM
> Overhead potentially exceeds 200MB, then
>
>   - Alternative 2 suffers from frequent OOM. To avoid that, the only
>     thing the user can do is to modify the configuration and increase
>     JVM Direct Memory (Task Off-Heap Memory + JVM Overhead).
>     Let's say the user increases JVM Direct Memory to 250MB; this will
>     reduce the total size of the other memory pools to 750MB, given that
>     the total process memory remains 1GB.
>   - For alternative 3, there is no chance of a direct OOM. There is a
>     chance of exceeding the total process memory limit, but given that
>     the process may not use up all the reserved native memory (Off-Heap
>     Managed Memory, Network Memory, JVM Metaspace), if the actual direct
>     memory usage is slightly above yet very close to 200MB, the user
>     probably does not need to change the configuration.
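
The arithmetic in the example above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical and not part of Flink, and "1GB" is rounded to 1000 MB so that the numbers match the 200MB / 800MB split used in the example.

```java
/** Illustrative only: the class and method names are hypothetical, not Flink APIs. */
final class DirectMemoryBudgetExample {

    /** Memory left for heap, metaspace, managed and network memory, in MB. */
    static long otherPoolsMb(long totalProcessMb, long jvmDirectMb) {
        if (jvmDirectMb < 0 || jvmDirectMb > totalProcessMb) {
            throw new IllegalArgumentException("direct memory must fit into the process budget");
        }
        return totalProcessMb - jvmDirectMb;
    }

    /** JVM flag for alternative 2: cap direct memory at the configured budget. */
    static String maxDirectMemoryFlag(long jvmDirectMb) {
        return "-XX:MaxDirectMemorySize=" + jvmDirectMb + "m";
    }

    public static void main(String[] args) {
        long totalMb = 1000; // the "1GB" of the example, rounded to 1000 MB
        for (long directMb : new long[] {200, 250}) {
            System.out.println(maxDirectMemoryFlag(directMb)
                    + " leaves " + otherPoolsMb(totalMb, directMb) + " MB for the other pools");
        }
    }
}
```

Under alternative 2, raising the direct budget from 200 MB to 250 MB shrinks the remaining pools from 800 MB to 750 MB. Under alternative 3 the flag would be set to a very large value and the other pools would keep their 800 MB.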
>
> Therefore, I think from the user's perspective, a feasible configuration
> for alternative 2 may lead to lower resource utilization compared to
> alternative 3.
>
> Thank you~
>
> Xintong Song
>
> On Fri, Aug 16, 2019 at 10:28 AM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> I guess you have to help me understand the difference between
> alternative 2 and 3 wrt memory underutilization, Xintong.
>
>   - Alternative 2: set -XX:MaxDirectMemorySize to Task Off-Heap Memory
>     and JVM Overhead. Then there is the risk that this size is too low,
>     resulting in a lot of garbage collection and potentially an OOM.
>   - Alternative 3: set -XX:MaxDirectMemorySize to something larger than
>     alternative 2. This would of course reduce the sizes of the other
>     memory types.
>
> How would alternative 2 now result in an underutilization of memory
> compared to alternative 3? If alternative 3 strictly sets a higher max
> direct memory size and we use only a little of it, then I would expect
> that alternative 3 results in memory underutilization.
>
> Cheers,
> Till
>
> On Tue, Aug 13, 2019 at 4:19 PM Yang Wang <danrtsey...@gmail.com> wrote:
>
> Hi Xintong, Till,
>
> > Native and Direct Memory
>
> My point is about setting a very large max direct memory size when we do
> not differentiate direct and native memory. If the direct memory,
> including user direct memory and framework direct memory, could be
> calculated correctly, then I am in favor of setting direct memory to a
> fixed value.
>
> > Memory Calculation
>
> I agree with Xintong.
> For Yarn and K8s, we need to check the memory configurations in the
> client to avoid submitting successfully and then failing in the Flink
> master.
>
> Best,
>
> Yang
>
> On Tue, Aug 13, 2019 at 22:07, Xintong Song <tonysong...@gmail.com>
> wrote:
>
> Thanks for replying, Till.
>
> About MemorySegment, I think you are right that we should not include
> this issue in the scope of this FLIP. This FLIP should concentrate on
> how to configure memory pools for TaskExecutors, with minimum
> involvement in how memory consumers use them.
>
> About direct memory, I think alternative 3 may not have the same
> over-reservation issue that alternative 2 does, but at the cost of a
> risk of overusing memory at the container level, which is not good. My
> point is that both "Task Off-Heap Memory" and "JVM Overhead" are not
> easy to configure. For alternative 2, users might configure them higher
> than what is actually needed, just to avoid getting a direct OOM.
> For alternative 3, users do not get a direct OOM, so they may not
> configure the two options aggressively high. But the consequence is the
> risk that the overall container memory usage exceeds the budget.
>
> Thank you~
>
> Xintong Song
>
> On Tue, Aug 13, 2019 at 9:39 AM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> Thanks for proposing this FLIP, Xintong.
>
> All in all I think it already looks quite good.
> Concerning the first open question about allocating memory segments, I
> was wondering whether this is strictly necessary to do in the context of
> this FLIP or whether this could be done as a follow-up? Without knowing
> all details, I would be concerned that we would widen the scope of this
> FLIP too much because we would have to touch all the existing call sites
> of the MemoryManager where we allocate memory segments (this should
> mainly be batch operators).
> The addition of the memory reservation call to the MemoryManager should
> not be affected by this and I would hope that this is the only point of
> interaction a streaming job would have with the MemoryManager.
>
> Concerning the second open question about setting or not setting a max
> direct memory limit, I would also be interested in why Yang Wang thinks
> leaving it open would be best. My concern about this would be that we
> would be in a similar situation as we are now with the
> RocksDBStateBackend.
> If the different memory pools are not clearly separated and can spill
> over to a different pool, then it is quite hard to understand what
> exactly causes a process to get killed for using too much memory. This
> could then easily lead to a situation similar to what we have with the
> cutoff-ratio. So why not set a sane default value for max direct memory
> and give the user an option to increase it if he runs into an OOM?
>
> @Xintong, how would alternative 2 lead to lower memory utilization than
> alternative 3 where we set the direct memory to a higher value?
>
> Cheers,
> Till
>
> On Fri, Aug 9, 2019 at 9:12 AM Xintong Song <tonysong...@gmail.com>
> wrote:
>
> Thanks for the feedback, Yang.
>
> Regarding your comments:
>
> *Native and Direct Memory*
> I think setting a very large max direct memory size definitely has some
> good sides.
> E.g., we do not worry about direct OOM, and we don't even need to
> allocate managed / network memory with Unsafe.allocate().
> However, there are also some downsides of doing this.
>
>   - One thing I can think of is that if a task executor container is
>     killed due to overusing memory, it could be hard for us to know
>     which part of the memory is overused.
>   - Another downside is that the JVM never triggers GC due to reaching
>     the max direct memory limit, because the limit is too high to be
>     reached. That means we kind of rely on heap memory to trigger GC and
>     release direct memory. That could be a problem in cases where we
>     have more direct memory usage but not enough heap activity to
>     trigger the GC.
>
> Maybe you can share your reasons for preferring a very large value, in
> case there is anything else I overlooked.
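
One way to see how much direct memory a process is actually holding, which bears on both downsides above, is the JVM's standard `BufferPoolMXBean`. The sketch below is an illustration only and the class name is hypothetical. It assumes a HotSpot-style JVM that exposes a buffer pool named "direct", and to my understanding that pool tracks direct `ByteBuffer` allocations only, so memory obtained through Unsafe would not show up in it.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

/** Illustrative only: the class name is hypothetical; the MXBean API is standard JDK. */
final class DirectPoolUsageExample {

    /** Bytes currently held by direct ByteBuffers, or -1 if the pool is not exposed. */
    static long directMemoryUsedBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1; // assumption: some JVMs may not expose a pool with this name
    }

    public static void main(String[] args) {
        long before = directMemoryUsedBytes();
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20); // 1 MiB of direct memory
        long after = directMemoryUsedBytes();
        System.out.println("direct pool before=" + before + " after=" + after
                + " (buffer capacity " + buffer.capacity() + " bytes)");
    }
}
```

Comparing this number against the configured direct memory budget gives a rough indication of whether direct buffers or some other pool pushed a container over its limit.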
>
> *Memory Calculation*
> If there is any conflict between multiple configuration options that the
> user explicitly specified, I think we should throw an error.
> I think checking on the client side is a good idea, so that on Yarn /
> K8s we can discover the problem before submitting the Flink cluster,
> which is always a good thing.
> But we cannot rely only on the client-side checking, because for a
> standalone cluster, TaskManagers on different machines may have
> different configurations and the client does not see that.
> What do you think?
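
A fail-fast check of the kind described here could look roughly like the following. It is a sketch under stated assumptions: the pool names, the MB units and the rule "explicitly configured pools must not exceed total process memory" are hypothetical and are not taken from the FLIP or from Flink's code. Because it is plain logic, the same method could be called both on the client and when a TaskManager starts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: names and the exact rule are assumptions, not Flink's implementation. */
final class MemoryConfigCheckExample {

    /** Throws if the explicitly configured pool sizes cannot fit into the total process memory. */
    static void validate(long totalProcessMb, Map<String, Long> explicitPoolsMb) {
        long sumMb = 0;
        for (Map.Entry<String, Long> pool : explicitPoolsMb.entrySet()) {
            if (pool.getValue() < 0) {
                throw new IllegalArgumentException("Negative size for " + pool.getKey());
            }
            sumMb += pool.getValue();
        }
        if (sumMb > totalProcessMb) {
            throw new IllegalArgumentException(
                    "Configured pools need " + sumMb + " MB but total process memory is "
                            + totalProcessMb + " MB: " + explicitPoolsMb);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> pools = new LinkedHashMap<>();
        pools.put("network", 300L);
        pools.put("managed", 600L);
        pools.put("task-off-heap", 200L);
        try {
            validate(1000, pools); // 1100 MB requested, 1000 MB available
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```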

Thank you~

Xintong Song


On Thu, Aug 8, 2019 at 5:09 PM Yang Wang <danrtsey...@gmail.com> wrote:

Hi xintong,

Thanks for your detailed proposal. After all the memory configurations are introduced, it will be more powerful to control the Flink memory usage. I just have a few questions about it.

- Native and Direct Memory

We do not differentiate user direct memory and native memory. They are all included in task off-heap memory, right? So I don't think we could set -XX:MaxDirectMemorySize properly. I prefer leaving it at a very large value.

- Memory Calculation

If the sum of the fine-grained memory sizes (network memory, managed memory, etc.)
is larger than the total process memory, how do we deal with this situation? Do we need to check the memory configuration in the client?

Xintong Song <tonysong...@gmail.com> wrote on Wed, Aug 7, 2019 at 10:14 PM:

Hi everyone,

We would like to start a discussion thread on "FLIP-49: Unified Memory Configuration for TaskExecutors"[1], where we describe how to improve TaskExecutor memory configurations.
The FLIP document is mostly based on an early design "Memory Management and Configuration Reloaded"[2] by Stephan, with updates from follow-up discussions both online and offline.

This FLIP addresses several shortcomings of current (Flink 1.9) TaskExecutor memory configuration.

   - Different configuration for Streaming and Batch.
   - Complex and difficult configuration of RocksDB in Streaming.
   - Complicated, uncertain and hard to understand.

Key changes to solve the problems can be summarized as follows.

   - Extend the memory manager to also account for memory usage by state backends.
   - Modify how TaskExecutor memory is partitioned and accounted into individual memory reservations and pools.
   - Simplify memory configuration options and calculation logic.

Please find more details in the FLIP wiki document [1].

(Please note that the early design doc [2] is out of sync, and it would be appreciated to have the discussion in this mailing list thread.)

Looking forward to your feedback.

Thank you~

Xintong Song

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
[2] https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit?usp=sharing


On Fri, Aug 16, 2019 at 2:16 PM Xintong Song <tonysong...@gmail.com> wrote:

Thanks for sharing your opinion Till.

I'm also in favor of alternative 2. I was wondering whether we can avoid using Unsafe.allocate() for off-heap managed memory and network memory with alternative 3. But after giving it a second thought, I think even for alternative 3, using direct memory for off-heap managed memory could cause problems.

Hi Yang,

Regarding your concern, I think what is proposed in this FLIP is to have both off-heap managed memory and network memory allocated through Unsafe.allocate(), which means they are practically native memory and not limited by JVM max direct memory.
The only parts of memory limited by JVM max direct memory are task off-heap memory and JVM overhead, which are exactly what alternative 2 suggests setting the JVM max direct memory to.

Thank you~

Xintong Song


On Fri, Aug 16, 2019 at 1:48 PM Till Rohrmann <trohrm...@apache.org> wrote:

Thanks for the clarification Xintong. I understand the two alternatives now.

I would be in favour of option 2 because it makes things explicit.
If we don't limit the direct memory, I fear that we might end up in a similar situation as we are currently in: The user might see that her process gets killed by the OS and does not know why this is the case. Consequently, she tries to decrease the process memory size (similar to increasing the cutoff ratio) in order to accommodate for the extra direct memory. Even worse, she tries to decrease memory budgets which are not fully used and hence won't change the overall memory consumption.

Cheers,
Till

On Fri, Aug 16, 2019 at 11:01 AM Xintong Song <tonysong...@gmail.com> wrote:

Let me explain this with a concrete example Till.

Let's say we have the following scenario.

Total Process Memory: 1GB
JVM Direct Memory (Task Off-Heap Memory + JVM Overhead): 200MB
Other Memory (JVM Heap Memory, JVM Metaspace, Off-Heap Managed Memory and Network Memory): 800MB


For alternative 2, we set -XX:MaxDirectMemorySize to 200MB.
For alternative 3, we set -XX:MaxDirectMemorySize to a very large value, let's say 1TB.

If the actual direct memory usage of Task Off-Heap Memory and JVM Overhead does not exceed 200MB, then alternative 2 and alternative 3 should have the same utility. Setting a larger -XX:MaxDirectMemorySize will not reduce the sizes of the other memory pools.

If the actual direct memory usage of Task Off-Heap Memory and JVM Overhead potentially exceeds 200MB, then

   - Alternative 2 suffers from frequent OOM. To avoid that, the only thing the user can do is to modify the configuration and increase JVM Direct Memory (Task Off-Heap Memory + JVM Overhead). Let's say the user increases JVM Direct Memory to 250MB; this will reduce the total size of the other memory pools to 750MB, given that the total process memory remains 1GB.
   - For alternative 3, there is no chance of direct OOM.
   There are chances of exceeding the total process memory limit, but given that the process may not use up all the reserved native memory (Off-Heap Managed Memory, Network Memory, JVM Metaspace), if the actual direct memory usage is slightly above yet very close to 200MB, the user probably does not need to change the configuration.

Therefore, I think from the user's perspective, a feasible configuration for alternative 2 may lead to lower resource utilization compared to alternative 3.
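
The arithmetic in the example above can be written down as a small sketch. It assumes round numbers as in the example (1GB treated as 1000MB); the function names are hypothetical and the flag strings only illustrate how the two alternatives would differ, not how Flink actually builds its JVM arguments.

```python
def other_pools_mb(total_process_mb, direct_budget_mb):
    """Memory left for heap, metaspace, off-heap managed and network memory."""
    return total_process_mb - direct_budget_mb


def max_direct_memory_flag(alternative, direct_budget_mb):
    """Return the JVM flag for the two alternatives discussed in this thread."""
    if alternative == 2:
        # Alternative 2: the limit equals Task Off-Heap Memory + JVM Overhead.
        return f"-XX:MaxDirectMemorySize={direct_budget_mb}m"
    if alternative == 3:
        # Alternative 3: a limit so large it is effectively never reached
        # (1TB expressed in MB, as in the example).
        return f"-XX:MaxDirectMemorySize={1024 * 1024}m"
    raise ValueError(f"unknown alternative: {alternative}")


TOTAL_MB = 1000  # "1GB" in the example, rounded to match its 200MB + 800MB split

# Initial budget: 200MB direct memory leaves 800MB for the other pools.
print(max_direct_memory_flag(2, 200), other_pools_mb(TOTAL_MB, 200))
# After raising the direct budget to 250MB under alternative 2,
# only 750MB remain for the other pools.
print(max_direct_memory_flag(2, 250), other_pools_mb(TOTAL_MB, 250))
# Under alternative 3 the flag does not depend on the budget at all.
print(max_direct_memory_flag(3, 200), other_pools_mb(TOTAL_MB, 200))
```

Under alternative 3 the budget still determines how much memory the other pools get; only the JVM-enforced limit is decoupled from it.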

Thank you~

Xintong Song


On Fri, Aug 16, 2019 at 10:28 AM Till Rohrmann <trohrm...@apache.org> wrote:

I guess you have to help me understand the difference between alternative 2 and 3 with respect to memory under-utilization, Xintong.

- Alternative 2: set -XX:MaxDirectMemorySize to Task Off-Heap Memory and JVM Overhead. Then there is the risk that this size is too low, resulting in a lot of garbage collection and potentially an OOM.
- Alternative 3: set -XX:MaxDirectMemorySize to something larger than alternative 2.
This would of course reduce the sizes of the other memory types.

How would alternative 2 now result in an under utilization of memory compared to alternative 3? If alternative 3 strictly sets a higher max direct memory size and we use only little, then I would expect that alternative 3 results in memory under utilization.

Cheers,
Till

On Tue, Aug 13, 2019 at 4:19 PM Yang Wang <danrtsey...@gmail.com> wrote:

Hi xintong, till


> Native and Direct Memory

My point is to set a very large max direct memory size when we do not differentiate direct and native memory. If the direct memory, including user direct memory and framework direct memory, could be calculated correctly, then I am in favor of setting direct memory to a fixed value.


> Memory Calculation

I agree with xintong.
For Yarn and k8s, we need to check the memory configurations in the client to avoid submitting successfully and then failing in the Flink master.


Best,

Yang

Xintong Song <tonysong...@gmail.com> wrote on Tue, Aug 13, 2019 at 22:07:

Thanks for replying, Till.

About MemorySegment, I think you are right that we should not include this issue in the scope of this FLIP. This FLIP should concentrate on how to configure memory pools for TaskExecutors, with minimum involvement in how memory consumers use them.

About direct memory, I think alternative 3 may not have the same over-reservation issue that alternative 2 does, but at the cost of the risk of overusing memory at the container level, which is not good. My point is that both "Task Off-Heap Memory" and "JVM Overhead" are not easy to configure. For alternative 2, users might configure them higher than what is actually needed, just to avoid getting a direct OOM.
For alternative 3, users do not get a direct OOM, so they may not configure the two options aggressively high. But the consequence is the risk that the overall container memory usage exceeds the budget.

Thank you~

Xintong Song


On Tue, Aug 13, 2019 at 9:39 AM Till Rohrmann <trohrm...@apache.org> wrote:

Thanks for proposing this FLIP Xintong.

All in all I think it already looks quite good.
Concerning the first open question about allocating memory segments, I was wondering whether this is strictly necessary to do in the context of this FLIP or whether this could be done as a follow up? Without knowing all details, I would be concerned that we would widen the scope of this FLIP too much because we would have to touch all the existing call sites of the MemoryManager where we allocate memory segments (this should mainly be batch operators).
The addition of the memory reservation call to the MemoryManager should not be affected by this and I would hope that this is the only point of interaction a streaming job would have with the MemoryManager.

Concerning the second open question about setting or not setting a max direct memory limit, I would also be interested why Yang Wang thinks leaving it open would be best. My concern about this would be that we would be in a similar situation as we are now with the RocksDBStateBackend.
If the different memory pools are not clearly separated and can spill over to a different pool, then it is quite hard to understand what exactly causes a process to get killed for using too much memory. This could then easily lead to a situation similar to what we have with the cutoff-ratio. So why not set a sane default value for max direct memory and give the user an option to increase it if they run into an OOM?
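
To make the "sane default" idea concrete, here is a minimal Python sketch that derives an explicit `-XX:MaxDirectMemorySize` value from configured pool sizes. It is an illustration only, not Flink's implementation: the function name is hypothetical, and the choice to count exactly task off-heap memory plus network memory toward the limit is an assumption made for this example.

```python
MIB = 1024 * 1024


def max_direct_memory_flag(task_off_heap_bytes, network_bytes):
    """Return a -XX:MaxDirectMemorySize flag sized to the pools assumed to use direct memory.

    Hypothetical helper: which pools count toward the limit is an assumption of this sketch.
    """
    if task_off_heap_bytes < 0 or network_bytes < 0:
        raise ValueError("memory sizes must not be negative")
    limit = task_off_heap_bytes + network_bytes
    if limit == 0:
        raise ValueError("at least one pool must be non-empty to derive an explicit limit")
    # The JVM accepts a plain byte count for this flag.
    return f"-XX:MaxDirectMemorySize={limit}"


print(max_direct_memory_flag(64 * MIB, 128 * MIB))
```

A user who runs into a direct OOM would raise one of the two inputs, which raises the derived limit accordingly.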

@Xintong, how would alternative 2 lead to lower memory utilization than alternative 3 where we set the direct memory to a higher value?

Cheers,
Till

On Fri, Aug 9, 2019 at 9:12 AM Xintong Song <tonysong...@gmail.com> wrote:

Thanks for the feedback, Yang.

Regarding your comments:

*Native and Direct Memory*
I think setting a very large max direct memory size definitely has some good sides.
E.g., we do not worry about direct OOM, and we don't even need to allocate managed / network memory with Unsafe.allocate(). However, there are also some down sides of doing this.

- One thing I can think of is that if a task executor container is killed due to overusing memory, it could be hard for us to know which part of the memory is overused.
- Another down side is that the JVM never triggers GC due to reaching the max direct memory limit, because the limit is too high to be reached. That means we kind of rely on heap memory to trigger GC and release direct memory. That could be a problem in cases where we have more direct memory usage but not enough heap activity to trigger the GC.

Maybe you can share your reasons for preferring setting a very large value, if there is anything else I overlooked.

*Memory Calculation*
If there is any conflict between multiple configurations that the user explicitly specified, I think we should throw an error.
I think doing the check on the client side is a good idea, so that on Yarn / K8s we can discover the problem before submitting the Flink cluster, which is always a good thing.
But we cannot rely only on client-side checking, because for standalone clusters, TaskManagers on different machines may have different configurations and the client does not see that.
What do you think?
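
As a rough illustration of the conflict check discussed above, the following Python sketch rejects a configuration whose explicitly set pools exceed the total process memory. It is not Flink's actual implementation: the function name, the error type, the pool names, and the byte-based inputs are all hypothetical.

```python
class MemoryConfigError(ValueError):
    """Raised when explicitly configured memory sizes contradict each other."""


def check_memory_config(total_process_bytes, components):
    """Validate explicitly configured pool sizes against the total process memory.

    `components` maps a pool name (e.g. "network", "managed") to its size in bytes.
    Returns the number of bytes left for everything that was not configured.
    """
    for name, size in components.items():
        if size < 0:
            raise MemoryConfigError(f"{name} memory must not be negative: {size}")
    configured = sum(components.values())
    if configured > total_process_bytes:
        raise MemoryConfigError(
            f"configured pools need {configured} bytes, "
            f"but total process memory is only {total_process_bytes} bytes"
        )
    return total_process_bytes - configured


MIB = 1024 * 1024
remaining = check_memory_config(1024 * MIB, {"network": 128 * MIB, "managed": 256 * MIB})
print(remaining // MIB)  # MiB left for the pools that were not set explicitly
```

Because the check is a pure function of the configuration, the same code could run once on the client before submission and again in each TaskManager at startup, which covers the standalone case where the client cannot see every machine's configuration.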

Thank you~

Xintong Song

On Thu, Aug 8, 2019 at 5:09 PM Yang Wang <danrtsey...@gmail.com> wrote:

Hi Xintong,

Thanks for your detailed proposal. After all the memory configurations are introduced, it will be more powerful to control Flink memory usage. I just have a few questions about it.

- Native and Direct Memory

We do not differentiate user direct memory and native memory. They are all included in task off-heap memory. Right? So I don't think we could set the -XX:MaxDirectMemorySize properly. I prefer leaving it a very large value.

- Memory Calculation

If the sum of the fine-grained memory (network memory, managed memory, etc.) is larger than total process memory, how do we deal with this situation? Do we need to check the memory configuration in the client?

On Wed, Aug 7, 2019 at 10:14 PM Xintong Song <tonysong...@gmail.com> wrote:

Hi everyone,

We would like to start a discussion thread on "FLIP-49: Unified Memory Configuration for TaskExecutors"[1], where we describe how to improve TaskExecutor memory configurations.
The FLIP document is mostly based on an early design "Memory Management and Configuration Reloaded"[2] by Stephan, with updates from follow-up discussions both online and offline.

This FLIP addresses several shortcomings of current (Flink 1.9) TaskExecutor memory configuration.

- Different configuration for Streaming and Batch.
- Complex and difficult configuration of RocksDB in Streaming.
- Complicated, uncertain and hard to understand.

Key changes to solve the problems can be summarized as follows.

- Extend memory manager to also account for memory usage by state backends.
- Modify how TaskExecutor memory is partitioned and accounted for across individual memory reservations and pools.
- Simplify memory configuration options and calculation logic.

Please find more details in the FLIP wiki document [1].

(Please note that the early design doc [2] is out of sync, and it is appreciated to have the discussion in this mailing list thread.)

Looking forward to your feedback.

Thank you~

Xintong Song

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
[2] https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit?usp=sharing